LLMOps & AI Infrastructure
Deploying, monitoring, scaling, and controlling the cost of LLM applications in production.
12 · Free resources
- Docs · 35 min
Google Cloud - MLOps on Vertex AI
Pipelines, monitoring, and model registry for LLM apps.
- Docs · 32 min
AWS - SageMaker MLOps best practices
CI/CD for models and multi-account governance.
- Article · 22 min
Arize - LLM observability
Tracing prompts, latency, and drift in production.
- Docs · 24 min
WhyLabs - Data and LLM monitoring
Statistical profiles for inputs and outputs.
- Article · 20 min
Honeycomb - OpenTelemetry for AI services
Correlating traces across model and app tiers.
- Docs · 26 min
Datadog - LLM monitoring product docs
Dashboards for tokens, errors, and spend.
- Docs · 28 min
LangSmith - Evaluation & deployment
Datasets, evaluators, and regression tests for prompts.
- Article · 24 min
Weights & Biases - LLMOps platform
Experiment tracking meets production observability.
- Docs · 30 min
Kubernetes - Autoscaling workloads
Scale inference pods with GPU-aware schedulers.
- Article · 18 min
FinOps Foundation - Unit economics
Chargeback, forecasting, and cost guardrails for AI.
- Docs · 22 min
OpenTelemetry - GenAI semantic conventions
Standard spans for model calls across vendors.