Production deployment
Latency, cost, caching, scaling, and shipping LLM features safely.
10 · Free resources
- Docs · 25 min
OpenAI - Production best practices
Caching, streaming, and monitoring recommendations.
- Article · 18 min
Latency kills UX - Dan Luu
Why milliseconds matter for interactive AI features.
- Docs · 40 min
Kubernetes basics for ML services
Pods, deployments, and autoscaling for inference.
- Docs · 35 min
NVIDIA Triton inference server
Batching and model ensembles for GPU throughput.
- Docs · 30 min
OpenTelemetry - Observability primer
Traces and metrics for LLM microservices.
- Article · 22 min
Weights & Biases - LLM monitoring
Logging prompts, outputs, and eval scores.
- Docs · 28 min
PostgreSQL pgvector extension
Co-locate vectors with transactional data for RAG apps.