Production deployment

Latency, cost, caching, scaling, and shipping LLM features safely.

10·Free resources

0 of 10 resources completed

Log in to track progress

Log in to mark resources complete and sync progress across devices.

  • Docs25 min

    OpenAI - Production best practices

    Caching, streaming, and monitoring recommendations.

    Open resource
  • Docs20 min

    Anthropic - Prompt caching

    Reduce cost for long system prompts and docs.

    Open resource
  • Article18 min

    Latency kills UX - Dan Luu

    Why milliseconds matter for interactive AI features.

    Open resource
  • Docs40 min

    Kubernetes basics for ML services

    Pods, deployments, and autoscaling for inference.

    Open resource
  • Docs35 min

    NVIDIA Triton inference server

    Batching and model ensembles for GPU throughput.

    Open resource
  • Docs30 min

    OpenTelemetry - Observability primer

    Traces and metrics for LLM microservices.

    Open resource
  • Article22 min

    Weights & Biases - LLM monitoring

    Logging prompts, outputs, and eval scores.

    Open resource
  • Article20 min

    FinOps for AI - cost allocation

    Tagging spend by team, feature, and model.

    Open resource
  • Docs25 min

    Cloudflare - Edge caching for APIs

    When you can cache model responses safely.

    Open resource
  • Docs28 min

    PostgreSQL pgvector extension

    Co-locate vectors with transactional data for RAG apps.

    Open resource

← All learning paths