Evals & safety

Benchmarks, red-teaming, guardrails, and monitoring.

10 resources · Free

  • Article · 35 min

    Anthropic - Red teaming language models

    Structured adversarial testing before release.

  • Docs · 30 min

    OpenAI - Evals repository

    YAML-driven benchmarks you can extend for your product.

  • Article · 25 min

    Stanford HELM - Holistic evaluation

    Multi-metric leaderboard methodology.

  • Docs · 40 min

    NIST AI RMF Playbook

    Governance mapping for AI systems in enterprises.

  • Article · 30 min

    OWASP LLM Top 10

    Threat categories from prompt injection to insecure plugins.

  • Docs · 22 min

    Google - Responsible AI practices

    Fairness, privacy, and safety checkpoints.

  • Article · 15 min

    LMSYS Chatbot Arena

    Crowdsourced Elo rankings for building intuition about how models compare.

  • Article · 45 min

    Anthropic - Constitutional AI paper

    Reinforcement learning from AI feedback (RLAIF) for harm reduction.

  • Docs · 20 min

    Microsoft - Azure OpenAI content filters

    Configurable moderation tiers for APIs.

  • Docs · 28 min

    Anthropic - Evaluating Claude

    Structured evals, red-team loops, and regression checks.
