AI Evals
Gain a practical playbook for defining, measuring, and improving the quality of AI agents
By Justin Bauer and Sandhya Hegde



Course outcomes
"Writing evals is going to become a core skill for product managers. It is such a critical part of making a good product with AI." — Kevin Weil, CPO@OpenAI
Traditional software was designed to guide users through a single "happy path," but AI products behave differently. Each interaction produces a wide distribution of personalized outcomes for every user. This means the PM craft is fundamentally changing, all the way from writing PRDs to analytics. The AI PM must adapt to this new reality to ship high-quality AI products that customers love.
This course is a practical introduction to AI evaluation for product managers. We will walk through the product management process for AI agents and give you a step-by-step playbook for building an eval-driven roadmap, including AI PRDs, golden datasets, eval rubrics, trace analysis, and automated evaluators. We will use examples, case studies, demos, and DIY exercises to demonstrate what this process looks like in real life. Our goal is that, by the end of the course, you will have grown into a full-stack AI PM who can lead the development of high-quality AI products.
Who this course is for
PMs and Senior PMs who are building (or want to learn how to build) AI features or agents and want a practical way to reason about quality, not just ship demos
Directors, Heads, and VPs of Product who are responsible for setting standards for how their teams build, evaluate, and improve AI products
Product leaders designing AI Product Reviews who feel their current toolkit is out of date and want to develop modern, AI-native practices
If you’re responsible for deciding what “good” looks like for an AI product (or expect to be in the next 6-12 months), then this course is for you.
Who this course is not for
This is an introductory course, so it is not intended for those with advanced technical knowledge (e.g., ML engineers).
Course syllabus
Module 0: The Evaluation Gap
Module 1: AI PRDs & Eval-driven Roadmaps
Module 2: Eval Lifecycle and Introduction to Trace Analysis
Module 3: Automated Evaluation - Code-based & LLM-as-Judge
Module 4: Evaluating Agent Architectures
Get Reforge Learning
40+ on-demand courses
600+ practical how-to guides
1,400+ ready-to-use artifacts
Reforge extension with AI-powered expert advice
This course is not available on-demand.