Scott Clark

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

How to Find the Agent Failures Your Evals Miss with Scott Clark

Published
May 7, 2026
Duration
47:05
Last updated
May 10, 2026

Discusses LLMs and evals.

Show notes

In this episode, Scott Clark, co-founder and CEO of Distributional, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Maslow’s hierarchy of observability: telemetry for logging, monitoring for known signals, and post-production or online analytics to surface unknown unknowns. We dig into examples of real-world failures Scott’s team has seen in production systems, such as “lazy” tool-use hallucinations that standard eva

Themes

  • llm
  • evals