Philip Kiely

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

How to Engineer AI Inference Systems with Philip Kiely

Published: April 30, 2026
Duration: 54:51
Summary source: description
Last updated: May 3, 2026

Discusses inference.

Show notes

In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving. Philip shares how research-to-production can move in hours, not months, and why understanding “the knobs” of inference—batching, qu

Themes

  • inference