
Philip Kiely
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
How to Engineer AI Inference Systems with Philip Kiely
- Published: April 30, 2026
- Duration: 54:51
- Summary source: description
- Last updated: May 3, 2026
Discusses inference.
Show notes
In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving. Philip shares how research-to-production can move in hours, not months, and why understanding “the knobs” of inference—batching, qu
Themes
- inference