Episodes (5)

April 20, 2026

GLM-5.1: The Open-Weight Model Challenging GPT-5.4

We break down how Z.ai’s GLM-5.1 scored 58.4 on SWE-Bench Pro and edged past leading proprietary models on a major coding benchmark. Then we dig into why its MIT license, open weights, and tool-use focus could reshape the business of AI coding assistants.

April 19, 2026

TurboQuant and the Hidden KV Cache Bottleneck

Andy breaks down why LLM demos can fail in production even when the model fits on the GPU: the real pressure often comes from the KV cache during long prompts and high concurrency. He also explains Google Research’s TurboQuant approach, how 3-bit cache compression could slash memory use and infrastructure costs, and what to test before trying it in a self-hosted stack.

April 18, 2026

GPT-5.4 Can Use Your Desktop Now

We break down OpenAI’s GPT-5.4 and its native computer-use abilities, from screenshot-driven clicks and typing to why its 75% OSWorld score matters for real office automation. The episode also covers developer controls, finance and ops use cases, pricing, and the guardrails you’ll need before putting it into production.

April 18, 2026

Anthropic’s Mythos and the New Era of Autonomous Exploits

Anthropic’s restricted release of its most powerful model to top defenders raises a huge question: is this a security breakthrough or the start of a new offensive AI arms race? We dig into Mythos’ reported ability to independently find, reproduce, and exploit a 17-year-old FreeBSD flaw, and what that means for patching, disclosure, and enterprise defense.

April 18, 2026

OpenAI’s Hiro Deal: A Week to Save Your Finance Data

OpenAI’s acquisition of Hiro Finance comes with a rapid shutdown, permanent data deletion, and a seven-day window for users to export their information. The episode explores why Hiro’s verified financial math mattered, what Ethan Bloch’s team brings to OpenAI, and how this deal could signal a bigger push into domain-specific AI for personal finance.