← kwindla hultman kramer

A voice agent powered by gpt-oss

August 6, 2025

A voice agent powered by gpt-oss. Running locally on my macBook. Demo recorded in a Waymo with WiFi turned off.

I'm still on my space game voice AI kick, obviously. Code link below.

For conversational voice AI, you want to set the gpt-oss reasoning behavior to "low". (The default is "medium".) Notes on how to do that and a jinja template you can use are in the repo.

The LLM in the demo video is the big, 120B version of gpt-oss. You can use the smaller, 20B model for this, of course. But OpenAI really did a cool thing here designing the 120B model to run in "just" 80GB of VRAM. And the llama.cpp mlx inference is fast: ~250ms TTFT.

Running a big model on-device feels like a time warp into the future of AI.

Code[1]

  1. https://github.com/kwindla/gpt-oss-space-game/