August 6, 2025
A voice agent powered by gpt-oss. Running locally on my macBook. Demo recorded in a Waymo with WiFi turned off.
I'm still on my space game voice AI kick, obviously. Code link below.
For conversational voice AI, you want to set the gpt-oss reasoning behavior to "low". (The default is "medium".) Notes on how to do that and a jinja template you can use are in the repo.
The LLM in the demo video is the big, 120B version of gpt-oss. You can use the smaller, 20B model for this, of course. But OpenAI really did a cool thing here designing the 120B model to run in "just" 80GB of VRAM. And the llama.cpp mlx inference is fast: ~250ms TTFT.
Running a big model on-device feels like a time warp into the future of AI.
Code[1]