January 7, 2026
New release of the @pipecat_ai Smart Turn model today. (Plus a funny LLM outtake in the demo video ...)
This is a point release (version 3.2) with some nice quantitative improvements for short speech segments and noisy environments.
Good turn detection is important for voice agents. Smart Turn is an open source, native audio turn detection model that you can drop into any voice agent for very fast, accurate turn detection.
In Pipecat pipelines, we generally run Smart Turn in parallel with transcription. This parallelization gives you the fastest possible end-to-end latency.
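To make the parallel idea concrete, here's a minimal sketch in plain Python asyncio. This is not Pipecat's actual API: `transcribe`, `detect_turn`, and `process_segment` are hypothetical stand-ins, and the sleep times are made-up numbers that only simulate model latency. The point is just that when both run concurrently on the same audio, the total wait is roughly the slower of the two rather than the sum.

```python
import asyncio
import time


async def transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for a speech-to-text call (simulated 200 ms)."""
    await asyncio.sleep(0.2)
    return "hello there"


async def detect_turn(audio: bytes) -> str:
    """Hypothetical stand-in for a turn detection call (simulated 100 ms)."""
    await asyncio.sleep(0.1)
    return "COMPLETE"


async def process_segment(audio: bytes) -> tuple[str, str]:
    # Start both tasks on the same audio and wait for both to finish.
    # Total time is about max(200 ms, 100 ms), not 200 ms + 100 ms.
    text, turn = await asyncio.gather(transcribe(audio), detect_turn(audio))
    return text, turn


def run_demo() -> tuple[str, str, float]:
    """Run one segment through both stand-ins and time the whole thing."""
    start = time.perf_counter()
    text, turn = asyncio.run(process_segment(b"\x00" * 320))
    elapsed = time.perf_counter() - start
    return text, turn, elapsed


if __name__ == "__main__":
    text, turn, elapsed = run_demo()
    print(f"text={text!r} turn={turn} elapsed={elapsed:.3f}s")
```

Awaiting the two calls one after the other instead would take about 300 ms with these simulated numbers, which is the latency cost the parallel layout avoids.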
If you're using Pipecat, you'll get the new 3.2 weights automatically when you upgrade to the next (upcoming) release of Pipecat. You can also download the weights from @huggingface to use the model today, or to use it with your own framework.
I dropped Smart Turn 3.2 into the open source @NVIDIAAI voice agent we posted yesterday. This agent is running on my DGX Spark and has ~600ms voice-to-voice response latency! I tried to show how the model works by highlighting the Turn.INCOMPLETE and Turn.COMPLETE model decisions. You can run this code yourself and look at the debug log lines, of course.
Having an intuitive sense for how all the AI models in an agent work together is valuable for AI engineers. This simple agent has a really basic prompt, so the LLM will sometimes offer to do things it can't actually do. It did that when I was recording the demo, so I left that in as an interesting blooper. :-)
Here's the Smart Turn v3 announcement post: https://t.co/QLHZfwbNpD
Model training code, weights, and inference code: https://t.co/YbiYc7Y8VT
Getting started with Smart Turn in Pipecat voice agents: https://t.co/l15qvN12CP
Here's the code for the NVIDIA open source voice agent in the demo: https://t.co/pVmjGotIDh