January 14, 2026
.@tavus just published a nice blog post about their "real-time conversation flow and floor transfer" model, Sparrow-1.
This model does turn detection, predicting when it's the Tavus video agent's turn to speak. It does this by analyzing conversation audio in a continuous stream and learning and adapting to user behavior.
This model is an impressive achievement. I've had a few opportunities to talk to @code_brian, who led the R&D on this model at Tavus, about his work. I love Brian's approach to this problem. Among other things, the Sparrow-1 architecture lets the model handle overlapping speech and predict when someone is going to stop talking before they actually do.
It's worth reading the Sparrow-1 blog post and watching Brian's explainer video if you're interested in conversational AI tech.
Right now you can only use this model as part of the Tavus full stack. (It's not available separately as weights or via an API.) I recorded some video just before Christmas of the Tavus Santa Claus avatar, which used the Sparrow-1 model.
I never got around to posting that video clip. I had an idea to write up something about the "Santa Claus Avatar Benchmark," tracking the year-over-year improvement in interactive AI Santa demos. But I'll leave imagining that tongue-in-cheek post as an exercise for the reader and just put the video here as an example of an AI agent that uses the Sparrow-1 model for turn detection!
Sparrow-1 technical blog post[1]