Pipecat 0.0.99 is a pretty big release!

January 15, 2026

Pipecat 0.0.99 is a pretty big release! 25 items in the "Added" section, including vision (image input) support for OpenAI Realtime, word-level timestamps in AzureTTSService, the @krisp_ai VIVA turn detection model, and Grok Realtime voice-to-voice.

There's also a fundamental new abstraction in this release: turn and interruption "strategies."

We started working on Pipecat in 2023. (!) In those early days, we had just a few STT, TTS, and LLM models we could use for voice agents. The only turn detection option was Silero VAD. We were building fairly simple pipelines and targeting fairly simple use cases.

All of that has changed. There are more than 90 services now in Pipecat core. Speech-to-text (transcription/ASR) models increasingly do much more than transcription, including turn detection, and they come with widely differing configuration options and API events. You can build Pipecat pipelines with speech-to-speech models, with STT->LLM->TTS cascades, or even using both in the same agent.

0.0.99 introduces a new way to configure and develop the "user turn start," "user turn stop," and "user mute" code in your pipelines.

As always in Pipecat, the goals are to make things work consistently no matter what services you're using in your pipeline, to provide standard components that do things most people want to do, and to make it easy to extend these standard components to do things that are unique to your application.
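To make the idea of pluggable turn strategies concrete, here is a minimal, self-contained sketch of the pattern. It is not Pipecat's actual API: `Event`, `VADStart`, `TranscriptStop`, `KeywordStop`, and `UserTurnTracker` are all hypothetical names invented for illustration, and the sketch only covers turn start and turn stop, not user mute. See the changelog for the real classes and parameters.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Event:
    """Hypothetical pipeline event (not a Pipecat frame type)."""
    kind: str       # e.g. "vad_start", "vad_stop", "transcript"
    text: str = ""


class TurnStrategy(Protocol):
    """Anything with a triggered() method can act as a strategy."""
    def triggered(self, event: Event) -> bool: ...


class VADStart:
    """Standard component: the turn starts when VAD detects speech."""
    def triggered(self, event: Event) -> bool:
        return event.kind == "vad_start"


class TranscriptStop:
    """Standard component: the turn stops on any non-empty transcript."""
    def triggered(self, event: Event) -> bool:
        return event.kind == "transcript" and bool(event.text.strip())


class KeywordStop:
    """App-specific extension: the turn stops only when the user says a keyword last."""
    def __init__(self, keyword: str) -> None:
        self.keyword = keyword.lower()

    def triggered(self, event: Event) -> bool:
        if event.kind != "transcript":
            return False
        # Ignore trailing spaces and punctuation before matching the keyword.
        return event.text.lower().rstrip(" .!?").endswith(self.keyword)


@dataclass
class UserTurnTracker:
    """Runs the same turn logic regardless of which strategies are plugged in."""
    start: List[TurnStrategy]
    stop: List[TurnStrategy]
    in_turn: bool = False
    log: List[str] = field(default_factory=list)

    def process(self, event: Event) -> None:
        if not self.in_turn and any(s.triggered(event) for s in self.start):
            self.in_turn = True
            self.log.append("turn_start")
        elif self.in_turn and any(s.triggered(event) for s in self.stop):
            self.in_turn = False
            self.log.append("turn_stop")


# Standard behavior: VAD starts the turn, a transcript ends it.
tracker = UserTurnTracker(start=[VADStart()], stop=[TranscriptStop()])
for ev in [Event("vad_start"), Event("vad_stop"), Event("transcript", "hello there")]:
    tracker.process(ev)

# Custom behavior: same tracker, different stop strategy.
radio = UserTurnTracker(start=[VADStart()], stop=[KeywordStop("over")])
for ev in [Event("vad_start"), Event("transcript", "tell me a story")]:
    radio.process(ev)
```

The point of the sketch is the shape, not the details: the tracker never needs to know which service produced an event, and swapping `TranscriptStop` for `KeywordStop` changes the application's behavior without touching the tracker itself.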

Try out 0.0.99's turn strategies and let us know what you think of these new building blocks.

Changelog^[1]

Pipecat quickstart^[2]