Swapping between LLMs during a long-running voice AI conversation

October 14, 2024


I had been wanting to build this for a while, but put it off partly because I kept hoping I'd run across a library that would make it easy. (I've asked a couple of times in various threads/chats/forums, but nothing seemed like a perfect fit for plugging into @pipecat_ai. Maybe somebody in the comments will point me to the right thing!)

I needed to update some internals of Pipecat's context manager classes anyway to make the new OpenAI Realtime API implementation as clean as possible.

So ... here's a start on saving/loading conversation state between OpenAI / OpenAI Realtime / Llama / Anthropic.
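To give a rough idea of what saving and loading involves, here's a minimal sketch. It is not the code from the PR and doesn't use Pipecat's classes. It assumes the conversation is kept as a list of OpenAI-style `{"role", "content"}` text messages, and the function names (`save_conversation`, `load_conversation`, `to_anthropic`) are made up for illustration. The one provider difference it handles is that Anthropic's Messages API takes the system prompt as a separate parameter rather than as a message in the list.

```python
import json
from pathlib import Path


def save_conversation(messages, path):
    """Write an OpenAI-style message list to disk as JSON."""
    Path(path).write_text(json.dumps(messages, indent=2))


def load_conversation(path):
    """Read a message list previously written by save_conversation."""
    return json.loads(Path(path).read_text())


def to_anthropic(messages):
    """Split OpenAI-style messages into (system_prompt, chat_messages).

    Assumes text-only content. System messages are joined into one string;
    only user and assistant turns stay in the message list.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] in ("user", "assistant")
    ]
    return "\n".join(system_parts), chat
```

Function-call messages and images need more careful translation than this, since each provider represents them differently.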

Tomorrow I'll add image messages and Gemini support, and fix a few corner cases I didn't get to today.

The code is here, in the PR for OpenAI Realtime API support:

https://t.co/4hRtAAAron

As an aside, it continues to blow my mind that SOTA LLMs can string together several function calls to accomplish a task. In this case "load the most recent conversation" requires calling a function to get a list of available conversations, deciding which one is the most recent, and then calling another function to load that one.
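Here's a sketch of that chain, with the model's part written out as ordinary Python. Everything in it is an assumption made for illustration: the two tool functions, the `most_recent` helper standing in for the LLM's decision, and the timestamp-based filenames (e.g. `20241014T093000.json`). The PR may well do it differently.

```python
import json
from datetime import datetime
from pathlib import Path

# Hypothetical filename convention: <timestamp>.json
TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"


def list_conversations(directory):
    """Tool 1: return the filenames of all saved conversations."""
    return sorted(p.name for p in Path(directory).glob("*.json"))


def load_conversation(directory, filename):
    """Tool 2: return the message list stored in one saved conversation."""
    return json.loads((Path(directory) / filename).read_text())


def most_recent(filenames):
    """Stand-in for the LLM's reasoning step between the two tool calls:
    parse each filename's timestamp and pick the newest."""
    return max(
        filenames,
        key=lambda name: datetime.strptime(Path(name).stem, TIMESTAMP_FORMAT),
    )


def load_latest(directory):
    """The full chain: list, choose, load."""
    names = list_conversations(directory)
    return load_conversation(directory, most_recent(names))
```

In the real thing there is no `most_recent` helper. The model sees the result of the first call, works out which entry is newest, and issues the second call with that filename on its own.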

If you're interested in hacking on this kind of thing, join us for the Open Source Voice and Video AI Hackathon this weekend. We're doing both an in-person track in San Francisco, and a remote track on Discord. $20k in (cash) prizes up for grabs and some great sponsors.

https://t.co/5EYQfFfmAy

https://github.com/pipecat-ai/pipecat/pull/541 ↩