December 11, 2024
Big day today for conversational AI!
A new Gemini 2.0 model and a new voice-to-voice (plus video input) API from @Google.
🔊📹🤖⚡️🔊😀
See the thread below for links to:
➡️ Open Source @pipecat_ai clients for Web, React, Android, iOS, and C++. Echo cancellation and noise reduction, hooks for function calling and tool use, and support for both WebSocket and WebRTC network transports.
➡️ A Pipecat service that brings the Multimodal Live API features into the Pipecat Open Source ecosystem. For example, you can use this model in combination with your existing voice agent workflows.
➡️ Bite-sized sample code demos.
➡️ A full-blown multimodal chat app starter kit project.
Gemini 2.0 launched today. Amazing multimodal capabilities, long context windows, fast response times, built-in tools, and top-of-the-leaderboards reasoning capabilities.
Plus a new API — the Multimodal Live API — for conversational AI applications, like voice agents and