← kwindla hultman kramer

Big day today for conversational AI!

December 11, 2024

A new Gemini 2.0 model and a new voice-to-voice (plus video input) API from @Google.

🔊📹🤖⚡️🔊😀

See the thread below for links to:

➡️ Open-source @pipecat_ai clients for Web, React, Android, iOS, and C++, with echo cancellation and noise reduction, hooks for function calling and tool use, and support for both WebSocket and WebRTC network transport.
➡️ A Pipecat service that brings the Multimodal Live API features into the Pipecat open-source ecosystem. For example, you can use this model alongside your existing voice agent workflows.
➡️ Bite-sized sample code demos.
➡️ A full-blown multimodal chat app starter kit project.

Daily (@trydaily)

Gemini 2.0 launched today. Amazing multimodal capabilities, long context windows, fast response times, built-in tools, and top-of-the-leaderboards reasoning capabilities.

Plus a new API — the Multimodal Live API — for conversational AI applications, like voice agents and