Here's a complete Gemini Multimodal Live + iOS + WebRTC starter kit

January 10, 2025

Filipi added an iOS example to the @pipecat_ai "Simple Chatbot" repo.

With the Pipecat iOS SDK, you can build apps that use Gemini Multimodal Live and Gemini Flash with three different network transports:

- WebSockets
- WebRTC
- HTTP

Filipi also recorded two videos that show the iOS sample app in action, together with Gemini Multimodal Live's multi-language abilities.

Seamless multi-language understanding/speech ➕ super low-latency (natural) conversation is 🤯.

Here's live translation both ways between English and Portuguese.

And here's switching between English and Portuguese during a voice conversation with Gemini.

The Simple Chatbot iOS example code is here:

https://t.co/pYzXKgP4au

Clone the repo ➡️ add your API keys to the .env file ➡️ build ➡️ run on your phone!
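As a rough illustration of the "add your API keys" step, here is a minimal, standard-library-only sketch that parses `.env`-style text and reports which keys are still empty. The key names `GEMINI_API_KEY` and `DAILY_API_KEY` are assumptions made for this example; check the repo's own example env file for the names it actually expects.

```python
def parse_env(text):
    """Parse KEY=VALUE lines from .env-style text into a dict."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Hypothetical key names, for illustration only.
REQUIRED_KEYS = ["GEMINI_API_KEY", "DAILY_API_KEY"]
```

Running `missing_keys(parse_env(open(".env").read()), REQUIRED_KEYS)` before you build is a quick way to catch a key you forgot to fill in.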

This example code is set up to use WebRTC for network transport, but you can easily switch to the Multimodal Live WebSocket API, or to Gemini Flash over HTTP.

In general, for server-to-server use cases you'll want to use WebSockets. (And WebSockets are also great for personal projects and experiments.)

For a production iOS app, you'll want to use WebRTC.
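The rule of thumb above can be written down as a tiny decision helper. This is only a sketch of the guidance in this post; the `Transport` enum and `recommend_transport` function are hypothetical names and are not part of any Pipecat SDK.

```python
from enum import Enum


class Transport(Enum):
    WEBSOCKET = "websocket"
    WEBRTC = "webrtc"
    HTTP = "http"


def recommend_transport(runs_on_end_user_device: bool, production: bool) -> Transport:
    """Encode the post's rule of thumb for realtime voice connections."""
    # Production apps on end-user devices (for example an iOS app): WebRTC.
    if runs_on_end_user_device and production:
        return Transport.WEBRTC
    # Server-to-server use cases, personal projects, experiments: WebSockets.
    return Transport.WEBSOCKET
```

HTTP is listed in the enum because the SDK supports it as a transport for Gemini Flash, but the helper never returns it: the guidance here only compares WebSockets and WebRTC.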

The @pipecat_ai client SDKs are open source, cross-platform SDKs that provide lots of scaffolding for building multimodal, conversational AI applications. They make it easy to "bridge" a WebRTC connection from an end-user device through to a WebSocket connection to the Gemini Multimodal Live API.
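To make the "bridge" idea concrete, here is a toy sketch of a two-way relay. It is not the Pipecat implementation: plain `asyncio.Queue` objects stand in for the WebRTC connection to the device and the WebSocket connection to the model, and every name in it (`pump`, `bridge`, `demo`, the `END` sentinel) is hypothetical.

```python
import asyncio

END = None  # sentinel marking the end of a stream


async def pump(source, sink):
    """Forward frames from source to sink until END; return the frame count."""
    count = 0
    while True:
        frame = await source.get()
        await sink.put(frame)  # END is forwarded too, so the far side can stop
        if frame is END:
            return count
        count += 1


async def bridge(from_device, to_model, from_model, to_device):
    """Relay both directions concurrently: device -> model and model -> device."""
    return await asyncio.gather(
        pump(from_device, to_model),
        pump(from_model, to_device),
    )


def drain(queue):
    """Collect everything currently sitting in a queue."""
    items = []
    while not queue.empty():
        items.append(queue.get_nowait())
    return items


async def demo():
    from_device, to_model = asyncio.Queue(), asyncio.Queue()
    from_model, to_device = asyncio.Queue(), asyncio.Queue()
    # Pretend the device sent two audio frames and the model sent one back.
    for frame in (b"mic-1", b"mic-2", END):
        from_device.put_nowait(frame)
    for frame in (b"tts-1", END):
        from_model.put_nowait(frame)
    counts = await bridge(from_device, to_model, from_model, to_device)
    return counts, drain(to_model), drain(to_device)
```

Calling `asyncio.run(demo())` relays the queued frames in both directions. A real relay also has to deal with audio encoding, reconnection, and backpressure, which is the scaffolding the SDKs provide.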

Here's the code in the Simple Chatbot example repo that sets up the relay to the Multimodal Live API:

https://t.co/zE4mFm2ADz

If you're interested in the technical differences between WebSockets and WebRTC for realtime audio (and video), see this thread:

https://t.co/VbnZgDjDNZ

https://github.com/pipecat-ai/pipecat/tree/main/examples/simple-chatbot/examples/ios