← kwindla hultman kramer

[ Hoisting this out of another thread here on X ... ]

April 11, 2025

I'm having a lot of conversations lately about voice agent components and, relatedly, cost. Lots of people are new to voice AI and are exploring the options!

You can build voice agents using a "full stack" platform. @Vapi_AI is a really good one. Or you can build voice agents yourself, mixing and matching options for agent code, network transport, hosting, and (possibly) client SDKs.

- Code framework. Vapi bundles the "orchestration" into their full stack. @pipecat_ai is open source and vendor neutral - you can use Pipecat with lots of different network transports. (I work on Pipecat.) @livekit Agents is a nice framework that's open source and not vendor neutral - it's tightly coupled to LiveKit's WebRTC/SIP stack.

- Network transport. This is how data moves between client applications and servers. Pipecat is commonly used with generic WebSockets, Twilio Media Streams WebSockets, peer-to-peer WebRTC, cloud WebRTC (Daily, LiveKit), and cloud SIP. If you're using somebody's cloud infrastructure (Twilio, Daily, LiveKit), you'll pay them for routing the network traffic for you. If you're operating at scale, you probably want to use somebody's global, scalable network transport infrastructure rather than build your own. If you're not operating at scale, free options include p2p WebRTC, LiveKit's single-node open source WebRTC server, and FastAPI/WebSockets (for hobby projects or server-to-server applications).

- Agent hosting. Vapi bundles this into their full stack. If you're taking the mix-and-match approach, you can host the agent code yourself on whatever infrastructure platform you like best. If you're building with Pipecat, you can set things up so that you `docker push` to @trydaily's Pipecat Cloud, if you want to, and pay $0.01 per bot minute for hosting.

- Client libraries. If you're building telephone voice agents you have to do everything server-side. But for web and native mobile apps, there's a lot of benefit to using voice AI client-side SDKs like the Pipecat open source, vendor-neutral SDKs. These SDKs support all the network transports listed above, plus direct connections to OpenAI Realtime API and the Gemini Multimodal Live API. (So you can use the Pipecat client SDKs without using Pipecat's server-side agent library, if you want, and without hosting any bot code at all, anywhere.)
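Since cost comes up in so many of these conversations, here is a rough sketch of how the per-minute hosting price mentioned above turns into a monthly number. The only figure taken from this post is the $0.01 per bot minute hosting rate; the function name, the call volume, and the call length are made-up illustrations. This covers hosting only. Network transport and any model/API usage are billed separately and are not included.

```python
# Hosting-only estimate. The rate is the figure quoted in this post;
# everything else here is a hypothetical illustration.
HOSTING_RATE_USD_PER_BOT_MINUTE = 0.01


def monthly_hosting_cost(calls_per_day: int, avg_call_minutes: float, days: int = 30) -> float:
    """Estimate hosting spend in USD for a given call volume.

    Ignores network transport fees and model/API costs.
    """
    bot_minutes = calls_per_day * avg_call_minutes * days
    return round(bot_minutes * HOSTING_RATE_USD_PER_BOT_MINUTE, 2)


if __name__ == "__main__":
    # Assumed workload: 1,000 calls a day, 3 minutes each, over 30 days.
    print(monthly_hosting_cost(1000, 3))  # 900.0
```

So an agent handling 90,000 bot minutes a month would be about $900 for hosting, before transport and model costs.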

Here's a storytelling app built with the Pipecat client SDK for React and Gemini 2.0 Flash.

Vapi docs[1]

Pipecat getting started[2]

Pipecat Cloud hosting[3]

Pipecat open source, cross-platform SDKs for JavaScript, React, iOS, Android, and C++[4]

OpenAI Pipecat starter kit:…

  1. https://docs.vapi.ai/welcome
  2. https://docs.pipecat.ai/getting-started/overview
  3. https://docs.pipecat.daily.co/introduction
  4. https://docs.pipecat.ai/client/introduction