Voice agent cost calculator (and a couple of common calculation mistakes)

April 20, 2025

Here's a spreadsheet to calculate the per-minute cost of a voice agent.

I cleaned this up to share because I've had the same surprising conversation a few times lately ... 🧵

tldr: In the most common voice agent configurations I see today, the biggest cost is voice generation. Hosting the agent is less than 1% of the per-minute cost.

Several times recently, people have asked me "Do I have to run Silero VAD, it uses way too much CPU?"

This surprises me because Silero is actually really efficient for what it does (realtime classification of audio frames as speech or non-speech). Silero uses about 1/8th of a typical cloud vCPU.

When we dug in further, it turned out that in each case the people asking about Silero were trying to run more concurrent voice agents on each cloud VM. When I asked why, the answer was, "to bring down cost" ...

But if you do the math, hosting cost is less than 1% of the overall cost of running a voice agent in the cloud.

Here's a cost calculator spreadsheet you can copy and tinker with:

https://t.co/N1zrzYXsWr

The numbers in the sheet are for: Deepgram for transcription (STT), OpenAI GPT-4o (LLM), and Cartesia for voice output (TTS).

The total cost per minute of this configuration is between two and three cents per second. The cost breakdown is approximately:
- STT: 28%
- LLM: 23%
- TTS: 48%
- Hosting: <1%

Note that these numbers are list prices, prior to volume discount commits or enterprise pricing.

One other tricky thing about calculating voice agent cost is that the LLM cost per minute is higher for long conversations than short ones. (LLM cost compounds, because you need to feed the previous conversation history to the LLM for every turn.)

For a 30-minute conversation, the breakdown is more like:
- STT: 19%
- LLM: 47%
- TTS: 33%
- Hosting: <1%

For long conversations you may want to compress or summarize the conversation history on the fly. This helps save cost, keep latency down, and help the LLM follow instructions and call functions reliably.

More info here: https://t.co/edIHllqAVr