← kwindla hultman kramer

Spending valentine's day exactly as you'd expect

February 14, 2026

Spending valentine's day exactly as you'd expect. (Arguing, politely, on LinkedIn about how to accurately measure latency and word error rates.) https://t.co/iULMyNZtNE

kwindla@kwindla

Wake up, babe. New Pareto frontier chart just dropped.

Benchmarking STT for voice agents: we just published one of the internal benchmarks we use to measure latency and real-world performance of transcription models.

- Median, P95, and P99 "time to final transcript" numbers for hosted STT APIs.
- A standardized "Semantic Word Error Rate" metric that measures transcription accuracy in the context of a voice agent pipeline.
- We worked with all the model providers to optimize the configurations and @pipecat_ai implementations so that the benchmark is as fair and representative as we can possibly make it.

Entirely open source. You can run the benchmark yourself and reproduce the results.

Image from @kwindla's postImage from @kwindla's postImage from @kwindla's postImage from @kwindla's post