My ask for everybody who publishes benchmarks for APIs and models: measure both…

March 25, 2025

My ask for everybody who publishes benchmarks for APIs and models: measure both throughput *and* latency.

Time to first token/byte is critical for voice AI use cases. Much more important than tokens per second. https://t.co/6aOR9eROZT

Laurent Denoue@ldenoue

Great interview with @kwindla from Daily and PipeCat with insights on what apps need in the real world from LLM and voice models like TFT (time to first token).
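To make the ask concrete, here is a minimal sketch of how a benchmark could report both numbers from a single streamed response. It assumes the API exposes the response as a Python iterable of tokens (or chunks); `measure_stream`, `StreamStats`, and `fake_stream` are hypothetical names for illustration, not part of any real SDK.

```python
import time
from typing import Callable, Iterable, Iterator, NamedTuple


class StreamStats(NamedTuple):
    ttft_s: float        # latency: request start to first token
    total_s: float       # request start to last token
    tokens: int          # number of tokens received
    tokens_per_s: float  # throughput, measured after the first token


def measure_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamStats:
    """Consume a token stream and report latency and throughput separately."""
    # Start the clock before pulling the first item, so that a lazily
    # started request is included in the time to first token.
    start = clock()
    first = None
    last = start
    count = 0
    for _ in stream:
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    # Throughput excludes the wait for the first token, so a slow start
    # cannot hide behind fast decoding (or the other way around).
    decode_s = last - first
    tokens_per_s = (count - 1) / decode_s if decode_s > 0 else float("nan")
    return StreamStats(first - start, last - start, count, tokens_per_s)


def fake_stream(first_delay_s: float, gap_s: float, n: int) -> Iterator[str]:
    """Stand-in for a real streaming API (assumption: tokens arrive one by one)."""
    time.sleep(first_delay_s)
    for i in range(n):
        if i:
            time.sleep(gap_s)
        yield f"tok{i}"


if __name__ == "__main__":
    # Same token count, very different feel for a voice app.
    print("slow start:", measure_stream(fake_stream(0.30, 0.005, 20)))
    print("fast start:", measure_stream(fake_stream(0.05, 0.020, 20)))
```

In this sketch the first fake stream has the higher tokens-per-second figure but makes the listener wait much longer before anything can be spoken, which is why publishing throughput alone can be misleading for voice use cases.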