March 25, 2025
My ask for everybody who publishes benchmarks for APIs and models: measure both throughput *and* latency.
Time to first token/byte is critical for voice AI use cases. Much more important than tokens per second. https://t.co/6aOR9eROZT
Great interview with @kwindla from Daily and Pipecat, with insights on what apps need in the real world from LLM and voice models, like low TTFT (time to first token).
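For anyone putting together such a benchmark, here is a rough sketch of how time to first token and tokens per second could be reported side by side. It assumes the model or API exposes a lazy streaming iterator of tokens; `measure_stream`, `StreamTiming`, and `fake_stream` are hypothetical names made up for this illustration, not part of any real SDK.

```python
import time
from typing import Callable, Iterable, Iterator, NamedTuple, Optional


class StreamTiming(NamedTuple):
    ttft_s: Optional[float]        # latency: request start -> first token
    total_s: float                 # request start -> last token
    tokens: int                    # number of tokens received
    tokens_per_s: Optional[float]  # throughput after the first token arrived


def measure_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Time a token stream, reporting latency and throughput separately.

    Assumption: `stream` is lazy, so the request effectively starts when
    iteration begins (true for generators, not for pre-built lists).
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in stream:
        now = clock()
        if first is None:
            first = now  # the moment a voice pipeline can start speaking
        last = now
        count += 1

    if first is None:
        # Nothing was streamed: no TTFT and no throughput to report.
        return StreamTiming(None, clock() - start, 0, None)

    decode_s = last - first
    # Throughput excludes the wait for the first token, so a slow start
    # cannot hide behind (or be hidden by) a fast decode rate.
    tps = (count - 1) / decode_s if count > 1 and decode_s > 0 else None
    return StreamTiming(first - start, last - start, count, tps)


def fake_stream(first_delay_s: float, gap_s: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(first_delay_s)
    for i in range(n):
        if i:
            time.sleep(gap_s)
        yield f"tok{i}"


if __name__ == "__main__":
    # Slow to start, fast afterwards: strong tokens/s, weak TTFT.
    print(measure_stream(fake_stream(0.05, 0.001, 20)))
    # Quick to start, slower afterwards: the better fit for voice.
    print(measure_stream(fake_stream(0.005, 0.003, 20)))
```

Keeping the two numbers in separate fields is the point: a single averaged tokens-per-second figure would make the first stream look better even though a voice user waits longer before hearing anything.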
