July 11, 2025
You have a prompt. You find a case where you need it to perform better. So you add another sentence (or classification example, or whatever) to the prompt.
Did adding to the prompt improve *overall* performance?
Sometimes adding more instructions to a prompt helps the model perform better on specific cases without changing very much about overall performance. 👍
Sometimes adding more instructions to a prompt helps the model generalize better and actually improves overall performance. 👍👍
Sometimes adding more instructions to a prompt fixes a particular failure mode but makes a lot of other things worse. 👎
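One way to tell these three outcomes apart is to run the same set of test cases against both prompt versions and diff the results case by case. Here's a minimal sketch, assuming you already have a pass/fail result per test case for each version. The names (`compare_prompt_runs`, `PromptComparison`, the case IDs) are made up for illustration and don't come from any particular eval tool.

```python
from dataclasses import dataclass


@dataclass
class PromptComparison:
    fixed: list[str]        # failed with the old prompt, pass with the new one
    regressed: list[str]    # passed with the old prompt, fail with the new one
    old_pass_rate: float
    new_pass_rate: float


def compare_prompt_runs(old: dict[str, bool], new: dict[str, bool]) -> PromptComparison:
    """Diff per-case pass/fail results from two prompt versions.

    Both dicts map a (hypothetical) test case ID to True for pass, False for fail.
    """
    if not old:
        raise ValueError("need at least one test case")
    if old.keys() != new.keys():
        raise ValueError("both runs must cover the same test cases")

    fixed = sorted(case for case in old if not old[case] and new[case])
    regressed = sorted(case for case in old if old[case] and not new[case])
    total = len(old)
    return PromptComparison(
        fixed=fixed,
        regressed=regressed,
        old_pass_rate=sum(old.values()) / total,
        new_pass_rate=sum(new.values()) / total,
    )


# Hypothetical results: the new sentence fixes "reschedule" but breaks two other cases.
old_results = {"greeting": True, "reschedule": False, "cancel": True, "transfer": True}
new_results = {"greeting": True, "reschedule": True, "cancel": False, "transfer": False}

report = compare_prompt_runs(old_results, new_results)
print(report.fixed)          # ['reschedule']
print(report.regressed)      # ['cancel', 'transfer']
print(report.old_pass_rate)  # 0.75
print(report.new_pass_rate)  # 0.5
```

The overall pass rate alone hides what happened. Looking at the fixed and regressed lists is what shows that a change fixed one failure mode while breaking others.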
@brookenhopkins organized a hands-on voice agent workshop at the @covaldev office recently. We answered questions about building voice agents, evaluating their performance, and scaling into production usage.
Link to the full video below. But here's a clip of Brooke talking about the problem of evaluating changes to a voice AI system instruction. Especially a long system instruction. (Long prompts suffer more from jagged instruction-following performance than short prompts.)
Here's the full office hours Q&A.
https://t.co/HlpfDyvUzb
Some topics we covered:
- Latency, how to measure and optimize (15:17)
- Turn-taking and interruption handling (24:16, 25:17)
- Infrastructure trade-offs (16:15, 17:44)
- Prompt engineering and latency (40:47, 41:54)
- Observability and evaluation tooling (15:49, 22:57)