You have a prompt. You find a case where you need it to perform better. So you…

July 11, 2025

You have a prompt. You find a case where you need it to perform better. So you add another sentence (or classification example, or whatever) to the prompt.

Did adding to the prompt improve *overall* performance?

Sometimes adding more instructions to a prompt helps the model perform better on specific cases without changing very much about overall performance. 👍

Sometimes adding more instructions to a prompt helps the model generalize better and actually improves overall performance. 👍👍

Sometimes adding more instructions to a prompt fixes a particular failure mode but makes a lot of other things worse. 👎
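
One way to tell these three outcomes apart is to run the same set of test cases against the old prompt and the new prompt, then diff the results case by case. Here's a minimal sketch in Python, assuming you already have a pass/fail result per test case from whatever eval harness you use. The function name, the `Comparison` fields, and the case names are all hypothetical, made up for illustration, and not any particular tool's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Comparison:
    fixed: list[str]          # cases that failed before and pass now
    regressed: list[str]      # cases that passed before and fail now
    pass_rate_before: float
    pass_rate_after: float


def compare_runs(before: dict[str, bool], after: dict[str, bool]) -> Comparison:
    """Diff per-case pass/fail results from two prompt versions."""
    if before.keys() != after.keys():
        raise ValueError("both runs must cover the same test cases")
    if not before:
        raise ValueError("need at least one test case")
    fixed = sorted(c for c in before if not before[c] and after[c])
    regressed = sorted(c for c in before if before[c] and not after[c])
    n = len(before)
    return Comparison(
        fixed=fixed,
        regressed=regressed,
        pass_rate_before=sum(before.values()) / n,
        pass_rate_after=sum(after.values()) / n,
    )


# Hypothetical results, invented for illustration only.
old_prompt = {"greeting": True, "refund": False, "interruption": True, "transfer": True}
new_prompt = {"greeting": True, "refund": True, "interruption": False, "transfer": False}

result = compare_runs(old_prompt, new_prompt)
print("fixed:", result.fixed)
print("regressed:", result.regressed)
print("pass rate:", result.pass_rate_before, "->", result.pass_rate_after)
```

In this made-up run the new prompt fixes the one case it was written for but breaks two others, which is the 👎 outcome. Looking at the per-case lists matters because an unchanged overall pass rate can hide one fix and one regression cancelling out.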

@brookenhopkins organized a hands-on voice agent workshop at the @covaldev office recently. We answered questions about building voice agents, evaluating their performance, and scaling into production usage.

Link to the full video below. But here's a clip of Brooke talking about the problem of evaluating changes to a voice AI system instruction. Especially a long system instruction. (Long prompts suffer more from jagged instruction-following performance than short prompts.)

Here's the full office hours Q&A.

https://t.co/HlpfDyvUzb

Some topics we covered:

- Latency, how to measure and optimize (15:17)
- Turn-taking and interruption handling (24:16, 25:17)
- Infrastructure trade-offs (16:15, 17:44)
- Prompt engineering and latency (40:47, 41:54)
- Observability and evaluation tooling (15:49, 22:57)

https://www.youtube.com/watch?v=ltqnlc-Np_8