April 13, 2025
My hacked-together, messy, voice-based dev environment:
1. Voice-driven loop with screen-shotting so the LLM in the loop can see what's in my terminal and editor. The prompt varies depending on what I'm trying to drive with this loop.
2. A few tool definitions that give read access to files and URLs.
3. A tool the LLM can send a block of output to that generates keyboard events, so the LLM can drive any editor/terminal.
4. A separate process watching a directory and constantly making LLM-driven git commits (git autosave).
I have some pieces of this running most of the time. But I'm lazy, and doing other stuff, and I also try to use a variety of editors and tools, to see what's good lately. That means no stability, so my hacked-together stuff is always broken.
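For anyone who wants to start on this, here's a rough sketch of what item 2 (read-only file and URL tools) could look like in Python. This is not my actual code. The tool names, the schema layout, the `MAX_CHARS` cap, and the `dispatch` helper are all made up for illustration, and the schema follows the JSON-schema style that most function-calling APIs accept, so adjust it to whatever your LLM service expects.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical cap so a single tool result cannot flood the model's context.
MAX_CHARS = 20_000

# Illustrative tool schemas (names and fields are assumptions, not a real API).
TOOLS = [
    {
        "name": "read_file",
        "description": "Return the text contents of a local file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "read_url",
        "description": "Fetch an http(s) URL and return the response body as text.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
]


def read_file(path: str) -> str:
    """Read a local file as text, truncated to MAX_CHARS."""
    text = Path(path).expanduser().read_text(encoding="utf-8", errors="replace")
    return text[:MAX_CHARS]


def read_url(url: str) -> str:
    """Fetch a URL as text, truncated to MAX_CHARS. Only http(s) is allowed."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("only http(s) URLs are allowed")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")[:MAX_CHARS]


HANDLERS = {"read_file": read_file, "read_url": read_url}


def dispatch(name: str, arguments_json: str) -> str:
    """Run one tool call from the LLM and return a string result."""
    handler = HANDLERS.get(name)
    if handler is None:
        return f"error: unknown tool {name!r}"
    try:
        return handler(**json.loads(arguments_json))
    except Exception as exc:
        # Report failures back to the model instead of crashing the voice loop.
        return f"error: {exc}"
```

The idea is that the voice loop hands `TOOLS` to the model, and whenever the model asks for a tool call you pass the name and JSON arguments to `dispatch` and feed the returned string back in. Keeping these tools read-only means the only thing that can change your files is the keyboard-event tool from item 3.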
I don't want to replace Windsurf / Cursor / Claude Code. A seriously good agent and expert-system dev toolkit is a *lot* of work.
What I want is a conversational voice layer that I can use with any dev environment, in the same way that I can use version control with any dev environment.
I don't have time to focus on this. But I can help and Pipecat has all the voice loop orchestration and model/service abstraction pieces. Who wants to build this?
Wow, since when is the 4o model on the API so fast? It seems really good too. Rarely worse than Sonnet, sometimes better. Feels so good to use it.
BTW I swear I'm this close to never using an editor again. I just need a latency-free voice-based editor where I say something, the AI
Relatedly, every programmer should watch @swyx's OpenAI dev day talk from last year about programming as direct manipulation (with voice):
And if you're interested in realtime voice AI hacking, read the guide and/or come to the meetup on Wednesday.
And see @trudypainter's voice coding experiments, live demos, and gemini-realtime-console code: