My hacked-together, messy, voice-based dev environment:

April 13, 2025

1. Voice-driven loop with screenshotting, so the LLM in the loop can see what's in my terminal and editor. The prompt varies depending on what I'm trying to drive with this loop.
2. A few tool definitions that give read access to files and URLs.
3. A tool that the LLM can send a block of output to, which turns that output into keyboard events, so the LLM can drive any editor or terminal.
4. A separate process that watches a directory and constantly makes LLM-driven git commits (git autosave).
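Items 2 and 3 could look roughly like the minimal sketch below. It assumes an OpenAI-style function-calling API. The tool names (`read_file`, `fetch_url`, `type_text`), the `ToolRunner` class, and the project-root sandboxing are hypothetical illustrations, not the actual setup described above. Keystrokes go through a pluggable `send_keys` callable. The default uses pynput's `keyboard.Controller.type`, which is one real option but needs `pip install pynput` and a desktop session.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable

# Hypothetical tool schemas in the JSON-Schema "function" style that
# OpenAI-compatible chat APIs accept. Names are illustrative only.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a UTF-8 text file inside the project directory.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "fetch_url",
            "description": "Fetch a URL and return the start of the response body.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "type_text",
            "description": "Type a block of text into the focused editor or terminal.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    },
]


def pynput_typer(text: str) -> None:
    # One real way to synthesize keystrokes. Imported lazily so the rest of
    # the module works on machines without pynput or a display.
    from pynput.keyboard import Controller
    Controller().type(text)


class ToolRunner:
    """Hypothetical dispatcher that executes the model's tool calls."""

    def __init__(self, root: str, send_keys: Callable[[str], None] = pynput_typer,
                 max_chars: int = 20_000):
        self.root = Path(root).resolve()
        self.send_keys = send_keys
        self.max_chars = max_chars

    def read_file(self, path: str) -> str:
        target = (self.root / path).resolve()
        # Read-only access, confined to the project root (blocks "../" escapes).
        if target != self.root and self.root not in target.parents:
            raise PermissionError(f"{path} is outside {self.root}")
        return target.read_text(encoding="utf-8")[: self.max_chars]

    def fetch_url(self, url: str) -> str:
        if not url.startswith(("http://", "https://")):
            raise ValueError("only http(s) URLs are allowed")
        with urllib.request.urlopen(url, timeout=10) as resp:
            # Read at most max_chars bytes; undecodable bytes are replaced.
            return resp.read(self.max_chars).decode("utf-8", errors="replace")

    def type_text(self, text: str) -> str:
        self.send_keys(text)
        return f"typed {len(text)} characters"

    def dispatch(self, name: str, arguments: str) -> str:
        # `arguments` is the JSON string the model produced for this tool call.
        handlers = {
            "read_file": self.read_file,
            "fetch_url": self.fetch_url,
            "type_text": self.type_text,
        }
        if name not in handlers:
            raise KeyError(f"unknown tool: {name}")
        return handlers[name](**json.loads(arguments))
```

In a voice loop, each tool call the model emits would be routed through something like `runner.dispatch(name, arguments)`, with the returned string sent back to the model as the tool result. Keeping `send_keys` injectable makes it easy to swap keystroke backends per editor or to test without touching the real keyboard.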

I have some pieces of this running most of the time. But I'm lazy, and busy with other stuff, and I also try to use a variety of editors and tools to see what's good lately. Which means no stability, so my hacked-together stuff is always broken.

I don't want to replace Windsurf / Cursor / Claude Code. A seriously good agent and expert-system dev toolkit is a *lot* of work.

What I want is a conversational voice layer that I can use with any dev environment, in the same way that I can use version control with any dev environment.

I don't have time to focus on this. But I can help, and Pipecat has all the voice-loop orchestration and model/service abstraction pieces. Who wants to build this?

Taelin @VictorTaelin

Wow, since when the 4o model on API is so fast? It seems really good too. Rarely worse than Sonnet, sometimes better. Feels so good to use it

BTW I swear I'm this close to never using an editor again. I just need a latency-free voice-based editor where I say something, the AI

Relatedly, every programmer should watch @swyx's OpenAI DevDay talk from last year about programming as direct manipulation (with voice):

^[1]

And if you're interested in realtime voice AI hacking, read the guide and/or come to the meetup on Wednesday.

^[2]
^[3]

And see @trudypainter's voice coding experiments, live demos, and gemini-realtime-console code:

^[4]

^[5]

^[6]