May 20, 2025
Sitting at Google I/O thinking about 2 things.
The platform shift we're all living through the early days of. This is maybe most obvious lately in the early glimpses of what programming is going to look like in two or three years. (Or maybe sooner.) I was kicking off Codex tasks and then switching tabs to read the Jules documentation.
Whither programming, all other things we do with computers will follow. Long-running "agents" with rich state and access to all our digital tools and filesystems. It's clear that we don't even have the right words to talk about this yet, or the building blocks for these new computing artifacts. "Agents" is a place-holder for things yet to be invented.
We'll interact with these long-running processes in a bunch of different ways. (Or maybe, they'll interact with us in a bunch of different ways. Which brings us to ...
Voice as the universal interface. Many of the demos today featured voice in some way. Native audio input and output in Gemini 2.5 Flash and Pro. Live translation. UI for wearable XR glasses. Voice control in the Gemini consumer app. Voice agents and conversational voice apps in AI Studio and new tools in the Multimodal Live API. Async function calling for voice conversations.