← kwindla hultman kramer

February 26, 2025

Thank you to @juliettelove29 for a fun fireside chat last night at the voice AI meetup in London. Lots of interesting background about Gemini's multimodal capabilities, and thoughtful advice about building multimodal agents in 2025.

Some notes:
- The @GoogleDeepMind team has been working on multimodal features and capabilities for a long time. What we see today in the model and the Multimodal Live API comes from lots of experience building actual applications. Some of these applications get released publicly; others are used only internally at Google.
- Model training and API design are evolving together. One way to think about this is that with each capability improvement, we are able to build for new use cases. New use cases generate new evals and training data. New evals and training data are used in the next cycle of model improvement and API evolution.

  1. https://x.com/kwindla/status/1894479551259897908