The @thursdai_pod year-end wrap + 2025 predictions show today was great

December 27, 2024

Thanks to @altryne, @WolframRvnwlf, @nisten and others who make the show happen every week.

Here were my predictions.

1. I agree with everyone who said that multimodal UIs and multimodal agents will be the focus of a lot of interesting low-level work and app-level experimentation by AI engineers. We're going to see new things, some of which will change how we think about AI (and computing in general).

2. In 2025 we'll get at least one production-quality speech-to-speech model that is built on a hybrid/post-transformer foundation. "When will we see an important architecture evolution beyond the transformer?" is a perennial question. 2025 is when.

3. Code execution as implemented by Gemini will influence the training of all future SOTA models.

If you're interested in model architecture or speech applications, it's worth reading @cartesia_ai's State of Voice AI in 2024 post.

https://t.co/85V7g8RgL6

The Moshi paper from @kyutai_labs was one of my favorite papers this year. Moshi is an impressive experimental speech-to-speech model with several really interesting architecture innovations.

https://t.co/1EOAY9xLm7

I wrote a little bit about Gemini 2.0's code execution here:

https://t.co/hKCUJ2FlzK

https://www.cartesia.ai/blog/state-of-voice-ai-2024