Hoisting this out of a discussion with @cairns

June 13, 2025

I've been experimenting as much as I can with Gemini model video understanding. All the 2.0 and 2.5 series Gemini models are impressive! The new Gemini 2.5 Pro, particularly so, as you might expect. It handles multiple videos, multi-step prompts, etc., and consistently delivers coherent and useful results.

For example, I now use this prompt when I've given a talk or recorded a long demo, and I want to post about it.

----

Analyze this YouTube video.

[ video url ]

1. Pick three interesting segments that are 2 minutes long or shorter, for posting on social media.
2. Summarize each segment and explain why it's interesting.
3. Transcribe each segment.

----

I let Gemini Pro do its "automatic thinking budget" thing. A prompt like this typically takes 90 seconds or so to start streaming its (post-thinking-stage) response.
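
For anyone who wants to run the same prompt outside the chat UI, here is a minimal sketch that sends it to the Gemini REST API using only the Python standard library. Assumptions on my part, not something from the original workflow: the `gemini-2.5-pro` model ID, the `v1beta` `generateContent` endpoint, a `GEMINI_API_KEY` environment variable, and the helper names `build_request` and `analyze`. The video URL is passed as a `file_data` part instead of being pasted into the prompt text, and no thinking configuration is set, so the model's default thinking behaviour applies.

```python
import json
import os
import urllib.request

# Assumption: model ID and endpoint follow the public Gemini REST API.
MODEL = "gemini-2.5-pro"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)

PROMPT = """Analyze this YouTube video.

1. Pick three interesting segments that are 2 minutes long or shorter, for posting on social media.
2. Summarize each segment and explain why it's interesting.
3. Transcribe each segment."""


def build_request(video_url: str) -> dict:
    """Build the JSON body: the video as a file_data part, then the prompt text."""
    return {
        "contents": [
            {
                "parts": [
                    {"file_data": {"file_uri": video_url}},
                    {"text": PROMPT},
                ]
            }
        ]
    }


def analyze(video_url: str, timeout: float = 300.0) -> str:
    """Send the request and return the text of the first candidate."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(video_url)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key lives in this environment variable.
            "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        },
        method="POST",
    )
    # A generous timeout, since the response can take a while to arrive.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    parts = body["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)
```

Calling `analyze("https://www.youtube.com/watch?v=...")` with a real video URL should return the whole response as one string. This sketch does not stream, so nothing appears until the model has finished.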

Sometimes I already know which segment I want to pull out and post. Sometimes I don't. But in either case, it's useful to run this prompt and see the output.

A good next step would be to automate the workflow entirely. Ideally with me in the loop to approve and lightly guide it.
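
The "me in the loop" part could be as small as an approval gate between the model's suggestions and whatever does the posting. A rough sketch, with everything in it hypothetical: the `Segment` shape (the model's free-text response would first need parsing into something like it), the `review` function, and the y/N prompt. The only thing taken from the prompt above is the 2-minute limit.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

MAX_SECONDS = 120  # the prompt asks for segments of 2 minutes or shorter


@dataclass
class Segment:
    # Hypothetical shape; not a format the model is known to return.
    title: str
    start: int  # seconds from the start of the video
    end: int

    @property
    def duration(self) -> int:
        return self.end - self.start


def review(
    segments: Iterable[Segment],
    ask: Callable[[str], str] = input,
) -> list[Segment]:
    """Return the segments a human approves, skipping any that are too long."""
    approved = []
    for seg in segments:
        if seg.duration > MAX_SECONDS:
            continue  # over the 2-minute limit, so never offered for approval
        answer = ask(f"Post '{seg.title}' ({seg.duration}s)? [y/N] ")
        if answer.strip().lower() == "y":
            approved.append(seg)
    return approved
```

Passing `ask` as a parameter keeps the gate testable and lets the same function sit behind a CLI prompt, a chat message, or a web form later.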

@DescriptApp, which I use for editing and captioning my demo videos, can actually do some of this really well already. But I like comparing different models and tools. And Gemini 2.5 Pro is the best model at this analysis task, right now, I think.

Here's an example of Gemini's response for the above prompt: https://t.co/o2dZhUNanE