Hacker News

SXX, yesterday at 8:36 PM

Hey, I just ran a simple test: I downloaded a 5-minute YouTube video and uploaded it to the Gemini app.

Source video title: Zelda: Breath of the Wild - Opening five minutes of gameplay

https://www.youtube.com/watch?v=xbt7ZYdUXn8

Prompt:

   Please describe what happening in each scene of this video.
   
   List scenes with timestamp, then describe separately:
   - Setup and background, colors
   - What is moving, what appear
   - What objects in this scene and what is happening,
   
   Basically make desceiption of 5 minutes video for a person who cant watch it.

The result is in a GitHub gist, since there's too much text to post here:

https://gist.github.com/ArseniyShestakov/43fe8b8c1dca45eadab...

I'd say this is quite accurate.


Replies

SXX, yesterday at 8:50 PM

Another example, using a completely random 10-minute benchmark video from Tears of the Kingdom:

https://gist.github.com/ArseniyShestakov/47123ce2b6b19a8e6b3...