logoalt Hacker News

iamjackgyesterday at 7:25 PM3 repliesview on HN

Curious how this will fare when playing Pokemon Red.


Replies

dansoyesterday at 11:10 PM

> 3. Turning long videos into action: Gemini 3 Pro bridges the gap between video and code. It can extract knowledge from long-form content and immediately translate it into functioning apps or structured code

I'm curious as to how close these models are to achieving that once long-ago mocked claim (by Microsoft I think?) that AIs could view gameplay video of long lost games and produce the code to emulate them.

minimaxiryesterday at 7:39 PM

Gemini 3 Pro has been playing Pokemon Crystal (which is significantly harder than Red) in a race against Gemini 2.5 Pro: https://www.twitch.tv/gemini_plays_pokemon

Gemini 3 Pro has been making steady progress (12/16 badges) while Gemini 2.5 Pro is stuck (3/16 badges) despite using double the turns and tokens.

show 1 reply
euvinyesterday at 7:29 PM

Yeah the "High frame rate understanding" feature caught my eye, actual real time analysis of live video feeds seems really cool. Also wondering what they mean by "video reasoning/thinking"?

show 1 reply