Claude Fable 5 beats Pokémon FireRed using only vision: https://www.youtube.com/watch?v=CIQBP1w4B1M
Any suggestion on how I should calibrate my cynicism towards this?
I can imagine Anthropic running this experiment multiple times and picking the most impressive one. Or I could imagine this entire run costing $1000+ in tokens. Or maybe they tried a bunch of Pokemon games and it couldn't even finish some of them. Or is it just able to do this because it has an immense amount of FireRed training data, and if you were to give it an "original" Pokemon game, where it actually had to navigate novel circumstances, it would fail?
no reasoning shown. no explanation on any training information. Using vision-only should be an easier version of the task (given training).
there are many standardized evals to do this correctly and Anthropic ignored them to provide an 18-second sped-up video of a 50-hour run?
yeah I don't trust this until they provide a live run by a 3rd party with full reasoning traces in real-time. The reason we all liked the Gemini Plays Pokemon style runs was that they were live and couldn't be faked
Bold move putting in the lvl 3 Pidgey against Gary's Blastoise at the end there (~14sec in... integer timestamps insufficient here).
Is there any more detail about this besides the very fast slideshow?
The video is privated now, but the timelapse is weird. Sometimes it skips only seconds before the next screenshot and sometimes it skips probably hours forward.
"Computer system goes through a finite state machine"
I mean that’s AGI confirmed right?
hi, pokemon red expert here: that video has since been taken private. there is what i would assume to be a new version of that video posted here: https://www.youtube.com/watch?v=Ty_50J84fMY and it is heavily redacted, with most of the game actually omitted. very possibly this is just another case of anthropic protecting us from their models' immense power