My sense is that a powerful enough AI would know to think something like "ah, this sounds like a video game! Let me code up an interactive GUI, test it myself, then use it to solve these puzzles..." and essentially harness itself (the way you would with a geometry problem, by drawing it out on paper).
Yeah, but that's literally above ASI, let alone AGI. The average human scores <1% on this bench, and Opus scores 97.1% when given actual vision access, which means AGI was achieved long ago.