Don't you see the massive problem with requiring visual input? Are blind people not intelligent because they cannot solve ARC-AGI-3 without a "harness"?
A theoretical text-only superintelligent LLM could prove the Riemann hypothesis but fail ARC-AGI-3 and won't even be AGI according to this benchmark...
Well, it would be AGI if you could connect a camera to it to solve it, similar to how blind people would be able to solve it if you restored their eyesight. But if the lack of vision is a fundamental limitation of their architecture, then it seems more fair not to call them AGI.