This is a very good test of AGI progress. We give humans and AI the same input and measure the results. Kudos to ARC for creating these games.
I really wonder why so many people fight against this. We know that AI is useful, we know that AI can do research, but we want to know whether it has what we vaguely define as intelligence.
I’ve read the comparisons that airplanes don’t flap their wings, or that submarines don’t swim. Yes, but that is not the question. I suggest everyone coming up with these comparisons check their biases, because this is about Artificial General Intelligence.
General is the keyword here; that is what ARC is trying to measure. Whether AI is useful or not isn’t the point. Whether AI turns out to be useful after this testing isn’t the point either.
This so far has been the best test.
And I also recommend asking AI specialized questions from deep within your own job, ones you know the answer to, and seeing how often the solution is wrong. I would guess we more often mistake knowledge for intelligence than overlook intelligence that is there. Probably common among humans as well.
The thing is, this is more akin to testing a blind person's performance on a driving test than testing their intelligence.
I would imagine if you simply encoded the game in textual format and asked an LLM to come up with a series of moves, it would beat humans.
The problem here is more around perception than anything.
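As a rough illustration of what "encoding the game in textual format" could look like, here is a hypothetical sketch. The grid, symbols, and move names are all made up for illustration; nothing here reflects the actual ARC-AGI game interface.

```python
# Hypothetical sketch: serializing a simple grid-world state to plain
# text so it could be fed to an LLM as a prompt. The game, symbols,
# and move names are invented for illustration.

GRID = [
    ["#", "#", "#", "#"],
    ["#", "P", ".", "#"],  # P = player, G = goal, # = wall, . = floor
    ["#", ".", "G", "#"],
    ["#", "#", "#", "#"],
]

def grid_to_text(grid):
    """Render the grid as plain text, one row per line."""
    return "\n".join("".join(row) for row in grid)

def build_prompt(grid):
    """Wrap the serialized grid in instructions an LLM could follow."""
    return (
        "You control P on this grid. # is a wall, G is the goal.\n"
        "Reply with a comma-separated list of moves "
        "(up/down/left/right).\n\n" + grid_to_text(grid)
    )

def parse_moves(reply):
    """Parse a reply like 'down, right' into a list of move strings."""
    return [m.strip().lower() for m in reply.split(",") if m.strip()]

print(build_prompt(GRID))
print(parse_moves("down, right"))
```

The point of the sketch is only that the translation step is mechanical: the perception problem goes away once the state is text, and what remains is the planning problem the commenter expects LLMs to handle well.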
Previous iterations of ARC-AGI were reminiscent of IQ tests. This one is just too easy, and the fact that models do terribly on it probably means there is an input-mode mismatch or an operation-mode mismatch.
If model creators are willing to teach their LLMs to play computer games through text, it's gonna be solved in one minor bump of the model version. But honestly, I don't think they are gonna bother, because it's just too silly and they don't expect their models to learn anything useful from it.
Especially since there are already models that can learn how to play 8-bit games.
It feels like ARC-AGI jumped the shark. But who knows, maybe people who train models for robots are going to take it in stride.
AGI’s 'general' is the wrong word, I think. Humans aren’t general, we’re jagged. Strong in some areas, weak in others, and already surpassed in many domains.
LLMs are way past us at language tasks, for instance. Calculators passed us at arithmetic, etc.