It still seems like something is missing from all these frameworks.
I feel like an average human wouldn't pass some of these metrics, yet they are "generally intelligent". On the other hand, they also wouldn't pass a lot of the expert questions that AI is good at.
We're measuring something, and I think optimizing it is useful; I'd even say it is "intelligent" in some ways, but it doesn't seem "intelligent" in the same way that humans are.
> I feel like an average human wouldn't pass some of these metrics, yet they are "generally intelligent". On the other hand, they also wouldn't pass a lot of the expert questions that AI is good at.
I think this approach is intentional. The philosophy is simply "extraordinary claims require extraordinary evidence". What you're saying is true, but producing a system that exhibits all human cognitive capabilities is a better threshold for the (absolutely wild) claim that AGI exists.
On the other hand, an AI being very good at everything, while any individual human is only very good at some things, is likely also a quality we want to retain (or, well, achieve).
If a human cares about the work, they can often outperform an LLM because they will keep at it until the work meets their standard of quality, whereas the LLM will guess and then wait to be corrected. As a recent tweet I saw put it: it's amazing how fast the software bottleneck went from writing code to reviewing code.
I think we'll need to split the concept of intelligence into the capacity to accomplish a task and the capacity to conceive of and prompt a task. If the former is called "intelligence", then LLMs are intelligent.
But what, then, do we call the latter? I think the idea of an AI that can independently accomplish great things is what people mean when they talk about "general" intelligence. But I think we need a more specific label, one that captures the idea that successful humans are not just good at doing things; they originate what should be done and are not easily dissuaded.