girvo • yesterday at 10:30 PM • 3 replies

Yeah I'm quite surprised as to how all of those are supposed to be considered problems. They all make sense to me if we're trying to judge whether these tools are AGI, no?

Replies

naasking • today at 2:14 PM

> They all make sense to me if we're trying to judge whether these tools are AGI, no?

As long as the mean and median human scores are clearly communicated, the scoring is fine. I think the human scores above would surprise people at first glance, even if they make sense once you think about it, so there's an argument to be made that scores can be misleading.

andy12_ • yesterday at 10:42 PM

I think that any logic-based test that your average human can "fail" (i.e., score below 50%) is not exactly testing whether something is AGI. Though I suppose it depends on your definition of AGI (and whether all humans, or at least your average human, count as AGI under that definition).

benjaminl • yesterday at 11:56 PM

The issue here is that people have different definitions of AGI. Going by the description, getting 100% on this benchmark would be more than AGI: it would qualify as ASI (Algorithmic Super Intelligence), not just AGI.
