Hacker News

naasking, today at 2:14 PM

> They all make sense to me if we're trying to judge whether these tools are AGI, no?

As long as the mean and median human scores are clearly communicated, the scoring is fine. I think the human scores above would surprise people at first glance, even if they make sense once you think about it, so there's an argument to be made that scores can be misleading.