Hacker News

cyanydeez yesterday at 5:21 PM (3 replies)

OK, but aren't you just measuring efficiency, and not improvements in the big I in AGI?


Replies

Leynos yesterday at 7:33 PM

It also measures task coherence: the ability to plan, form contingencies, recover from errors, mitigate the accumulation of errors, and reconcile findings across a long context window.

jsnell yesterday at 5:56 PM

No? I think you're misunderstanding what is being measured.

It is purely a test of capabilities (can it do a thing that takes a human $X hours), not efficiency (how fast will it do it).

lukan yesterday at 5:27 PM

Yes, but this study was not about that and "just efficiency" is actually what most people are after.

At least I want AI to solve my problems, not score high on an academic leaderboard.