Hacker News

nightshift1 yesterday at 7:43 AM

>which human

The second graph has this under it:

The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months for the last 6 years...
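
For a rough sense of scale, here is a back-of-the-envelope sketch of what that doubling rate compounds to. Only the 7-month doubling time and the 6-year span come from the quoted caption; the function name is made up for illustration, and the calculation assumes perfectly steady doubling, which the caption only claims approximately.

```python
# Back-of-the-envelope: what "doubling every 7 months for 6 years" implies.
# Assumption: perfectly steady exponential growth over the whole span.

def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiplicative growth after elapsed_months of steady doubling."""
    return 2 ** (elapsed_months / doubling_months)

elapsed = 6 * 12          # 6 years, in months (from the quoted caption)
doubling_time = 7         # months per doubling (from the quoted caption)

doublings = elapsed / doubling_time
factor = growth_factor(elapsed, doubling_time)

print(f"{doublings:.1f} doublings, roughly {factor:,.0f}x longer tasks")
```

That works out to a bit over ten doublings, so task length grows on the order of a thousandfold over the period, if the trend held exactly.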


Replies

twotwotwo yesterday at 8:08 AM

Yeah, I wanted a short way to gesture at the subsequent "tasks that are fast for someone but not for you are interesting," and did not mean it as a gotcha on METR. I should've taken a second longer and pasted what they said rather than doing the "presumably a human competent at the task" handwave that I did.
