What's "exponential" about AI development?
The METR task-completion time horizons, for one.
https://metr.org/time-horizons/
Lousy benchmark. They explicitly focus on the tasks that are easiest for AI to automate (i.e. heavily cherry-picked outcomes), and it seems they don't bother to test anything except just-released proprietary models.