louiereederson, today at 7:09 PM

I know they acknowledge this, but measuring autonomy by the task length of the 99.9th percentile of users is problematic. The absolute extreme tail of usage should not be treated as an indication of autonomy; it seems disingenuous. Does it measure capability, or just how the most extreme users use Claude? It looks like data mining.

The fact that there is no clear trend in lower percentiles makes this more suspect to me.

If you want to control for user base evolution given the growth they've seen, look at the percentiles by cohort.
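
A rough sketch of what I mean, in Python. Everything here is an assumption for illustration: the record layout `(cohort, period, task_minutes)`, the function name `percentiles_by_cohort`, and the chosen percentiles are hypothetical and not taken from the post or its data. The idea is to hold the signup cohort fixed and compare its percentiles across periods, so a shift in who the users are does not pass for a shift in capability.

```python
from collections import defaultdict
from statistics import quantiles


def percentiles_by_cohort(sessions, pcts=(50, 90, 99)):
    """Compute task-length percentiles per (cohort, period) group.

    sessions: iterable of (cohort, period, task_minutes) tuples, where
    cohort is when the user signed up and period is when the task ran.
    This layout is hypothetical, not the format of any real dataset.
    """
    groups = defaultdict(list)
    for cohort, period, minutes in sessions:
        groups[(cohort, period)].append(minutes)

    result = {}
    for key, values in groups.items():
        if len(values) < 2:
            # Too few observations to estimate a percentile; skip the group.
            continue
        # n=100 yields 99 cut points; cuts[p - 1] is the p-th percentile.
        cuts = quantiles(values, n=100, method="inclusive")
        result[key] = {p: cuts[p - 1] for p in pcts}
    return result
```

If the upper percentiles rise within a fixed cohort over successive periods, that is at least some evidence of a real change rather than a changing user mix. If they only rise in the pooled data, composition is the more likely explanation.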

I actually come away from this questioning the METR work on autonomy.

You can see the trend for other percentiles at the bottom of this, which they link to in the blog post https://cdn.sanity.io/files/4zrzovbb/website/5b4158dc1afb211...