logoalt Hacker News

twotwotwoyesterday at 8:08 AM1 replyview on HN

Yeah--I wanted a short way to gesture at the subsequent "tasks that are fast for someone but not for you are interesting," and did not mean it as a gotcha on METR, but I should've taken a second longer and pasted what they said rather than doing the "presumably a human competent at the task" handwave that I did.


Replies

nightshift1yesterday at 9:32 AM

I agree. After all, benchmarks don't mean much, but I guess they are fine as long as they keep measuring the same thing every time. Also, the context matter. In my case, I see a huge difference between the gains at work vs those at home on a personal project where I don't have to worry about corporate policies, security, correctness, standards, etc. I can let the LLM fly and not worry about losing my job in record time.