Hacker News

forgotpwd16, today at 7:38 AM

Unless I misread, 2 hours isn't the time limit for candidates but the time Claude eventually needed to outperform the best submitted solution. The best candidate could have taken anywhere from 6 hours to 2 days to achieve that result.


Replies

fhd2, today at 8:07 AM

Their Readme.md is weirdly obsessed with "2 hours":

"before Claude Opus 4.5 started doing better than humans given only 2 hours"

"Claude Opus 4.5 in a casual Claude Code session, approximately matching the best human performance in 2 hours"

"Claude Opus 4.5 after 2 hours in our test-time compute harness"

"Claude Sonnet 4.5 after many more than 2 hours of test-time compute"

So that does make one wonder where this comes from. It could just be LLM-generated with a talking point of "2 hours"; models can fall in love with that kind of stuff. "after many more than 2 hours" is a bit of a tell.

I'd be quite curious to know, though. Here is how I usually design take-home assignments:

1. Candidate has several _days_ to complete (usually around a week).

2. I design the task to only _take_ 2-4 hours, informing the candidate about that, but that doesn't mean they can't take longer. The subsequent interview usually reveals if they went overboard or struggled more than expected.

But I can easily picture some places sending a candidate the assignment and asking them to hand in their work within two hours. Similar to good old coding competitions.

alcasa, today at 8:33 AM

No, the 2 hours is their time limit for candidates. The thing is, you are allowed to use any non-human help on their take-homes (open book), so if AI can solve the task in under 2 hours, the assignment is not very good at assessing the human.
