There's no shortage of benchmarks (coding or otherwise) that any competent coding model will now pass with ~100%.
But no one quotes those any more, because if every model passes them they no longer serve any useful purpose in discriminating between models or identifying advances.
So people switch to new benchmarks, either with more difficult tasks or with some other artificial constraint that makes them harder to pass, until the scores are low enough to actually discriminate between models. A 50% score is in some sense ideal for that: there's lots of room for variance around 50%.
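A quick sketch of why there's more room around 50%: if you treat each benchmark task as an independent pass/fail trial, the standard error of a measured pass rate p over n tasks is sqrt(p(1-p)/n), which is widest at p = 0.5 and shrinks toward 0% and 100%. The 200-task benchmark below is hypothetical, just to put numbers on it.

```python
import math

def score_se(p: float, n: int) -> float:
    """Standard error of a pass rate p measured over n independent
    pass/fail tasks (treating each task as a Bernoulli trial)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical benchmark of 200 tasks: the spread in measured scores
# is widest at a 50% pass rate and collapses near saturation.
for p in (0.5, 0.9, 0.99):
    print(f"p={p:.2f}  SE={score_se(p, 200):.4f}")
# p=0.50  SE=0.0354
# p=0.90  SE=0.0212
# p=0.99  SE=0.0070
```

So near saturation, real differences between models get squeezed into a few noisy percentage points, while around 50% they have room to show up.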
(Whether the thing they're measuring correlates well with real coding performance is another question.)
So you can't infer anything in isolation from a given benchmark score being only 50%, other than that benchmarks are calibrated to make such scores the likely outcome.
So it's the relative and not the absolute diff that matters - thanks.