mrandish • yesterday at 7:34 PM • 1 reply

> Yeah, these benchmarks are bogus.

It's not just over-fitting to leading benchmarks; there are also too many degrees of freedom in how a model is tested (harness, etc.). Until there's standardized documentation enabling independent replication, it's all just benchmarketing.

Replies

fooker • yesterday at 7:48 PM

For the current state of AI, the harness is unfortunately part of the secret sauce.
