This exploiting of benchmarks isn't that interesting to me since it would be obvious. The main ...

davebren • today at 12:35 AM • 1 reply • view on HN

This exploiting of benchmarks isn't that interesting to me since it would be obvious. The main way I assume they're gaming the benchmarks is by creating training data that closely matches the test data, even for ARC where the test data is secret.

Replies

jmalicki • today at 12:41 AM

They said they used things like submitted a `conftest.py` - e.g. what would be considered very blatant cheating, not just overfitting/benchmaxxing. Did you read the AI slop in the post?

This is basically a paper about security exploits for the benchmarks. This isn't benchmark hacking like having hand coded hot paths for a microbenchmarks, this is hacking like modifying the benchmark computation code itself at runtime.

➕ show 2 replies

alt Hacker News

Replies