SchemaLoad • today at 3:34 AM • 1 reply

Benchmarks on public tests are too easy to game. The model owners can just incorporate the answers into the training dataset. Only the private problems actually matter.

Replies

sanxiyn • today at 3:37 AM

In this case the code is public and you can see they are not cheating in that sense.
