It indicates that there's a good chance that they have trained on the test set, making the eval...

sdenton4 • yesterday at 11:10 PM • 0 replies • view on HN

It indicates that there's a good chance that they have trained on the test set, making the eval scores useless. Even if you have given up on the dream of generalization entirely, you can't meaningfully compare models which have trained on test to those which have not.

alt Hacker News