Hacker News

charcircuit, yesterday at 7:53 PM (2 replies)

I always assumed that these benchmarks would happen in a sandbox. I'm surprised that no one realized this sooner.


Replies

revel, today at 3:33 AM

Running benchmarks at scale and protecting against reward hacking is non-trivial.

ModernMech, yesterday at 7:59 PM

I'm surprised anyone took them seriously in the first place.
