Hacker News

ejpir, yesterday at 8:58 PM

Those are not verified. I've tried forgecode, and I cannot believe they didn't do something to influence the benchmarks.


Replies

GodelNumbering, yesterday at 9:07 PM

Yup, they were found to be sneaking the answer key using agents.md

https://debugml.github.io/cheating-agents/#sneaking-the-answ...