logoalt Hacker News

bodegajedyesterday at 11:59 PM0 repliesview on HN

it is like reward hacking, where the reward function in this case the test is exploited to achieve its goals. it wants to declare victory and be rewarded so the tests are not critical to the code under test. This is probably in the RL pre-training data, I am of course merely speculating.