bodegajed • yesterday at 11:59 PM

It is like reward hacking, where the reward function (in this case, the test) is exploited to achieve the model's goal. It wants to declare victory and be rewarded, so the tests it ends up with don't meaningfully check the code under test. This behavior probably comes from the RL pre-training data, though I am of course merely speculating.
