Eridrus • today at 5:47 PM

I assume it's a lack of care when RLing them.

RL has a tendency to reinforce cheating when the cheats are easier to find than the final solution.

So when making your RL environment, you need to spend a lot of effort on finding ways the model can cheat and penalizing them.
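
To make that concrete, here is a rough sketch of what "penalizing cheats" could look like in a reward function for a coding-task environment. Everything in it is an assumption for illustration: the `Episode` fields, the two cheat signals (`modified_test_files`, `hardcoded_outputs`), and the `CHEAT_PENALTY` value are hypothetical, and a real environment would need its own detectors for whatever exploits it actually exposes.

```python
from dataclasses import dataclass

# Hypothetical penalty size; it only needs to make cheating score
# worse than any honest attempt, including one that passes no tests.
CHEAT_PENALTY = 1.0


@dataclass
class Episode:
    """Summary of one rollout (all fields are illustrative assumptions)."""
    tests_passed: int
    tests_total: int
    modified_test_files: bool  # e.g. the model edited or deleted the tests
    hardcoded_outputs: bool    # e.g. the model special-cased expected values


def reward(ep: Episode) -> float:
    """Return the pass rate for honest rollouts, a flat penalty for detected cheats."""
    # Check for cheating first so a "passing" cheat never earns positive reward.
    if ep.modified_test_files or ep.hardcoded_outputs:
        return -CHEAT_PENALTY
    if ep.tests_total == 0:
        return 0.0
    return ep.tests_passed / ep.tests_total


honest_partial = reward(Episode(5, 10, False, False))
cheated_full = reward(Episode(10, 10, True, False))
```

The point of the ordering is that a rollout which passes every test by tampering with them still scores below an honest rollout that passes none, so the easy-to-find cheat stops being the best-paying strategy. It only covers cheats you have already thought of, which is where the effort goes.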
