beefnugs, last Monday at 7:47 PM

Yeahhhh, why isn't there a training structure where you play 5000 games, and the reward function is based on doing well in all of them?
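To make that concrete, here is a rough sketch of what "reward based on doing well in all of them" could look like. Everything here is made up for illustration: `play_game` is a hypothetical toy game (guess a hidden digit from a noisy hint), and `multi_game_reward` is just one way to aggregate, reporting both the mean and the worst score so one bad game can't hide behind the average.

```python
import random
from statistics import mean


def play_game(policy, game_id):
    """Hypothetical toy game: guess a hidden digit from a noisy hint.

    Each game_id seeds its own generator, so every game is different
    but reproducible. The score is 1.0 for an exact guess and falls
    off linearly with the distance to the target.
    """
    rng = random.Random(game_id)
    target = rng.randint(0, 9)
    hint = target + rng.choice([-1, 0, 1])  # noisy observation
    guess = policy(hint)
    return 1.0 - min(abs(guess - target), 9) / 9.0


def multi_game_reward(policy, n_games=5000):
    """Score one policy across many games at once.

    Returns (mean score, worst score). Training against the worst
    score punishes a policy that is great at most games but fails
    badly at a few.
    """
    scores = [play_game(policy, game_id) for game_id in range(n_games)]
    return mean(scores), min(scores)


if __name__ == "__main__":
    trust_the_hint = lambda hint: hint  # uses the observation
    always_zero = lambda hint: 0        # ignores the observation
    print("trust_the_hint:", multi_game_reward(trust_the_hint))
    print("always_zero:   ", multi_game_reward(always_zero))
```

Whether you optimize the mean, the minimum, or some mix is a design choice. The mean is easier to learn from, while the minimum is closer to "doing well in all of them".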

I guess it's a totally different level of control: instead of immediately choosing a certain button to press, you need to set longer-term goals, like "press whatever sequence I need to over this time to end up closer to this result."

There is some kind of nested, multidimensional thing to train on here, instead of immediate limited choices.
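A toy sketch of the nested idea, usually discussed under the name hierarchical reinforcement learning: a high-level policy sets a goal a few steps out, and a low-level controller presses whatever buttons get closer to it. Both levels here are hand-written rules on a hypothetical 1-D track, not learned, and all the names (`BUTTONS`, `high_level_policy`, `low_level_controller`, `run_episode`) are made up for the example.

```python
# Hypothetical 1-D world: each button press moves the agent by one cell.
BUTTONS = {"left": -1, "stay": 0, "right": 1}


def high_level_policy(position, final_target, horizon):
    """Pick a subgoal at most `horizon` cells away, toward the final target."""
    step = max(-horizon, min(horizon, final_target - position))
    return position + step


def low_level_controller(position, subgoal):
    """Pick the single button press that ends up closest to the subgoal."""
    return min(BUTTONS, key=lambda b: abs(position + BUTTONS[b] - subgoal))


def run_episode(start, final_target, horizon=5, max_steps=100):
    """Alternate between setting a subgoal and pressing buttons toward it."""
    position, presses = start, []
    while position != final_target and len(presses) < max_steps:
        subgoal = high_level_policy(position, final_target, horizon)
        # The low level gets `horizon` presses to reach the subgoal.
        for _ in range(horizon):
            if position == subgoal:
                break
            button = low_level_controller(position, subgoal)
            position += BUTTONS[button]
            presses.append(button)
    return position, presses


if __name__ == "__main__":
    final_position, presses = run_episode(start=0, final_target=12)
    print(final_position, len(presses))
```

In a real setup both levels would be trained, and the high level would only be rewarded every `horizon` steps, which is the part that makes it harder than picking one button at a time.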