In other words, they learn the game, not how to play games.
Well yeah... If you only ever played one game in your life you would probably be pretty shit at other games too. This does not seem very revealing to me.
Yeahhh, why isn't there a training structure where you play 5000 games and the reward function is based on doing well in all of them?
I guess it's a totally different level of control: instead of immediately choosing a certain button to press, you need to set longer-term goals, like "press whatever sequence I need over this stretch of time to end up closer to this result."
There's some kind of nested, multi-level thing to train on here (goals on top, button presses underneath) instead of just immediate, limited choices.
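For what it's worth, the "reward based on doing well in all of them" idea from a couple of comments up can be sketched roughly like this. It's only a toy illustration: the game names, the baseline numbers, and the helpers `normalized_score` and `multi_game_reward` are all made up here, and it only covers the scoring side, not the long-term goal-setting part.

```python
from statistics import mean

def normalized_score(raw, random_score, reference_score):
    """Rescale a raw game score so 0.0 = random play and 1.0 = reference play."""
    return (raw - random_score) / (reference_score - random_score)

def multi_game_reward(raw_scores, baselines):
    """Average the normalized score over every game, so no single game dominates."""
    return mean(
        normalized_score(raw_scores[game], *baselines[game])
        for game in baselines
    )

# Hypothetical numbers, for illustration only: (random_score, reference_score) per game.
baselines = {
    "game_a": (0.0, 100.0),
    "game_b": (10.0, 20.0),
    "game_c": (-50.0, 50.0),
}

specialist = {"game_a": 100.0, "game_b": 10.0, "game_c": -50.0}  # great at one game only
generalist = {"game_a": 60.0, "game_b": 16.0, "game_c": 10.0}    # decent at all three

print("specialist:", multi_game_reward(specialist, baselines))
print("generalist:", multi_game_reward(generalist, baselines))
```

Under this kind of objective the agent that is merely decent everywhere outscores the one that has mastered a single game, which is the pressure you'd want if the goal is learning how to play games rather than learning the game.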
They memorize the answers, not the process for arriving at them.