Can you please explain "the transfer function is negative"?
I'm wondering whether anyone has tested the same model in two situations:
1) Train it to a superhuman level in game A, then present it with game B, which is similar to A.
2) Present it with B without ever training it on A.
If 1) is not significantly better than 2), then either the model isn't carrying much "knowledge" across games, or we simply haven't set the experiment up correctly. Something like the sketch below would be a starting point.
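For concreteness, here is a minimal sketch of that comparison using stable-baselines3's PPO on two Atari games. The game choices, the algorithm, and the timestep budgets are my own illustrative assumptions (nothing from the talk), and it needs gymnasium[atari] plus ale-py installed:

    import gymnasium as gym
    import ale_py
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    gym.register_envs(ale_py)  # needed with recent gymnasium/ale-py versions

    def make_env(game_id):
        # full_action_space=True gives every Atari game the same 18-action space,
        # so the same policy network can be reused across games.
        return gym.make(game_id, full_action_space=True)

    # Situation 1): pretrain on game A, then continue training on game B.
    model = PPO("CnnPolicy", make_env("ALE/Pong-v5"), verbose=0)
    model.learn(total_timesteps=1_000_000)   # "superhuman" would need far more

    env_b = make_env("ALE/Breakout-v5")
    model.set_env(env_b)
    model.learn(total_timesteps=200_000)     # fine-tune on B
    transfer_score, _ = evaluate_policy(model, env_b, n_eval_episodes=20)

    # Situation 2): train on game B from scratch with the same budget on B.
    scratch = PPO("CnnPolicy", make_env("ALE/Breakout-v5"), verbose=0)
    scratch.learn(total_timesteps=200_000)
    scratch_score, _ = evaluate_policy(scratch, make_env("ALE/Breakout-v5"),
                                       n_eval_episodes=20)

    print(f"B after pretraining on A: {transfer_score:.1f} | "
          f"B from scratch: {scratch_score:.1f}")

If the first score isn't reliably higher than the second across seeds, there's no positive transfer to speak of; if it's lower, the transfer is negative.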
I think the problem is that we train models to pattern-match, not to learn or reason about world models.
According to Carmack's recent talk [0], SOTA models that have been trained on game A don't perform better or train faster on game B. Even worse, training on game B degrades performance on game A when the agent returns to it.
[0] https://www.youtube.com/watch?v=3pdlTMdo7pY