You can see a similar effect with LLM finetunes. If you finetune a base model (or an existing instruct/fine-tuned model) for a new task (e.g. better maths or programming-language comprehension), it gets worse at other tasks like creative writing.
To mitigate this you have to include the other categories in your fine-tuning dataset so the model doesn't lose its existing knowledge. Otherwise, backpropagation will push the weights towards whatever best fits the new data.
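A rough sketch of what I mean, assuming a JSONL fine-tuning setup (the file names and the 20% replay ratio are just placeholders, not anything from the article):

```python
import json
import random

def load_examples(path):
    """Read one JSON object per line (a common fine-tuning data format)."""
    with open(path) as f:
        return [json.loads(line) for line in f]

new_task = load_examples("math_finetune.jsonl")  # the new skill you're targeting
general  = load_examples("general_mix.jsonl")    # sample of the original, general-purpose data

replay_fraction = 0.2  # illustrative: roughly 1 general example per 4 new-task examples
n_replay = min(len(general), int(len(new_task) * replay_fraction))

# Mix the replayed general data back in and shuffle so every batch
# sees both kinds of examples, not just the new domain.
mixed = new_task + random.sample(general, n_replay)
random.shuffle(mixed)

with open("mixed_finetune.jsonl", "w") as f:
    for ex in mixed:
        f.write(json.dumps(ex) + "\n")
```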
In the game example, having the weights optimized for game A doesn't help with game B. It would be interesting to see whether training on both game A and game B helps it understand concepts in both.
Similarly with programming languages: it would be interesting to see whether training on multiple languages lets it extract shared concepts like if statements and while loops.
IIUC from the observations with multilingual LLMs, you need to have all the different things you're supporting together in the training set. Then the current approach is able to identify similar concepts/patterns. It's not really learning those concepts so much as learning that certain words often go together, or that a word in one language is similar to a word in another.
It would be interesting to study multilingual LLMs for their understanding of those languages in cases where the two languages are very similar (e.g. Scottish and Irish Gaelic; Dutch and Afrikaans), are in the same language family (French, Spanish, Portuguese), or are from different language families altogether (Italian, Japanese, Swahili).
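If you wanted to poke at that, something like the quick check below with an off-the-shelf multilingual embedding model would show whether translation-equivalent words land close together. The model name and the word pairs are just examples I picked, not anything from the article:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Example of a publicly available multilingual encoder; any multilingual embedding model would do.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

pairs = [
    ("dog", "hond"),   # English / Dutch: closely related languages
    ("dog", "chien"),  # English / French: same family, different branch
    ("dog", "inu"),    # English / Japanese (romanised): unrelated family
]

for a, b in pairs:
    va, vb = model.encode([a, b])
    cos = float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))
    print(f"{a!r} vs {b!r}: cosine similarity {cos:.3f}")
```

The related-language pairs usually come out very close in a shared multilingual embedding space; whether that reflects any understanding beyond co-occurrence is exactly the open question.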
> In the game example, having the weights optimized for game A doesn't help with game B. It would be interesting to see whether training on both game A and game B helps it understand concepts in both.
Supposedly it does worse at both A and B. That's essentially their problem statement. Current SOTA models don't behave the way humans would. If you took a human who's really good at A and B, chances are they're gonna pick up C much quicker than a random person off the street who hasn't even seen Atari before. With SOTA models, the random "person" does better at C than the A/B master.