According to Carmack's recent talk [0], SOTA models that have been trained on game A don't perform better or train faster on game B. Even worse, training on game B negatively affects performance in game A when returning to it.
You can see a similar effect with LLM finetunes. If you finetune a base model (or another instruct/finetuned model) for a new task (e.g. better maths or programming language comprehension), it performs worse at other tasks like creative writing.
To mitigate this, you have to include the other categories in your finetuning dataset so the model doesn't lose its existing knowledge. Otherwise, backpropagation will push the weights towards whatever best fits the new data.
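Something like this is what I mean by mixing the old categories back in. It's just a sketch using the Hugging Face datasets library; the file names and the 80/20 mixing ratio are made-up placeholders, not a recommendation:

    # Mix "replay" examples from the original data distribution into the
    # finetuning set so the new task doesn't dominate every update.
    from datasets import load_dataset, interleave_datasets

    new_task = load_dataset("json", data_files="maths_finetune.jsonl", split="train")
    replay = load_dataset("json", data_files="general_mix.jsonl", split="train")

    # roughly 80% new-task examples, 20% examples from the old distribution
    train_mix = interleave_datasets(
        [new_task, replay],
        probabilities=[0.8, 0.2],
        seed=42,
    )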
In the game example, having the weights optimized for game A doesn't help with game B. It would be interesting to see whether training on both game A and game B helps it understand concepts shared by both.
Similarly with programming languages: it would be interesting to see whether training on multiple languages lets it extract common concepts like if statements and while loops.
IIUC from observations of multilingual LLMs, you need the different things you want to support together in the training set. Then the current approach is able to identify similar concepts/patterns. It's not really learning those concepts, though; it's learning that certain words often go together, or that a word in one language is used like a word in another.
It would be interesting to study multilingual LLMs for their understanding of languages that are closely related (e.g. Scottish and Irish Gaelic; Dutch and Afrikaans), in the same language family (French, Spanish, Portuguese), or in different language families (Italian, Japanese, Swahili).
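A crude version of that probe could compare how close translation pairs sit in an off-the-shelf multilingual embedding model. Only a sketch: the model name, the example sentences, and the idea that embedding similarity stands in for "understanding" are all assumptions on my part:

    # Compare embedding similarity for translation pairs across closely related,
    # same-family, and unrelated languages.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    pairs = {
        "Dutch/Afrikaans (closely related)": ("het huis is groot", "die huis is groot"),
        "French/Spanish (same family)": ("la maison est grande", "la casa es grande"),
        "Italian/Japanese (different families)": ("la casa è grande", "家は大きいです"),
    }

    for label, (a, b) in pairs.items():
        emb = model.encode([a, b])
        print(label, float(util.cos_sim(emb[0], emb[1])))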