> The language they do best with is the one with the largest corpus in the training set.
Up to a point, I guess? There must be a point of diminishing returns that depends on the expressiveness of the language.
I mean, a language that has 8 different ways to declare + initialise composite variables needs to have a much larger training corpus than a language that has only 2 or 3 different ways.
The more expressive a language is, the more distinct patterns the model needs to see examples of, so a larger corpus is needed to cover them all.
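
To make the "many ways to do one thing" point concrete, here's a rough sketch in Python. Python is just my pick for illustration, not necessarily the kind of language meant above, and the variable names are made up. Every line builds the same two-field composite value, so a model would need to see enough of each spelling to learn that they're interchangeable:

```python
# Six spellings of the same composite value: a dict with keys "x" and "y".
# All names here are hypothetical and exist only for illustration.
point_literal = {"x": 1, "y": 2}                       # literal syntax
point_kwargs = dict(x=1, y=2)                          # constructor with keyword args
point_pairs = dict([("x", 1), ("y", 2)])               # constructor from key/value pairs
point_zip = dict(zip(("x", "y"), (1, 2)))              # constructor from zipped sequences
point_comp = {k: v for k, v in (("x", 1), ("y", 2))}   # dict comprehension
point_update = {}                                      # empty dict, filled afterwards
point_update.update(x=1, y=2)

variants = [
    point_literal,
    point_kwargs,
    point_pairs,
    point_zip,
    point_comp,
    point_update,
]

# Every variant compares equal to the literal form.
all_equal = all(v == point_literal for v in variants)
print(len(variants), all_equal)
```

A language that only allowed the first form would need far fewer examples to cover "how to initialise a record" than one where all six show up in real code.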