European Portuguese is the 13th most populous language in Europe. That's not that small; there are many other European languages in use that are much smaller.
https://en.wikipedia.org/wiki/List_of_languages_by_number_of...
What makes Portugal's situation unique is that its small population is eclipsed in models by the much greater weight of Brazil's far larger population.
Yes, there are much smaller European countries, but those are generally the only source of truth for their specific language, so the context of an LLM query in that language steers the LLM towards facts from that country. For example, if I ask a big generic LLM something in Latvian, it will most likely answer with something relevant to Latvia. But Portugal, being by far the smaller user of its own language, has the somewhat unique problem that if I ask a generic model something in Portuguese, it will probably answer with something related to Brazil instead of Portugal.
Maybe the UK and Spain have somewhat similar struggles, but I suspect that neither has it as bad as Portugal in that regard.
> European Portuguese is the 13th most populous language in Europe
that's not impressive
It is pretty small when considering content output. It is only 11 million people, and only a fraction of them will be writing something that could be used in training datasets. If you look at countries by scientific contribution, for example [1], Portugal is in 28th position, while Brazil is in 14th with more than double the number of contributions.
Don't get me wrong, it is definitely impressive given Portugal's actual size, but I believe there's a hard limit set by population and size that will be difficult to cross.
[1]: https://en.wikipedia.org/wiki/List_of_countries_by_number_of...