
swiftcoder, today at 4:00 PM

It is definitely an interesting problem, because Portugal is a small enough country that the actual total corpus of available texts in (non-Brazilian) Portuguese is potentially problematic.



fy20, today at 5:25 PM

European Portuguese is the 13th most spoken language in Europe. That's not that small; there are many other European languages in use that are much smaller.

https://en.wikipedia.org/wiki/List_of_languages_by_number_of...

embedding-shape, today at 4:13 PM

I don't think so. Portugal the country might be small, with a small population, but there are ~250 million "Lusophones" (native Portuguese speakers), making it the fifth-most-spoken native language in the world. I'd hardly call that small :) And before everyone screams: yes, European Portuguese is different from Brazilian Portuguese, but they're still both Portuguese and their speakers understand each other, so it's not like text from one cannot be used to train a model for the other, or vice versa.

All in all, I don't think that's a major issue here.
