Hacker News

yorwba yesterday at 9:46 AM

Yes, the huge repository of raw materials is likely the hardest part. You can try crowdsourced collections (https://tatoeba.org, https://datacollective.mozillafoundation.org/datasets?q=comm..., https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtit...) but you'll quickly run into data quality issues. My personal solution is to do manual data curation on the fly, but I think an app that occasionally throws up garbage and asks its users to pick out the good parts is unlikely to get popular.


Replies

amelius yesterday at 10:35 AM

Maybe the free version of the app could do the collaborative filtering part, and in the paid version you'd get the high-quality content.
