Yes, building a huge repository of raw material is likely the hardest part. You can try crowdsourced collections ( https://tatoeba.org , https://datacollective.mozillafoundation.org/datasets?q=comm... , https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtit... ), but you'll quickly run into data quality issues. My personal solution is to curate the data manually on the fly, but I think an app that occasionally throws up garbage and asks its users to pick out the good parts is unlikely to become popular.
Maybe the free version of the app could do the collaborative filtering part, while the paid version would give you the high-quality content.