Hacker News

com2kid, today at 4:11 AM

The book archives are a big one as well, along with all the journals that have been published digitally throughout the 2000s and all the newspapers.

Though with some types of models (specifically voice), it has been found that a smaller, high-quality dataset is better than a giant dataset filled with errors.