The first really large text dataset I ever encountered was Google Ngram[0], which totals about 3TB. Before I started downloading it, I would have guessed it was closer to 3GB.
[0] https://storage.googleapis.com/books/ngrams/books/datasetsv3...
Yes. I love Google Ngrams.
We use the top Google Ngrams in two ways: (a) we share them in the reference mode of our app, i.e. the common words that come before or after a given word; (b) where possible, we use longer n-grams, such as 4-grams, to choose literary examples that also show a common pattern.
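A rough sketch of those two uses in Python, for anyone curious how it might work. It assumes a simplified, hypothetical input of `ngram<TAB>count` lines (the real Google Books Ngram files also break counts down by year), naive lowercase whitespace tokenization, and made-up helper names (`load_counts`, `neighbors`, `pick_examples`). This is not the app's actual code: (a) sums 2-gram counts to find the most frequent words before and after a word, and (b) ranks candidate sentences by the most common 4-gram containing the word that each sentence includes.

```python
from collections import Counter


def load_counts(lines):
    """Parse hypothetical 'ngram<TAB>count' lines into a Counter keyed by token tuples."""
    counts = Counter()
    for line in lines:
        ngram, count = line.rstrip("\n").split("\t")
        counts[tuple(ngram.split())] += int(count)
    return counts


def neighbors(bigrams, word, top=3):
    """(a) Most frequent words directly before and after `word`, from 2-gram counts."""
    before, after = Counter(), Counter()
    for (w1, w2), count in bigrams.items():
        if w2 == word:
            before[w1] += count
        if w1 == word:
            after[w2] += count
    return before.most_common(top), after.most_common(top)


def pick_examples(sentences, fourgrams, word, top=2):
    """(b) Rank sentences containing `word` by the count of the most common
    4-gram (that includes `word`) appearing in them; drop sentences with none."""
    scored = []
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive tokenization, punctuation kept
        best = 0
        for i in range(len(tokens) - 3):
            gram = tuple(tokens[i:i + 4])
            if word in gram:
                best = max(best, fourgrams.get(gram, 0))
        if best:
            scored.append((best, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top]]


if __name__ == "__main__":
    bigrams = load_counts(["the cat\t50", "a cat\t20", "cat sat\t30", "cat food\t10", "the dog\t40"])
    print(neighbors(bigrams, "cat"))
    # ([('the', 50), ('a', 20)], [('sat', 30), ('food', 10)])

    fourgrams = load_counts(["the cat sat on\t12", "a cat ate the\t3", "on the mat and\t9"])
    sentences = ["A cat ate the fish.", "The cat sat on the mat.", "No felines here."]
    print(pick_examples(sentences, fourgrams, "cat"))
    # ['The cat sat on the mat.', 'A cat ate the fish.']
```

At full scale you would not hold 3TB in a `Counter`; the same logic would more plausibly run as a streaming pass or against an indexed store, keeping only the top entries per word.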