well, they probably have quite a lot of text in the training data from high schoolers trying to meet the minimum word count on a take-home essay