Do you have a reason to believe this isn't already being done? I would assume that the big guys like OpenAI are already training on basically all text in existence.
Wasn't it confirmed that Meta does this?
https://www.forbes.com/sites/danpontefract/2025/03/25/author...
In fact, Facebook torrented Anna's Archive and got busted for it, because of course they did:
https://torrentfreak.com/meta-torrented-over-81-tb-of-data-t...