Scraping the internet isn't a copyright violation. Using scraped text for LLM training is much more transformative than what Google and the Internet Archive do with it, and those are legal.
To be honest, this is the first time someone has spelt it out in a nicely succinct paragraph.
And just like that, I totally agree with you.
You're right, scraping is legally protected. It's reproducing verbatim text that's a violation, which is why LLMs still clumsily refuse to produce song lyrics. They are capable of copyright violations and have to be 'aligned' so their providers don't get sued.