Hacker News

astrange yesterday at 7:46 AM

Scraping the internet isn't a copyright violation. Using scraped content for LLM training is much more transformative than what Google and the Internet Archive do, and both of those are legal.


Replies

jazzyjackson yesterday at 2:12 PM

You're right, scraping is legally protected. It's reproducing verbatim text that's a violation, which is why LLMs still clumsily refuse to produce song lyrics. They are capable of copyright violations and have to be 'aligned' not to get their providers sued.

alfiedotwtf yesterday at 8:39 AM

To be honest, this is the first time someone has spelt it out in a nicely succinct paragraph.

And just like that, I totally agree with you.
