Hacker News

pull_my_finger 04/03/2025

Ok, but you're not "remembering what they say"; you're creating a "derivative work" by literally just tokenizing/vectorizing (I'm not a data scientist or AI expert) the words exactly as they appear. AI doesn't innovate based on the works it consumes, and it doesn't understand "concepts" picked up from them. It simply adds the possibility of regurgitating (read: plagiarizing) them verbatim, in part or in whole, to a list of other possibilities. This is on top of the fact that these parasites didn't even ask to use or purchase the works to begin with; they stole (pirated) them.


Replies

archontes 04/03/2025

It's a stretch to call training an AI creating a 'derivative work' by the legal definition.

You could count the words in a book and publish the word count, and while the information is based on the contents of the book, that would fall incredibly short of being a derivative work.

I suspect they committed whatever copyright violation applies to downloading the copyrighted works. Training an AI on them is simply not related to the protections that copyright offers.