Hacker News

shkkmo last Friday at 7:44 AM

> If I had a photographic memory and I used it to replicate parts of GPLed software verbatim while erasing the license, I could not excuse it in court that I simply "learned from" the examples.

Right, because you would have done more than learn: you would have gone past learning and used that learning to reproduce the work.

It works exactly the same for an LLM. Training the model on content you have legal access to is fine. Afterwards, someone using that model to produce a replica of that content is engaged in copyright infringement.

You seem set on conflating the act of learning with the act of reproduction. You are allowed to learn from copyrighted works you have legal access to, you just aren't allowed to duplicate those works.


Replies

sirwhinesalot last Friday at 7:57 AM

The problem is that it's not the user of the LLM doing the reproduction; the LLM provider is. The tokens the LLM is spitting out are coming from the LLM provider. It is the provider that is reproducing the code.

If someone hires me to write some code, and I give them GPLed code (without telling them it is GPLed), I'm the one who broke the license, not them.

zephen last Friday at 7:57 AM

You seem set on conflating "training" an LLM with "learning" by a human.

LLMs don't "learn," but they _do_, in some cases, faithfully regurgitate what they have been trained on.

Legally, we call that "making a copy."

But don't take my word for it. There are plenty of lawsuits for you to follow on this subject.
