Hacker News

phplovesong today at 2:16 PM

We need a new license that forbids all training. That is the only way to stop big corporations from doing this.


Replies

maxloh today at 2:30 PM

To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use, at least in the US and some other jurisdictions.

If the training is established as fair use, the underlying license doesn't really matter. The term you added would likely be void or deemed unenforceable if someone ever brought it to court.

tensor today at 8:24 PM

So if you put this hypothetical license on spam emails, then spam filters can't train to recognize them? I'm sure ad companies would LOVE it.

mr_toad today at 7:39 PM

Fair use doesn’t need a license, so it doesn’t matter what you put in the license.

Generally speaking, licenses give rights (they literally grant license). They can’t take rights away; only the legislature can do that.

WithinReason today at 2:28 PM

Wouldn't it still be legal to train on the data due to fair use?

munchler today at 2:43 PM

By that logic, humans would also be prevented from “training” on (i.e. learning from) such code. Hard to see how this could be a valid license.

BeFlatXIIItoday at 5:20 PM

How is that enforceable against the fly-by-night startups?

James_K today at 2:21 PM

Would such a license fall under the definition of free software? Difficult to say. Counter-proposition: a license which permits training if the model is fully open.

scotty79 today at 2:31 PM

We need a ruling that LLM-generated code enters the public domain automatically and can't be covered by any license.
