We need a new license that forbids all training. That is the only way to stop big corporations from doing this.
So if you put this hypothetical license on spam emails, then spam filters couldn't train to recognize them? I'm sure ad companies would LOVE that.
Fair use doesn’t need a license, so it doesn’t matter what you put in the license.
Generally speaking, licenses grant rights (they literally grant a license). They can't take rights away; only the legislature can do that.
Wouldn't it still be legal to train on the data under fair use?
By that logic, humans would also be prevented from “training” on (i.e. learning from) such code. Hard to see how this could be a valid license.
How is that enforceable against the fly-by-night startups?
Would such a license fall under the definition of free software? Difficult to say. Counter-proposition: a license which permits training if the model is fully open.
We need a ruling that LLM generated code enters public domain automatically and can't be covered by any license.
To my understanding, if the material is publicly available or was obtained legally (i.e., not pirated), then training a model on it falls under fair use, at least in the US and some other jurisdictions.
If the training is established as fair use, the underlying license doesn't really matter. The term you added would likely be void or deemed unenforceable if anyone ever took it to court.