Hacker News

maxloh today at 2:30 PM

To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use, at least in the US and some other jurisdictions.

If the training is established as fair use, the underlying license doesn't really matter. The term you added would likely be void or deemed unenforceable if someone ever brought it to court.


Replies

rileymat2 today at 2:50 PM

It depends on the license terms. If you obtained the material legally under a license whose terms you agreed to, then using it for a purpose those terms prohibit would not be legal.

But this is all grey area… https://www.authorsalliance.org/2023/02/23/fair-use-week-202...

justin_murray today at 2:40 PM

This is at least murky, since a lot of pirated material is “publicly available”. Certainly some has ended up in the training data.

colechristensen today at 2:42 PM

I wouldn't say this is settled law, but it looks like this is one of the likely outcomes. It might not be possible to write a license to prevent training.

LtWorf today at 7:01 PM

Fair use was for citing and so on, not for ripping off 100% of the content.
