> Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.
I think you'll find that this is not settled in the courts, depending on how the data was obtained. If the data was obtained legally, say a purchased book, courts have been finding that using it for training is fair use (Bartz v. Anthropic, Kadrey v. Meta).
Morally the case gets interesting.
Historically, there was no such thing as copyright. The English 1710 Statute of Anne establishing copyright as a public law was titled 'for the Encouragement of Learning' and the US Constitution said 'Congress may secure exclusive rights to promote the progress of science and useful arts'; so essentially public benefits driven by the grant of private benefits.
The Moral Bottomline: if you didn't have to eat, would you care about who copies your work as long as you get credited?
The more the people that copy your work with attribution, the more famous you'll be. Now that's the currency of the future*. [1]
You'll do it for the kudos. [2][3]
*Post-Scarcity Future.
[1] https://en.wikipedia.org/wiki/Post-scarcity
[2] https://en.wikipedia.org/wiki/The_Quiet_War, et. al.
[3] https://en.wikipedia.org/wiki/Accelerando
> The Moral Bottomline: if you didn't have to eat, would you care about who copies your work as long as you get credited?
Yes.
I have 2 issues with "post-scarcity":
- It often implicitly assumes humanity is one homogeneous group where this state applies to everyone. In reality, if post-scarcity is possible, some people will be lucky enough to have the means to live that lifestyle while others will still by dying of hunger, exposure and preventable diseases. All else being equal, I'd prefer being in the first group and my chance for that is being economically relevant.
- It often ignores that some people are OK with having enough while others have a need to have more than others, no matter how much they already have. The second group is the largest cause of exploitation and suffering in the world. And the second group will continue existing in a post-scarcity world and will work hard to make scarcity a real thing again.
---
Back to your question:
I made the mistake of publishing most of my public code under GPL or AGPL. I regret is because even though my work has brought many people some joy and a bit of my work was perhaps even useful, it has also been used by people who actively enjoy hurting others, who have caused measurable harm and who will continue causing harm as long as they're able to - in a small part enabled by my code.
Permissive licenses are socially agnostic - you can use the work and build on top of it no matter who you are and for what purpose.
A(GPL) is weakly pro-social - you can use the work no matter what but you can only build on top of it if you give back - this produces some small but non-zero social pressure (enforced by violence through governments) which benefits those who prefer cooperation instead of competition.
What I want is a strongly pro-social license - you can use or build on top of my work only if you fulfill criteria I specify such as being a net social good, not having committed any serious offenses, not taking actions to restrict other people's rights without a valid reason, etc.
There have been attempts in this direction[0] but not very successful.
In a world without LLMs, I'd be writing code using such a license but more clearly specified, even if I had to write my own. Yes, a layer would do a better job, that does not mean anything written by a non-lawyer is completely unenforceable.
With LLMs, I have stopped writing public code at all because the way I see it, it just makes people much richer than me even richer at a much faster rate than I can ever achieve myself. Ir just makes inequality worse. And with inequality, exploitation and oppression tends to soon follow.
[0]: https://json.org/license.html