I think there's no meaningful case by the letter of the law that use of training data that include GPL-licensed software in models that comprise the core component of modern LLMs doesn't obligate every producer of such models to make both the models and the software stack supporting them available under the same terms. Of course, it also seems clear in the present landscape that the law often depends more on the convenience of the powerful than its actual construction and intent, but I would love to be proven wrong about that, and this kind of outcome would help
I'm struggling to parse the double negative in that statement, haha.
Are you saying that you believe that untested but technically; models trained on GPL sources need to distribute the resulting LLMs under GPL?
Intellectual property never made much sense to begin with. But it certainly makes no sense now, where the common creator has no protections against greedy corporate giants who are happy to wield the full weight of the courts to stifle any competition for longer than we'll be alive.
Or, in the case of LLMs, recklessly swing about software they don't understand while praying to find a business model.
If the rise of Draft Kings and Polymarket/Kalshi have taught me anything, it's that the law becomes meaningless at scale. Sad.
If there was going to be a case, it's derivative works. [1]
What makes it all tricky for the courts is there's not a good way to really identify what part the generated code is a derivative of (except in maybe some extreme examples).
That's always what laws existed for, a law is just a formal way of saying "we will use violence against you if you do something we don't like" and that has always going to be primary written by and for the people that already have the power to do that, it's not the worst, certainly better than Kings just being able to do as they please.
> I think there's no meaningful case by the letter of the law that use of training data that include GPL-licensed software in models that comprise the core component of modern LLMs doesn't obligate every producer of such models to make both the models and the software stack supporting them available under the same terms.
Why do you think "fair use" doesn't apply in this case? The prior Bartz vs Anthropic ruling laid out pretty clearly how training an AI model falls within the realm of fair use. Authors Guild vs Google and Authors Guild vs HathiTrust were both decided much earlier and both found that digitizing copyrighted works for the sake of making them searchable is sufficiently transformative to meet the standards of fair use. So what is it about GPL licensed software that you feel would make AI training on it not subject to the same copyright and fair use considerations that apply to books?