Hacker News

jakkos today at 11:21 AM (8 replies)

> Pre-training is, actually, our collective gift

I feel like this wording isn't great when there are many impactful open source programmers who have explicitly stated that they don't want their code used to train these models and licensed their work in a world where LLMs didn't exist. It wasn't their "gift", it was unwillingly taken from them.

> I'm a programmer, and I use automatic programming. The code I generate in this way is mine. My code, my output, my production. I, and you, can be proud.

I've seen LLMs generate code that I have immediately recognized as being copied from a book or technical blog post I've read before (e.g. exact same semantics, very similar comment structure and variable names). Even if not legally required, crediting where you got ideas and code from is the least you can do, whereas LLMs just launder code as completely your own.


Replies

jll29 today at 12:48 PM

> there are many impactful open source programmers who have explicitly stated that they don't want their code used to train these models and licensed their work in a world where LLMs didn't exist. It wasn't their "gift", it was unwillingly taken from them.

There are subtle legal differences between "free open source" licensing and putting things in the public domain. If you use an open source license, you could forbid LLM training (in licensing law, contrary to all other areas of law, anything that is not granted to licensees is forbidden). Then you can take the big guys (MSFT, Meta, OpenAI, Google) to court if you can demonstrate they violated your terms.

If you place your software into the public domain, any use is fair, including ways to exploit the code or its derivatives not invented at the time of release.

Curiously, doesn't the GPL even imply that if you pre-train an LLM with GPLed code and use it to generate code (Claude Code etc.), all generated code -- as the derived intellectual property that it clearly is -- must also be open sourced as per GPL terms? (It would seem in the spirit of the licensors.) Haven't seen this raised or discussed anywhere yet.

yuvadam today at 11:28 AM

I don't think it's possible to separate any open source contribution from the ones that came before it, as we're all standing on the shoulders of giants. Every developer learns from their predecessors and adapts patterns and code from existing projects.

p-e-w today at 11:29 AM

> I feel like this wording isn't great when there are many impactful open source programmers who have explicitly stated that they don't want their code used to train these models

That’s been the fate of many creators since the dawn of time. Kafka explicitly stated that he wanted his works to be burned after his death. So when you’re reading about Gregor’s awkward interactions with his sister, you’re literally consuming the private thoughts of a stranger who stated plainly that he didn’t want them shared with anyone.

Yet people still talk about Kafka’s “contribution to literature” as if it were otherwise, with most never even bothering to ask themselves whether they should be reading that stuff at all.

vbezhenar today at 12:12 PM

Intellectual property is not absolute and can be expropriated, just like any other property.

sneak today at 12:24 PM

If you publish your code to others under permissive licenses, people using it to do things you do not want is not something being unwillingly taken from you.

You can do whatever you want with a gift. Once you release your code as free software, it is no longer yours. Your opinions about what is done with it are irrelevant.

hjoutfbkfd today at 11:33 AM

When you implement a quicksort, do you credit Hoare in the comments?

bko today at 12:06 PM

I don't understand this perspective. Programmers often scoff at most other examples of intellectual property, some throwing it out altogether. I remember reading about Google vs Oracle, where Oracle sued Google for stealing code to perform a range check, about 9 lines long, used to check array index bounds.

I guess the difference is AI companies bad? This is transformative technology creating trillions in value and democratizing information, all subsidized by VC money. Why would anyone in open source who claims to have noble causes be against this? Because their repo will no longer get stars? Because no one will read their asinine Stack Overflow answer?

https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....

frizlab today at 1:12 PM

> It wasn't their "gift", it was unwillingly taken from them.

Yes. Exactly. As a developer in that case I feel almost violated in my trust in “the internet.” Well it’s even worse, I did not really trust it, but did not think it could be that bad.