> As a programmer, I want to write more open source than ever, now.
I want to write less, just knowing that LLM models are going to be trained on my code is making me feel more strongly than ever that my open source contributions will simply be stolen.
Am I wrong to feel this? Is anyone else concerned about this? We've already seen some pretty strong evidence of this with Tailwind.
> I want to write less, just knowing that LLM models are going to be trained on my code is making me feel more strongly than ever that my open source contributions will simply be stolen. Am I wrong to feel this? Is anyone else concerned about this?
I don't think it's wrong, but maybe misdirected. What do you mean when you say someone can "steal" your open source contributions? I've always released most of my code as "open source", and not once has anyone "stolen" it; it still sits on the same webpage where I initially published it, decades ago. Sure, it was surely ingested into LLMs long ago, but that's hardly "stealing" when the thing is still there and given away for free.
I'm not sure how anyone can feel like their open source code was "stolen", wasn't the intention in the first place that anyone can use it for any purpose? That's at least why I release code as open source.
I don't understand the mindset because I began my foray into open source exactly because I wanted to distribute and share my code.
In other words, I've never been in the position where I felt my charitable giving anywhere was ever stolen.
Some people write code and put it out there without caveats. Some people jump into open source to be license warriors. Not me. I just write code and share it. If you're a person, great. If you're a machine, then I suppose that's okay too -- I don't want to play musical chairs with licenses all day just to throw some code out there, and I don't particularly care if someone more clever than myself uses it to generate a profit.
I don't know if you're "wrong", but I do feel differently about this.
I've written a ton of open source code and I never cared what people do with it, both "good" or "bad". I only want my code to be "useful". Not just to the people I agree with, but to anyone who needs to use a computer.
Of course, I'd rather people use my code to feed the poor than build weapons, but it's just a preference. My conviction is that my code is _freed_ from me and my individual preferences and shared for everyone to use.
I don't think my code is "stolen" if someone uses it to make themselves rich.
Then why open source something in the first place? The entire point is to make it public, for anyone to use however is useful to him or her, and often to publicly collaborate on a project together.
If I made something open source, you can train your LLM on it as much as you want. I'm glad my open source work is useful to you.
I don't worry about that too much. I still contribute to FOSS projects, and I use FOSS projects. Whenever I contribute, I usually fix something that affects me (or maybe just something I encountered), and fixing it has a positive effect on the users of that software, including me.
I don't understand the invocation of Tailwind here. It doesn't make sense. Tailwind's LLM struggles had nothing to do with open source; they had to do with the fact that Tailwind had the same business model as a publisher, with ads pointing to their only product.
A common intention with open source is to allow people, and the AI tools they use, to reuse, recombine, etc. OSS code in any way they see fit. If that's not what you want, don't open source your work. It's not stealing if you gave it away and effectively told people "do whatever you want" -- which is one way licenses such as the MIT license are often characterized.
It's very hard to prevent specific types of usage (like feeding code to an LLM) without throwing out the baby with the bathwater and also preventing all sorts of other valid usages. AGPLv3, which is what antirez and Redis use, goes too far IMHO and still doesn't quite get the job done: it doesn't forbid people (or tools) from "looking" at the code, which is what AI training might be characterized as. That license creates lots of headaches for corporate legal departments. I switched to Valkey for that reason.
I actually prefer using MIT style licenses for my own contributions precisely because I don't want to constrain people or AI usage. Go for it. More power to you if you find my work useful. That's why I provide it for free. I think this is consistent with the original goals of open source developers. They wanted others to be able to use their stuff without having to worry about lawyers.
Anyway, AI progress won't stop because of any of this. As antirez says, that stuff is now part of our lives and it is a huge enabler if you are still interested in solving interesting problems. Which apparently he is. I can echo much of what he says. I've been able to solve larger and larger problems with AI tools. The last year has seen quite a bit of evolution in what is possible.
> Am I wrong to feel this?
I think your feelings are yours. But you might at least examine your own reasoning a bit more critically. Words like theft and stealing are big words. And I think your case for that is just very weak. And when you are coding yourself are you not standing on the shoulders of giants? Is that not theft?
> Am I wrong to feel this?
Why would a feeling be invalid? You have one life, you are under no obligation to produce clean training material, much less feel bad about this.
I think the Tailwind case is more complicated than this, but yes - I think it's reasonable to want to contribute something to the common good but fear that the value will disproportionately go to AI companies and shareholders.
Yes. If you didn't care before when contributing to open source who uses your code then it shouldn't matter now that a company picks up your code. You are also contributing this way too.
Tailwind is a business and they picked a business model that wasn't resilient enough.
I do open source exactly because I'm fine with my work being "stolen".
This is a dilemma for me that gets more and more critical as I finalize my thesis. My default mental model was to open source it for the sake of contributing back to the community, enhancing my ideas, and discussing them with whoever finds them interesting.
To my surprise, my doctoral advisor told me to keep the code closed. She told me that not only will LLMs steal it and benefit from it, but there's also a risk of my code becoming a target after it's stolen by companies with fat attorney budgets, and there'd be no way I could defend myself or prove anything.
I'm convinced that LLMs will result in all software needing to be open source (or at the very least source-available).
In the future everyone will expect to be able to customise an application; if the source is not available, they will not choose your application as a base. It's that simple.
The future is highly customisable software, and that is best built on open source. How this looks from a business perspective I think we will have to find out, but it's going to be fun!
This is why I never got into open source in the first place. I was worried that new programmers might read my code, learn how to program, and then start independently contributing the the projects I know and love - significantly devaluing my contributions.
Unless I am missing something, it seems that you only need to use something like the following (found via a quick search; I haven't tried it):
https://archclx.medium.com/enforcing-gpg-encryption-in-githu...
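For context, that kind of approach is (as I understand it) built on git's clean/smudge filters, so files are encrypted on commit and only decryptable by holders of the key. A minimal sketch, assuming a hypothetical GPG key ID `YOUR_KEY_ID` (not from the linked article, just an illustration of the general pattern):

```shell
# Register a filter driver that encrypts on commit (clean)
# and decrypts on checkout (smudge). Both gpg invocations
# read stdin and write stdout, which is what git filters expect.
git config filter.gpgcrypt.clean  "gpg --batch --yes --armor --encrypt --recipient YOUR_KEY_ID"
git config filter.gpgcrypt.smudge "gpg --batch --yes --decrypt"

# In .gitattributes, route the files you want protected
# through that filter (here, all Python sources):
echo '*.py filter=gpgcrypt' >> .gitattributes
```

The trade-off, of course, is that the repository is no longer readable to anyone without the key, which arguably means it stops being open source at all.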
My opinion on the matter is that AI models stealing open source code would be OK IF the models are also open and remain so, and services like ChatGPT remain free of cost (at least a free tier) and free of ads.
But we all know how it is going to go.
Not wrong. But I don't share your concerns at all. I like sharing code, and if people, and who knows, maybe machines, can make use of it and provide some value, however minute, that makes me content.
> But, in general, it is now clear that for most projects, writing the code yourself is no longer sensible, if not to have fun.
I want to write code to defy this logic and express my humanity. "To have fun", yes. But also to showcase what it means when a human engages in the act of programming. Writing code may increasingly not be "needed", but it increasingly is art.
This is an absolutely valid concern. Either we need strong governmental intervention against models that don't comply with OSS licenses,
or we accept that there simply won't be open-model businesses: make your code proprietary, and accept the fact that even permissive licenses such as MIT and BSD 2-/3-Clause won't be followed by anyone while writing OSS.
And as for Tailwind, I don't know if it is because of AI.
With Tailwind, wasn't the problem that far fewer people visited the documentation, which showed ads? The LLMs still used Tailwind.
Use a license that doesn't allow it then.
Not everything needs to be MIT or GPL.
> Am I wrong to feel this?
There's no such thing as a wrong feeling.
And I say this as one of those who view AI training as "learning" rather than "stealing", or at least that this is the goal, because AI is the dumbest, most error-prone, and most expensive way to try to make a copy of something.
My fears about setting things loose for public consumption are more about how I will be judged for them than about being ripped off, which is kinda why that book I started writing a decade ago and have not meaningfully touched in the last 12 months is neither published properly nor sent to some online archive.
When it comes to licensing source code, I mostly choose MIT, because I don't care what anyone does with the code once it's out there.
But there's no such thing as a wrong feeling, anyone who dismisses your response is blinding themselves to a common human response that also led to various previous violent uprisings against the owners of expensive tools of automation that destroyed the careers of respectable workers.
I want to write less, because quite frankly I get zero satisfaction from having an LLM churn out code for me, in the same way that Vincent van Gogh would likely derive no joy from using Nano Banana to create a painting.
And sure, I could stubbornly refuse to use an LLM and write the code myself. But after getting used to LLM-assisted coding, particularly recent models, writing code by hand feels extremely tedious now.
If you don't want people "stealing" your code, you don't want open source. You want source available.
I've been writing a bunch of DSLs lately and I would love to have LLMs train on this data.
If you give, and expect something in return, then you are not giving, that is a transaction.
No, you're absolutely right.
LLMs are labor theft on an industrial scale.
I spent 10 years writing open source, and I haven't touched it in the last 2. I wrote for multiple reasons, none of which apply any longer:
- I believe every software project should have an open source alternative. But writing open source now means useful patterns can be extracted and incorporated into closed source versions _mechanically_ and with plausible deniability. It's ironically worse if you write useful comments.
- I enjoyed the community aspect of building something bigger than one person can accomplish. But LLMs are trained on the whole history and potentially forum posts / chat logs / emails which went into designing the SW too. With sufficiently advanced models, they effectively use my work to create a simulation of myself and other devs.
- I believe people (not just devs) should own the product they build (an even stronger protection of workers against exploitation than copyright). Now our past work is being used to replace us in the future without any compensation.
- I did it to get credit. Even though it was a small motivation compared to the rest, I enjoyed everyone knowing what I accomplished and I used it during job interviews. If somebody used my work, my name was attached to it. With LLMs, anyone can launder it and nobody knows how useful my work was.
- (not solely LLM related) I believed better technology improves the world and quality of life around me. Now I see it as a tool - neutral - to be used by anyone for both good and bad purposes.
Here's[0] a comment where I described why it's theft, based on how LLMs work. I call it higher-order plagiarism. I haven't seen this argument made by other people; it might be useful for arguing against those who want to legalize this.
In fact, I wonder if this argument has been made in court and whether the lawyers understand LLMs enough to make it.
> As a programmer, I want to write more open source than ever, now.
I believe open source will become a bit less relevant in its current form, as solution- or project-tailored libraries/frameworks can be generated in a few hours with LLMs.
I’ve written plenty of open source and I’m glad it’s going into the great training models that help everyone out.
I love AI and pay for four services and will never program without AI again.
It pleases me that my projects might be helping out.
Also, open source without support has zero value, and you can support only 1-2 projects.
Meaning 99% of everything OSS released now is de facto abandonware.
Also why would I use your open source project, when I can just prompt the AI to generate one for me, gracefully stripping the license as a bonus?
You are not wrong to feel this, because you cannot control what you feel. But it might be worth investigating why you feel this, and why were you writing open source in the first place.
I feel similarly for a different reason. I put my code out there, licensed under the GPL. It is now, through a layer of indirection, being used to construct products that are not under the GPL. That's not what I signed up for.
I know the GPL didn't have a specific clause for AI, and the jury is still out on this specific case (how similar is it to a human doing the same thing?), but I like to imagine, had it been made today, there probably would be a clause covering this usage. Personally I think it's a violation of the spirit of the license.