I think there is a real issue here, but I do not think it is as simple as calling it theft in the same way as copying books. The bigger problem is incentives. We built a system where writing docs, tutorials, and open technical content paid off indirectly through traffic, subscriptions, or services. LLMs get a lot of value from that work, but they also break the loop that used to send value back to the people and companies who created it.
The Tailwind CSS situation is a good example. They built something genuinely useful, adoption exploded, and in the past that would have meant more traffic, more visibility, and more revenue. Now the usage still explodes, but the traffic disappears because people get answers directly from LLMs. The value is clearly there, but the money never reaches the source. That is less a moral problem and more an economic one.
Ideas like GPL-style licensing point at the right tension, but they are hard to apply after the fact. These models were built during a massive spending phase, financed by huge amounts of capital and debt, and they are not even profitable yet. Figuring out royalties on top of that, while the infrastructure is already in place and rolling out at scale, is extremely hard.
That is why this feels like a much bigger governance problem. We have a system that clearly creates value, but no longer distributes it in a sustainable way. I am not sure our policies or institutions are ready to catch up to that reality yet.
> I do not think it is as simple as calling it theft in the same way as copying books
Aside from the incentive problem, there is a kind of theft, known as conversion: you were granted a license under some conditions and then went beyond them (you kept the car past your rental date, etc.). In this case, the documentation is for people to read; AI using it to answer questions is a kind of conversion (no, not fair use). But these license limits are mostly implicit in the assumption that (only) people are reading, or buried in unenforceable site terms of use. So it's a squishy kind of stealing after breaching a squishy kind of contract - too fuzzy to stop incentivized parties.
There will be no royalties. Simply require that all models trained on the public internet also be made public.
This won't help Tailwind in this case, but it'll change the answer to "Should I publish this thing free online?" from "No, because a few AI companies are going to exclusively benefit from it" to "Yes, I want to contribute to the corpus of human knowledge."
It's not as simple as calling it theft, but it is simply theft, plus the other good points you made.
The problem is there was a social contract. Someone spent their time and money to create a product that they shared for free, provided you visit their site and see their offerings. In this way they could afford to keep making this free product that everyone benefited from.
LLMs broke that social contract. Now that product will likely go away.
People can twist themselves into knots about how LLMs create “value” and that makes all of this ok, but the truth is they stole information to generate a new product that generates revenue for themselves at the cost of other people’s work. This is literally theft. This is what copyright law is meant to protect. If LLM manufacturers are making money off someone’s work, they need to compensate people for that work, same as any client or customer.
LLMs are not doing this for the good of society. They themselves are making money off this. And I’m sure if someone comes along with LLM 2.0 and rips them off, they’re going to be screaming to governments and attorneys for protection.
The ironic part of all of this is that LLMs are literally killing the businesses they need to survive. When people stop visiting (and paying) Tailwind, Wikipedia, news sites, weather, and so on, and only use LLMs, those sites and services will die. Heck, there’s even good reason to think LLMs will kill the Internet at large, at least as an information source. Why in the hell would I publish news or a book or events on the Internet if it’s just going to be stolen and illegally republished through an LLM without compensating me for my work? Once this information goes away or is locked behind nothing but paywalls, I hope everyone is ready for the end of the free ride.
> We have a system that clearly creates value, but no longer distributes it in a sustainable way.
It does not "create value"; it harvests value and redirects the proceeds it accrues towards its owners. The business model is a middleman that arbitrages the content by separating it from the delivery.
Software licensing has been broken for two decades. That's why free software isn't financially viable for anybody except a tiny minority. It should be. The entire industry has been operating on charity. The rich mega corporations have decided they're no longer going to be charitable.
> We have a system that clearly creates value, but no longer distributes it in a sustainable way
The same thing happened (and is still happening) with news media and aggregation/embedding like Google News or Facebook.
I don't know if anyone has found a working solution yet. There have been some laws passed and licensing deals [1]. But they don't really seem to be working out [2].
[1] https://www.cjr.org/the_media_today/canada_australia_platfor...
[2] https://www.abc.net.au/news/2025-04-02/media-bargaining-code...