It doesn't seem unreasonable. If you train a model that can reliably reproduce thousands or millions of copyrighted works, you shouldn't be distributing it. If it were just regular software that had that capability, would it be allowed? Is it OK just because it's a fancy AI model?
I have a Xerox machine that can reliably reproduce copyrighted works. Is that a problem, too?
Blaming tools for the actions of their users is stupid.
> that can reliably reproduce thousands/millions of copyrighted works, you shouldn't be distributing it. If it were just regular software that had that capability, would it be allowed?
LLMs are hardly a reliable way to reproduce copyrighted works. The closest examples usually involve prompting the LLM with a significant portion of the copyrighted work and then seeing whether it can predict a number of the tokens that follow. It’s a big stretch to say that they’re reliably reproducing copyrighted works any more than, say, a Google search showing a short excerpt of a document in the search results or a blog writer quoting a section of a book.
It’s also interesting to see the sudden anti-LLM takes that twist themselves into arguing against tools or platforms that might reproduce some copyrighted content. By this argument, should BitTorrent also be banned? If someone posts a section of copyrighted content to Hacker News as a comment, should YCombinator be held responsible?