Hacker News

Aurornis yesterday at 12:42 AM

> that can reliably reproduce thousands/millions of copyrighted works, you shouldn't be distributing it. If it were just regular software that had that capability, would it be allowed?

LLMs are hardly reliable ways to reproduce copyrighted works. The closest examples usually involve prompting the LLM with a significant portion of the copyrighted work and then seeing whether it can predict a number of the tokens that follow. It’s a big stretch to say that they’re reliably reproducing copyrighted works any more than, say, a Google search producing a short excerpt of a document in the search results or a blog writer quoting a section of a book.

It’s also interesting to see the sudden anti-LLM takes that twist themselves into arguing against tools or platforms that might reproduce some copyrighted content. By this argument, should BitTorrent also be banned? If someone posts a section of copyrighted content to Hacker News as a comment, should YCombinator be held responsible?


Replies

zizee yesterday at 10:24 AM

Then they should easily fall within the regulation section posted earlier.

If you cannot see the difference between BitTorrent and AI models, then it's probably not worth engaging with you.

But AI models have been shown to reproduce their training data:

https://gizmodo.com/ai-art-generators-ai-copyright-stable-di...

https://arxiv.org/abs/2301.13188

Jensson yesterday at 12:57 AM

> LLMs are hardly reliable ways to reproduce copyrighted works

Only because the companies are intentionally making it so. If they weren't trained not to reproduce copyrighted works, they would be able to.
