qarl yesterday at 10:11 PM

Well - maybe so. But the common belief is that training itself is a violation of copyright, no matter how it's done. That's the argument I'm countering here.


Replies

SahAssar yesterday at 10:29 PM

The issue is that the trainers have not sought licenses for the data and instead outright pirated it.

I don't think anyone believes that all training is a copyright violation if all the training data is licensed. For example, an LLM trained on CC0 content would be fine with basically everyone.

The problem is that training happens on data that is not licensed for that use. Some of that data is also pirated, which makes it even clearer that it is illegal.
