logoalt Hacker News

uyzstvqstoday at 3:36 PM2 repliesview on HN

Training is not redistribution. It's the exact same as you as a person learning to program from proprietary secret code, and then writing your own original code independently. Even if you repeat patterns and methods you've picked up from that proprietary learning material, it is by no means redistribution. The practical differentiator here is that you do not access the proprietary material during the creation of your own original work, similar in principle to a clean-room design. With AI/ML, it matters that training data is not accessed during inference, which it's not.

The other factor of copyright, which is relevant, is how material is obtained. If the material is publicly accessible without protection, you have no reasonable expectation to exclusive control over its use. If you don't want AI training to be done on your work, you need to put access to it behind explicit authentication with a legally-binding user agreement prohibiting that use-case. Do note that this would lose your project's status as open-source.


Replies

ndiddytoday at 3:57 PM

> Training is not redistribution. It's the exact same as you as a person learning to program from proprietary secret code, and then writing your own original code independently.

Well the difference is that copyright law applies to work fixed in a tangible medium of expression. This covers i.e. model weights on a hard drive but not the human brain. If the model is able to reproduce others’ work verbatim (like the example the article brings up of the song lyrics) then under copyright law that’s unauthorized reproduction. It doesn’t matter that the data is expressed via probabilistic weights because due to past lobbying/lawsuits by the software industry to get compiled binary code covered by copyright, reproduction can include copies that aren’t directly human readable.

> If the material is publicly accessible without protection, you have no reasonable expectation to exclusive control over its use.

There’s over 20 years of successful GPL infringement lawsuits over unlicensed use of publicly available GPL code that disagrees with this point.

luqtastoday at 3:43 PM

so basically we download the sources files to the training weight and remove the LICENSE.MD as it's exactly the same as a person learning to program from proprietay secret code and outputing code based on that for millions of peoples in matter of seconds /s

we also treat as however we want public goods found over the internet. as the World Intellectual Property Organization Copyright Treaty and Berne Convention for the Protection of Literary and Artistic Works aren't real or because we can as we are operating in international waters, selling products for other sails living exclusively in international waters /s

show 1 reply