Hacker News

satvikpendem today at 5:48 AM (3 replies)

> 2. GPL does not allow you to take the code, compress it in your latent space, and then sell that to consumers without open sourcing your code.

If AI training is found to be fair use, then that fact supersedes any license language.


Replies

AnthonyMouse today at 6:40 AM

Whether AI training in general is fair use and whether an AI that spits out a verbatim copy of something from the training data has produced an infringing copy are two different questions.

If there is some copyrighted art in the background of a scene from a movie, maybe that's fair use. If you take a high-resolution copy of the movie, extract only the art from the background and want to start distributing that on its own, what do you expect then?

mountainb today at 11:17 AM

Fair use is a case-by-case question of fact that depends on many factors. Trial judges often get creative in how they apply them. The courts are not likely to take a categorical approach like that, despite what some professors have written.

iso1631 today at 12:19 PM

Training seems fine. If I learn how to write something by looking at example code and then write my own program, that's widely accepted to be a fair use of the code. Likewise, if I learn multiple things from reading encyclopedias and then write an essay, that's fine.

However, if I memorise that code and write it down, that's not fair use. If I copy the encyclopedia, that's bad.

The problem then becomes "how trivial can a piece of code be and still be copyrighted?"

    def main():
      print("This is copyrighted")
    main()

This is a problem in general, not just in written words. See the recent Ed Sheeran case - https://www.bbc.co.uk/news/articles/cgmw7zlvl4eo