bayindirh today at 5:55 PM

I have written about this numerous times, so I won't repeat myself with the long form writing. Maybe I need to keep a list of comments somewhere, so I can reference them. I digress...

In short:

- GPL code requires attribution and sharing of code. Models strip the license, so the GPL is effectively violated.

- Source available licenses are "for your eyes only", so training on source available code also violates said code's licenses.

- MIT requires attribution, but forgetting it has no consequences, so it's more of a gray area.

About moving from GitHub:

- Some public repositories provide visible and invisible anti-scraping protections. So it's not always that easy.

- GPL says I need to share the code with the people who download the application itself, so I can move to a cathedral model.

Moreover:

- The US Government has a stance of "If we need to ask permission for everything, the AI industry will die". Hence, as an outsider, the court rulings have no weight in my eyes. They are taking a stance to enable and not hinder the industry. If one reads the Fair Use doctrine, it's very possible to rule otherwise. OpenAI's whole non-profit research arm was an instrument to circumvent the Fair Use doctrine's "earn money from copyrighted works" clause and support the "we only do research, pinky promise" requirement of said doctrine.

When courts said "go ahead, we're not looking", people started to torrent e-books (ahem Meta ahem) or buy/cut/scan/OCR books (Anthropic) to train their models.

So the situation is left murky to allow Silicon Valley to thrive. Not to protect people's blood, sweat and tears. These works are provided by peasants anyway, so why bother.

Addendum: Courts said models' outputs can't be copyrighted. So, copyrighted code gets in, non-copyrightable code gets out. It's effectively license-washing.


Replies

rpdillon today at 7:48 PM

I don't think your understanding of Fair Use matches mine, but it is important, since it invalidates the concern about licensing.

I wrote a nearby comment giving some resources on the current state of Fair Use for training, but in short: it depends.

https://news.ycombinator.com/item?id=48125071

> Hence, as an outsider, the court rulings have no weight in my eyes.

My only focus is on legality, so this doesn't track for me. If we're not talking about what courts are ruling, then there's nothing to talk about legally, since the copyright office is waiting on courts to rule here.