Hacker News

PaulKeeble today at 2:43 PM

It did so in direct violation of the licenses of the code held there as well, and then sold code snippets it had no rights to, and still does.


Replies

rpdillon today at 5:32 PM

How did you draw those conclusions? They don't seem to be in line with court rulings (e.g. Anthropic), which hold that training is fair use. Code is being treated the same as any other copyrighted content that is used for training, from blog posts to PR announcements from companies and everything in between. Of course, the blog posts and PR announcements have their copyright held by their authors, with no license provided at all, so if using OSS code in training is a violation, then so is using everything else being trained on (to a first approximation...public domain works excepted). But no court has ever taken that position, to my knowledge.

There's just so much confusion around this. In this thread alone:

* Distillation is legal under copyright; the violations would come as ToS violations, which is contract law, not copyright law.

* Training is legal as well, so long as the original material was obtained legally.

* Moving code off of GitHub doesn't change any of this: AI companies are free to download your git repo no matter where it is hosted, just like they can any other content on a publicly accessible website.

* Liability comes into the picture when the models are used to infringe copyright in their output. We'll have to see the outcome of the NYT case here, but that is proceeding at a glacial pace.

I am not a lawyer; I'm an interested amateur who's been following the saga for years. I wish the discussion here on HN were more nuanced.

If anyone has legal updates that render any of the above incorrect, I'd love a pointer to the decisions. One area where I'm particularly weak is the legal status in countries other than the US: I don't follow those laws nearly as carefully, nor the court cases brought there.
