"I am not a lawyer, nor am I an expert in copyright law or software licensing."
Why would anyone waste their time reading what they wrote then?
Interesting questions raised by the recent SCOTUS refusal to hear appeals related to AI and copyrightability, and how that may affect licensing in open source.
Hoping the HN community can bring more color to this; there are some members here who know about these subjects.
Logically, feeding in the old code to generate the new would be banned, because it’s stealing the content.
But if that were true, every single LLM is illegal, because they’ve all stolen terabytes of books and code.
AI rewrites great.
But if it’s making the original author unhappy, then why do it?
Uh, patricide?
The key leap from GPT-3 to GPT-3.5 (aka ChatGPT) was code-davinci-002, which was trained on GitHub source code after the OpenAI-Microsoft partnership.
Open source code contributed much to LLMs' amazing CoT consistency. Without the open source movement, LLMs would have been developed much later.
> The copyright vacuum: If AI-generated code cannot be copyrighted (as the courts suggest), then the maintainers may not even have the legal standing to license v7.0.0 under MIT or any license.
I believe this is a misunderstanding of the ruling. The code can’t be copyrighted by an LLM. However, the code could be copyrighted by the person running the LLM.
If you don't understand what a 'derived work' is, then you should probably not be doing this kind of thing without a massive disclaimer and/or having your lawyer do a review.
The output of an LLM is not a 'new' work for copyright purposes; if it were, it would be copyrightable, and it is not. The term of art is 'original work', not 'new'.
The bigger issue will be using tools such as these and then humans passing off the results as their own because they believe that their contribution to the process whitewashes the AI contributions to the point that they rise to the status of original works. "The AI only did little bits" is not a very strong defense though.
If you really want to own the work product, simply don't use AI during its creation. You can use it for reviews, but even then you simply do not copy-and-paste from the AI window into the text you are creating (whether it's code or ordinary prose makes no real difference).
I've seen a copyright case hinge on 10 lines of unique code that were enough of a fingerprint to clinch the 'derived work' assessment. Prize quote by the defendant: "We stole it, but not from them".
There is a very blurry line somewhere in the contents of any large LLM: would a model be able to spit out the code that it did if it did not have access to similar samples, and to what degree does that output rely on one or more key examples without which it would not be able to solve the problem you've tasked it with?
The lower boundary would be the most minimal training set required to do the job, and then to analyze which key bits of the inputs, if dropped from the training set, would cause the output to be non-functional.
The upper boundary would be where completely unrelated works and general information, rather than other parties' copyrighted works, would be sufficient to do the creation.
The easiest way to loophole this is to copyright the prompt, not the work product of the AI; after all, you should at least be able to write the prompt. Then others can re-create it too, but that's usually not the case with these AI products: they're made to be exact copies of something that already exists, and the prompt will usually reflect that.
That's why I'm a big fan of mandatory disclosure of whether or not AI was used in the production of some piece of text. For one, it helps establish whether or not you should trust it, who is responsible for it, and whether the person publishing it has the right to claim authorship.
Using AI as a 'copyright laundromat' is not going to end up well.
Is it just me, or has HN recently started picking up social media dynamics, with contributions reacting/responding to each other?
Can we do the same with Universal Music? Because that's easy and already possible. Or Microsoft Windows? Because we all know the answer: if it works, essentially any government will immediately call it illegal.
Because if this isn't allowed, that makes all of the AI models themselves illegal. They are very much the product of using others' copyrighted stuff and rewriting it.
But of course this will be allowed because copyright was never meant to protect anyone small. And that it's in direct contradiction with what applies to large companies? Courts won't care.
I mean, in my opinion, GPL-licensed code should just infect models, forcing them to follow the license.
You can do this a lot by saying things like: complete the code "<snippet from GPL-licensed code>".
And if the models are now GPL-licensed, the problem of relicensing is gone, since the code produced by these models should in theory also be GPL-licensed.
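As a minimal sketch of the prompting trick described above (the snippet and function name here are invented stand-ins, not real GPL code, and no actual model is called):

```python
# Hypothetical illustration: build a "complete the code" prompt that
# embeds a GPL-licensed snippet verbatim. On the comment's theory, any
# close completion a model returns would be a derivative of that snippet.

GPL_SNIPPET = """\
/* SPDX-License-Identifier: GPL-2.0 */
static int parse_flags(const char *s)
{
    /* ... GPL-licensed body elided ... */
"""

def completion_prompt(snippet: str) -> str:
    """Wrap a copyrighted snippet in the completion request the
    comment describes; the model would be asked to continue it."""
    return f'complete the code "{snippet}"'

print(completion_prompt(GPL_SNIPPET))
```

The point of the sketch is only that the copyrighted text travels into the prompt verbatim, so the model's continuation is anchored to it.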
Unfortunately, there is a dumb clause that computer-generated code cannot be copyrighted or licensed to begin with.
"If “AI-rewriting” is accepted as a valid way to change licenses, it represents the end of Copyleft. "
Software in the AI era is not that important.
Copyleft has already won, you can have new code in 40 seconds for $0.70 worth of tokens.
at this point, every corporation in the world has AI slop in their software. any attempt to outlaw it would attract enough funding from the oligarchs for the opposition to dethrone any party. no attempts will be made in the next three years, obviously, and then it will be even later than it is now.
and while particularly diehard believers in democracy may insist that if they kvetch hard enough they can get things they don't like regulated out of existence, they pointedly ignore the elephant in the room. they could succeed beyond their wildest dreams - get the West to implement a moratorium on AI, dismantle every FAGMAN, Mossad every researcher, send Yudkowskyjugend death squads to knock down doors to seize fully semiautomatic assault GPUs, and none of it will make any fucking difference, because China doesn't give a fuck.