Hacker News

ameliaquining · yesterday at 9:05 PM · 3 replies

I read the post you're replying to as saying "this is copyright-encumbered and nonfree because it's a derivative work of everything in Claude's and GPT-5.5's training corpus", which is an argument I find fairly tiresome. (Realistically, if courts actually rule that this is the case, this tiny little project will be the least of anyone's concerns.)

"This is copyright-encumbered and nonfree because it's a derivative work of the legacy RAR binaries" is a different argument (and seems like it depends on details of the setup that were somewhat glossed over in the post).


Replies

Georgelemental · yesterday at 11:57 PM

I'm also skeptical of the "LLM output is derivative of everything in the training corpus" argument in general, but in this specific case I think it may have more merit. If the model was trained on the unrar source code, and obtained specific information about the RAR format from that code which it then used in the code-generation step, then the output is arguably tainted because of that.

themafia · yesterday at 9:38 PM

The point is, setting aside current legal standards (which are already very murky): how can _you_ claim copyright if you don't _know_ the output isn't encumbered?

You can get these LLMs to reproduce copyrighted material both intentionally and accidentally. This is a known fact; therefore, if you're not checking the output to see whether this has occurred, you're potentially creating legal risk for yourself and anyone who uses your code.

To not only ignore this for your own use case but to then release the code under a proclaimed license seems legally problematic if not ethically concerning.

If you did get sued for infringement, I can't imagine your defense would be that you find the argument tiresome. Honestly, do you think this would never happen, or how would you go about defending your actions here?
