Hacker News

littlestymaar today at 4:50 PM

> An mp3 file is also a machine-generated lossy compression of a cd-quality .wav file, but it's clearly copyrightable.

Not the .mp3 itself, but the creative piece of art that it encodes.

You can't record Taylor Swift at a concert and claim copyright on that. Nor can you claim copyright on an mp3 re-encoding of old audio footage that belongs to the public domain.

Whether LLMs fall into the first category (infringing the copyright of the training data's rights holders) or the second (public domain or fair use) is an open question that courts are slowly resolving, jurisdiction by jurisdiction, but that doesn't address the question of the weights themselves.


Replies

mitthrowaway2 today at 5:17 PM

Right, the .mp3 is machine-generated, but from a creatively generated input. The analogy I'm making is that an LLM's weights (or, let's say, those of an image diffusion model) are also machine-generated (by the training process) from the works in its training set, many of which are creative works, and the neural network encodes those creative works much like an mp3 file does.

In this analogy, distributing the weights would be akin to distributing an mp3, and offering a genAI service, like ChatGPT inference or a Stable Diffusion API, would be akin to broadcasting.
