logoalt Hacker News

Havocyesterday at 9:02 PM2 repliesview on HN

Cool project, but they really could have skipped the mention of clean room. Something trained on every copyrighted thing known to mankind is the opposite of clean room


Replies

cheema33yesterday at 10:06 PM

As others have pointed out, humans train on existing codebases as well. And then use that knowledge to build clean room implementations.

show 5 replies
benjiroyesterday at 9:50 PM

Hot take:

If you try to reimplement something in a clean room, its a step by step process, using your own accumulated knowledge as the basis. That knowledge that you hold in your brain, all too often is code that may have copyrights on it, from the companies you worked on.

Is it any different for a LLM?

The fact that the LLM is trained on more data, does not change that when you work for a company, leave it, take that accumulated knowledge to a different company, you are by definition taking that knowledge (that may be copyrighted) and implementing it somewhere else. It only a issue if you copy the code directly, or do the implementation as a 1:1 copy. LLMs do not make 1:1 copies of the original.

At what point is trained on copyrighted data, any different then a human trained on copyrighted data, that get reimplemented in a transformative way. The big difference is that the LLM can hold more data over more fields, vs a human, true... But if we look at specializations, this can come back to the same, no?

show 2 replies