logoalt Hacker News

rcxdudetoday at 8:54 AM1 replyview on HN

> Like, you can almost certainly get these LLMs to output gits full source code with some prompting, so there's not that much difference (as much as we like to pretend that AI generated code has no copyright implications)

Are you sure? LLMs are in some way a compressed version of their input but it's a pretty lossy compression (arguably this makes them more like a compression algorithm than a compressed version of the data). I'm not sure you can prompt a full, accurate, copy of a nontrivial codebase out of them. Even with zero temperature their accuracy is just not that high.


Replies

philipportnertoday at 10:13 AM

> I'm not sure you can prompt a full, accurate, copy of a nontrivial codebase out of them. Even with zero temperature their accuracy is just not that high.

Granted, these are some of the most widely spread texts, and not codebases, but just fyi: https://arxiv.org/pdf/2601.02671

> For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984 (Section 4).