LLMs do not disgorge verbatim chunks of the code they were trained on.

alienbaby • yesterday at 10:28 PM • 5 replies

Replies

I think it's probably less frequent nowadays, but it very much does happen. This still-active lawsuit[0] was filed in response to LLMs generating verbatim chunks of code they were trained on.[1]

[0] https://githubcopilotlitigation.com
[1] https://www.theverge.com/2022/11/8/23446821/microsoft-openai...

AshamedCaptain • yesterday at 11:46 PM

You can still trivially get entire chunks of code out of Copilot, even including literal author names (simply by prodding it with a Doxygen tag).

neilv • today at 12:04 AM

They do, and, early on, Microsoft (and perhaps others) put in some checks to try to hide that.

idle_zealot • yesterday at 10:52 PM

Surely they do sometimes?


bobsmooth • today at 1:17 AM

ChatGPT has given me code with comments so specific that I found the original 6-year-old GitHub repo.
