Hacker News

alienbaby yesterday at 10:28 PM

LLMs do not disgorge verbatim chunks of the code they were trained on.


Replies

perryprog yesterday at 11:23 PM

I think it's probably less frequent nowadays, but it very much does happen. This still-active lawsuit[0] was filed in response to LLMs generating verbatim chunks of the code they were trained on.[1]

[0] https://githubcopilotlitigation.com
[1] https://www.theverge.com/2022/11/8/23446821/microsoft-openai...

AshamedCaptain yesterday at 11:46 PM

You can still trivially get entire chunks of code from Copilot, including even literal author names, simply by prodding it with a Doxygen tag.

neilv today at 12:04 AM

They do, and, early on, Microsoft (and perhaps others) put in some checks to try to hide that.

idle_zealot yesterday at 10:52 PM

Surely they do sometimes?

bobsmooth today at 1:17 AM

ChatGPT has given me code with comments so specific that I found the original six-year-old GitHub repository.
