thedevilslawyer • today at 7:32 AM

This is oft-repeated but never backed up by evidence. Can you share the snippet that was plagiarized?

Replies

I can't offer an example of code, but considering researchers were able to cause models to reproduce literary works verbatim, it seems unlikely that a git repository would be materially different.

https://www.theatlantic.com/technology/2026/01/ai-memorizati...

bayindirh • today at 8:16 AM

While this is from 2022, here you go:

https://x.com/docsparse/status/1581461734665367554

I'm sure if someone prompts correctly, they can do the same thing today. LLMs can't generate something they don't know.

IX-103 • today at 9:12 AM

It happens often enough that the company I work for has set up a presubmit to check all AI-generated and AI-assisted code for plagiarism (which they call "recitation"). I know they're checking the code for similarity to anything on GitHub, but they could also be checking against the model's training corpus.
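To make the idea concrete, here is a minimal sketch of what a similarity check of this kind could look like. It is not the presubmit described above, whose implementation isn't given. It assumes a token k-gram ("shingle") containment score, and the names `shingles`, `containment`, `flag_recitation`, the `k=5` window, and the `0.8` threshold are all hypothetical choices for illustration.

```python
import re


def shingles(code: str, k: int = 5) -> set[tuple[str, ...]]:
    """Split code into tokens and return the set of k-token windows."""
    # Identifiers, integers, and single punctuation characters; whitespace is ignored.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def containment(candidate: str, reference: str, k: int = 5) -> float:
    """Fraction of the candidate's shingles that also appear in the reference."""
    cand = shingles(candidate, k)
    if not cand:
        return 0.0
    return len(cand & shingles(reference, k)) / len(cand)


def flag_recitation(candidate: str, corpus: dict[str, str],
                    threshold: float = 0.8, k: int = 5) -> list[str]:
    """Return the names of corpus files the candidate overlaps with heavily.

    The threshold is an assumed value, not one taken from any real tool.
    """
    return [name for name, text in corpus.items()
            if containment(candidate, text, k) >= threshold]
```

A check along these lines only catches near-verbatim copies that survive tokenization; renamed variables or reordered statements would need something more robust, and scanning all of GitHub would need an index rather than a pairwise loop.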
