Hacker News

thedevilslawyer today at 7:32 AM

This is oft-repeated but never backed up by evidence. Can you share the snippet that was plagiarized?


Replies

vohk today at 8:10 AM

I can't offer an example of code, but considering researchers were able to cause models to reproduce literary works verbatim, it seems unlikely that a git repository would be materially different.

https://www.theatlantic.com/technology/2026/01/ai-memorizati...

bayindirh today at 8:16 AM

While this is from 2022, here you go:

https://x.com/docsparse/status/1581461734665367554

I'm sure if someone prompts correctly, they can do the same thing today. LLMs can't generate something they don't know.

IX-103 today at 9:12 AM

It happens often enough that the company I work for has set up a presubmit to check all AI-generated and AI-assisted code for plagiarism (which they call "recitation"). I know they're checking the code for similarity to anything on GitHub, but they could also be checking against the model's training corpus.
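
For illustration only: the comment doesn't say how that presubmit works, so the following is a minimal sketch of one way a "recitation" check could be done, assuming simple token n-gram overlap against a small in-memory set of reference files. The function names, the 5-token window, and the 0.8 threshold are all hypothetical choices, not anything the company is known to use.

```python
import re

def token_ngrams(source: str, n: int = 5) -> set:
    """Split source into rough code tokens and return the set of n-token windows.

    Tokenizing first means whitespace and line-break differences don't matter.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def recitation_score(candidate: str, reference: str, n: int = 5) -> float:
    """Fraction of the candidate's n-grams that also occur in the reference."""
    cand = token_ngrams(candidate, n)
    if not cand:
        return 0.0  # too short to judge
    return len(cand & token_ngrams(reference, n)) / len(cand)

def flag_recitation(candidate: str, corpus: dict, threshold: float = 0.8, n: int = 5) -> list:
    """Return the names of reference files the candidate overlaps with heavily.

    `corpus` maps a name (e.g. a file path) to that file's text. The threshold
    is an assumed value and would need tuning on real data.
    """
    return [name for name, text in corpus.items()
            if recitation_score(candidate, text, n) >= threshold]
```

A check at real scale would presumably need an index (hashing the n-grams, for instance) rather than a pairwise comparison against every file, and it would not catch copies where identifiers were renamed.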