Hacker News

Calavar · today at 1:34 AM

Sure, maybe it's tricky to coerce an LLM into spitting out a near-verbatim copy of prior data, but that's orthogonal to whether the data needed to produce a near-verbatim copy exists in the model weights.


Replies

D-Machine · today at 2:31 AM

Especially since the recall achieved in the paper reaches 96% (based on block-level longest-common-substring matching), the effort of extraction is utterly irrelevant.
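
For anyone curious what that kind of metric looks like in practice, here's a rough character-level sketch of a longest-common-substring recall. The function name and example strings are made up for illustration and aren't taken from the paper:

```python
from difflib import SequenceMatcher

def lcs_recall(training_text: str, generated_text: str) -> float:
    """Fraction of the training text covered by the single longest
    substring it shares with the model's output (a crude recall proxy)."""
    if not training_text:
        return 0.0
    matcher = SequenceMatcher(None, training_text, generated_text, autojunk=False)
    match = matcher.find_longest_match(0, len(training_text), 0, len(generated_text))
    return match.size / len(training_text)

# A near-verbatim reproduction scores close to 1.0.
original = "the quick brown fox jumps over the lazy dog"
output = "...and then the quick brown fox jumps over the lazy dog, apparently"
print(f"LCS recall: {lcs_recall(original, output):.2f}")  # 1.00
```

The paper's block-based approach is presumably more involved than this; the sketch just shows the basic idea of scoring how much of a source string resurfaces contiguously in model output.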