forgive the skepticism, but this translates directly to "we asked the model pretty please not to do it in the system prompt"
It's mind-boggling if you think about the fact that they're essentially "just" statistical models
It really contextualizes the old wisdom of Pythagoras that everything can be represented as numbers / math is the ultimate truth
That might be somewhat ungenerous unless you have more detail to provide.
I know that at least some LLM products explicitly check output for similarity to training data to prevent direct reproduction.
Would it really be infeasible to take a sample and do a search over an indexed training set? Maybe a Bloom filter could be adapted, roughly along the lines of the sketch below.
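For what it's worth, here's a toy sketch of that idea in Python: hash training-set n-grams into a Bloom filter once, then score model output by how many of its n-grams hit the filter. Everything here is made up for illustration (the filter sizing, the 8-token n-gram length, the one-line "corpus"); a real training set would need sharding and a far bigger index, and a Bloom filter only gives you probable hits, so flagged output would still need an exact check against the indexed data.

    # Toy sketch: flag model output whose n-grams appear in an indexed training set.
    import hashlib
    from math import ceil, log

    class BloomFilter:
        def __init__(self, capacity: int, error_rate: float = 0.01):
            # Standard Bloom filter sizing formulas.
            self.m = ceil(-capacity * log(error_rate) / (log(2) ** 2))  # bits
            self.k = max(1, round((self.m / capacity) * log(2)))        # hash functions
            self.bits = bytearray((self.m + 7) // 8)

        def _positions(self, item: str):
            # Derive k bit positions from salted SHA-256 digests.
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item: str):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, item: str) -> bool:
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def ngrams(text: str, n: int = 8):
        toks = text.split()
        return (" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))

    # Index the training corpus once (here: a single toy document).
    bf = BloomFilter(capacity=1_000_000)
    training_docs = ["the quick brown fox jumps over the lazy dog again and again"]
    for doc in training_docs:
        for g in ngrams(doc):
            bf.add(g)

    def fraction_seen(output: str, n: int = 8) -> float:
        """Rough overlap score: share of output n-grams that hit the filter."""
        grams = list(ngrams(output, n))
        if not grams:
            return 0.0
        return sum(g in bf for g in grams) / len(grams)

    # Verbatim reproduction of the indexed document scores 1.0.
    print(fraction_seen("the quick brown fox jumps over the lazy dog again and again"))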
The model doesn't know what its training data is, nor does it know what sequences of tokens appeared verbatim in there, so this kind of thing doesn't work.