
throw310822 • yesterday at 4:32 PM • 3 replies • view on HN

> The training data

If the prompt is unique, it is not in the training data. True for basically every prompt. So how is this probability calculated?

Replies

cbovis • yesterday at 4:55 PM

The prompt is unique but the tokens aren't.

Type "owejdpowejdojweodmwepiodnoiwendoinw welidn owindoiwendo nwoeidnweoind oiwnedoin" into ChatGPT and the response is "The text you sent appears to be random or corrupted and doesn’t form a clear question." because the prompt doesn't correlate to anything in the training data.
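To make the "tokens aren't unique" point concrete, here is a minimal sketch using a toy bigram counter. This is an assumption-heavy illustration, not how ChatGPT works: real models use subword tokens and a neural network rather than count tables, and the corpus, the function names, and the add-one smoothing are all made up for the example. The idea it shows is the chain rule: the probability of a whole sequence is built from per-token conditional probabilities, so a sentence that never appears in the data still gets a score.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus standing in for "training data".
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]

# Count how often each token follows another (bigram counts).
bigram_counts = defaultdict(Counter)
vocab = set()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()  # "<s>" marks the start of a sentence
    vocab.update(tokens)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigram_counts[prev][nxt] += 1


def next_token_prob(prev, nxt):
    """P(nxt | prev), with add-one smoothing so unseen pairs stay nonzero."""
    counts = bigram_counts[prev]
    return (counts[nxt] + 1) / (sum(counts.values()) + len(vocab))


def sequence_log_prob(text):
    """Chain rule: sum of log P(token_i | token_{i-1}) over the sequence."""
    tokens = ["<s>"] + text.split()
    return sum(
        math.log(next_token_prob(prev, nxt))
        for prev, nxt in zip(tokens, tokens[1:])
    )


# A sentence that is not in the corpus, built only from tokens that are.
novel = "the dog sat on the mat"
# Same number of tokens, but none of them appear in the corpus.
noise = "owejd welidn owindo nwoeid oiwned powejd"

print(novel in corpus)            # the exact sentence was never seen
print(sequence_log_prob(novel))   # still gets a finite score
print(sequence_log_prob(noise))   # scores lower: nothing to correlate with
```

The novel sentence scores higher than the keyboard mash of the same length because each of its token-to-token transitions was seen somewhere in the data, even though the full sentence was not.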


qsera • yesterday at 4:51 PM

Just using a scaled-up and cleverly tweaked version of linear regression analysis...


hmmmmmmmmmmmmmm • yesterday at 5:33 PM

Hamiltonian paths and previous work by Donald Knuth are more than likely in the training data.

