Hacker News

throw310822 yesterday at 4:32 PM

> The training data

If the prompt is unique, it is not in the training data. True for basically every prompt. So how is this probability calculated?


Replies

cbovis yesterday at 4:55 PM

The prompt is unique but the tokens aren't.

Type "owejdpowejdojweodmwepiodnoiwendoinw welidn owindoiwendo nwoeidnweoind oiwnedoin" into ChatGPT and the response is "The text you sent appears to be random or corrupted and doesn’t form a clear question." That's because the prompt doesn't correlate with anything in the training data.
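
To make the "unique prompt, familiar tokens" point concrete, here is a minimal sketch using a toy bigram counter. Everything in it (the three-sentence `corpus`, the `follows` table, `sequence_probability`) is made up for illustration. A real LLM works on subword tokens and conditions on the whole context with a neural network rather than on raw pair counts, so treat this only as an analogy for how a never-seen sentence can still get a probability.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for "training data"; not from any real model.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]

# Count how often each word follows each other word ("<s>" marks sentence start).
follows = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1


def sequence_probability(sentence):
    """Chain-rule probability of a sentence: product of P(next word | previous word)."""
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        total = sum(follows[prev].values())
        if total == 0:
            # The previous word was never seen with a successor, so nothing to go on.
            return 0.0
        prob *= follows[prev][nxt] / total
    return prob


novel = "the dog sat on the mat"
print(novel in corpus)                        # the exact sentence is not in the corpus
print(sequence_probability(novel))            # yet every word pair in it was seen
print(sequence_probability("owejd powejd"))   # unseen words give this toy model nothing
```

The novel sentence gets a nonzero score because each step only asks "how likely is this token given what came before", and those pieces were all seen. Unlike this toy, a real model does not assign exactly zero to gibberish; it just has no strong pattern to continue.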

qsera yesterday at 4:51 PM

Just using a scaled up and cleverly tweaked version of linear regression analysis...

hmmmmmmmmmmmmmm yesterday at 5:33 PM

Hamiltonian paths and previous work by Donald Knuth are more than likely in the training data.
