Hacker News

pastel8739 today at 5:10 AM

Here’s a simple prompt you can try to prove that this is false:

  Please reproduce this string:
  c62b64d6-8f1c-4e20-9105-55636998a458

This is a fresh UUIDv4 I just generated; it has never been seen before. And yet the model will output it.
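A fresh UUIDv4 carries 122 random bits, so the chance that this exact string appeared in any training corpus is negligible. As a minimal sketch, Python's standard `uuid` module can generate a new one and confirm that the string above is a well-formed version-4 UUID. The `is_uuid4` helper is a hypothetical name introduced for this illustration.

```python
import uuid

def is_uuid4(text: str) -> bool:
    """Return True if `text` parses as an RFC 4122 version-4 UUID (hypothetical helper)."""
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        # Not a valid UUID string at all.
        return False
    # .version is only meaningful for RFC 4122-variant UUIDs.
    return parsed.variant == uuid.RFC_4122 and parsed.version == 4

# A new random UUIDv4: 122 of its 128 bits are random,
# the remaining 6 encode the version and variant.
fresh = uuid.uuid4()
print(fresh)

print(is_uuid4("c62b64d6-8f1c-4e20-9105-55636998a458"))  # True
```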

Replies

wobfan today at 6:51 AM

No one is claiming that every sentence an LLM produces is a literal copy of some other sentence. Tokens are not even constrained to whole words; they are smaller slices, comparable to syllables, which makes entirely new words possible.
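The subword point can be illustrated with a toy tokenizer. This is a minimal sketch assuming a greedy longest-match scheme and a tiny hypothetical vocabulary; production LLM tokenizers typically use byte-pair encoding (BPE) or similar learned schemes, but the principle carries over: a word that is absent from the vocabulary can still be encoded as a sequence of known pieces.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split `word` into vocabulary pieces, longest match first, left to right.

    Unknown characters fall back to single-character tokens, so any
    string can be encoded. (Toy scheme, not a real LLM tokenizer.)
    """
    tokens: list[str] = []
    i = 0
    while i < len(word):
        # Try the longest remaining slice first, shrinking until a match.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

# Hypothetical vocabulary: "zoomhog" itself is not in it.
VOCAB = {"hedge", "hog", "zoom", "er"}

print(greedy_tokenize("zoomhog", VOCAB))      # ['zoom', 'hog']
print(greedy_tokenize("hedgezoomer", VOCAB))  # ['hedge', 'zoom', 'er']
```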

New sentences, new words, or whatever else are entirely possible, and yes, repeating a string (especially when you prompt it to) is entirely possible and not surprising at all. But all of that comes from the training data: predicting the most probable next "syllable". It will never leave that realm, because it can't. It's like asking an Italian who has never learned or heard any other language to speak French. They can't.

razorbeamz today at 5:20 AM

After you prompt it, it's seen it.

merb today at 6:06 AM

The only way to prove it false would be to have the LLM create a new UUID algorithm that uses different parameters from all the other UUID algorithms and is better than the ones before. It basically can't do that.

amelius today at 1:11 PM

A better example: compute 2984298724 times 23984723828.
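For reference, the exact product is easy to check with arbitrary-precision integers. The point of the example is presumably that this answer has to be computed rather than recalled. The sketch below compares Python's built-in result with a hypothetical `schoolbook_multiply` helper that performs the digit-by-digit long multiplication such a computation involves.

```python
def schoolbook_multiply(a: str, b: str) -> str:
    """Multiply two non-negative decimal strings digit by digit (long multiplication)."""
    result = [0] * (len(a) + len(b))
    # Walk both numbers from the least significant digit.
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b)] += carry
    # result is least-significant-first; reverse and strip leading zeros.
    digits = "".join(map(str, reversed(result))).lstrip("0")
    return digits or "0"

a, b = 2984298724, 23984723828
exact = a * b  # Python ints are arbitrary precision, so this is exact
print(exact)  # 71577580715392795472
print(schoolbook_multiply(str(a), str(b)) == str(exact))  # True
```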

FrostKiwi today at 5:15 AM

But that fresh UUID is in the prompt.

It also misses the parent's point: it's about concepts and ideas merely being remixed. It's similar to the many memes on this topic, like "create a fresh new character design of a fast hedgehog", where the output is just a copy of Sonic.[1]

That's what the parent is getting at: if something requires genuine creativity that can't be derived from the learned corpus, LLMs can't do it. Terence Tao expressed similar thoughts in a recent podcast.

[1] https://www.reddit.com/r/aiwars/s/pT2Zub10KT
