Hacker News

yard2010 · last Sunday at 9:30 AM

I thought about it - a quick way to check whether something was created with an LLM is to feed an LLM the first half of the text and then let it complete the rest token by token. At each step, look not just at the single most probable next token but at the top n most probable tokens. If one of them matches the token actually in the text, accept it and continue. This way, I think, you can measure how "correct" the model is at predicting the text it hasn't yet seen.

I haven't tested it and I'm far from an expert, so maybe someone can challenge it?
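Roughly what I have in mind, as an untested sketch (assuming a HuggingFace-style causal LM such as GPT-2, via the transformers and torch APIs):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def top_n_match_rate(text: str, model_name: str = "gpt2", n: int = 5) -> float:
        """Fraction of second-half tokens that appear in the model's top-n predictions."""
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        model.eval()

        ids = tokenizer(text, return_tensors="pt").input_ids[0]
        half = len(ids) // 2
        hits = 0

        with torch.no_grad():
            for i in range(half, len(ids)):
                # Condition on the true prefix each time (teacher forcing),
                # then look at the top-n candidates for the next token.
                logits = model(ids[:i].unsqueeze(0)).logits[0, -1]
                top_n = torch.topk(logits, n).indices.tolist()
                if ids[i].item() in top_n:
                    hits += 1

        return hits / (len(ids) - half)

A high match rate on the unseen half would be the "looks LLM-generated" signal; re-running the whole prefix at every step is slow, but it keeps the sketch simple.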


Replies

jampekka · last Sunday at 10:04 AM

That seems somewhat similar to perplexity-based detection, although there you just read off the probability of each token instead of checking the n best, and you don't have to generate anything.

It kinda works, but is not very reliable and is quite sensitive to which model the text was generated with.
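For reference, a minimal perplexity sketch (again untested, and assuming a HuggingFace causal LM such as GPT-2): a single forward pass with labels gives the mean cross-entropy over the text, and exp of that is the perplexity.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(text: str, model_name: str = "gpt2") -> float:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        model.eval()

        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # Passing labels makes the model return the mean cross-entropy loss
            # over all tokens; no sampling or generation is involved.
            loss = model(ids, labels=ids).loss
        return torch.exp(loss).item()  # lower values read as more "model-like"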

This page has nice explanations:

https://www.pangram.com/blog/why-perplexity-and-burstiness-f...

akoboldfrying · last Sunday at 11:16 AM

I expect that, for values of n for which this test consistently reports "LLM-generated" on LLM-generated inputs, it will also consistently report "LLM-generated" on human-written inputs. But I haven't done the test either, so I could be wrong.