
bthornbury • yesterday at 9:30 PM

Something like a perplexity or log-likelihood measurement across a large enough number of prompts/tokens might get you the same thing in a statistical sense, though. I expect the comparison percentages at the top are something like that.
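
For anyone who wants to see what that kind of measurement looks like, here is a rough sketch. It assumes you already have natural-log probabilities for each token of the same reference text from both models. The function names and the numbers are made up for illustration and are not taken from whatever produced the percentages above.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def corpus_perplexity(per_prompt_logprobs):
    """Pool every token across prompts, so longer prompts carry more weight."""
    pooled = [lp for prompt in per_prompt_logprobs for lp in prompt]
    return perplexity(pooled)

# Hypothetical log-probs that two model variants assign to the same tokens.
reference = [[math.log(0.5), math.log(0.25)], [math.log(0.5)]]
candidate = [[math.log(0.4), math.log(0.25)], [math.log(0.5)]]

# A ratio above 1 means the candidate finds the text less likely than the reference.
ratio = corpus_perplexity(candidate) / corpus_perplexity(reference)
print(f"perplexity ratio: {ratio:.4f}")
```

With only three tokens the ratio is mostly noise; the point of averaging over many prompts is that the estimate settles down as the token count grows.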
