andyferris • today at 9:34 AM

Actually - do they do this in LLM benchmarks? As a measure of overconfidence/confabulation? Seems immediately applicable.
