well there is | alt Hacker News

throawayonthe • yesterday at 6:32 PM • 1 reply

goldenarm • yesterday at 7:03 PM

It's a gibberish-input detection benchmark and does not measure output hallucinations.