smusamashah • yesterday at 9:03 PM • 1 reply

I only want to see how it performs on the Bullshit-benchmark https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

GPT is not even close to Claude in terms of responding to BS.

Replies

mistercow • today at 12:10 AM

My current hunch is that that benchmark captures most of the relevant gap between Anthropic and the rest. “Can’t distinguish truth from fiction” has long been one of the deeper complaints about LLMs, and the bullshit benchmark seems like a clever approach to testing at least some of that.
