Hacker News

smusamashah, today at 5:43 AM

This could be the reason: https://petergpt.github.io/bullshit-benchmark/viewer/index.v... Claude bullshits the least of all models. ChatGPT does it more than half the time.


Replies

sunaookami, today at 9:02 AM

That's a nice benchmark + website, and wow, ChatGPT scores worse than I thought.