Hacker News

gowld, last Thursday at 3:49 PM

As always, which model versions did you use in your test?


Replies

nomel, last Thursday at 8:11 PM

Claude Opus 4.5, Gemini 3 Pro, ChatGPT 5.1. Haven't tried ChatGPT 5.2.

The discussion has to have some nuance for the failure to show up. Gemini is, by far, the worst at this (which fits my suspicion that they heavily weighted Reddit posts).

I don't think this is all that strange, though. The human on one side of the argument is also missing the nuance, which is the source of the conflict. Is there a belief that AI has surpassed the average human in conversational nuance!?