Hacker News

ndriscoll · last Friday at 11:09 PM

This is specifically a consumer-model (or specifically ChatGPT) issue. For example, IME Codex doesn't do this; it will just tell you when you're missing something or are wrong. Gemini does this weird thing where it tells you you're a genius and then immediately starts correcting everything you said.


Replies

solid_fuel · last Friday at 11:54 PM

Sycophancy is just one aspect of the problems I mentioned, though. Another huge one is hallucination, and one that is actually far worse than I thought:

> It’s been proven that when a model is trained on large volumes of highly factual and non-theoretical data, it learns to always have an answer. DeepSeek V4 Pro (1.6T params, 49B active, 44 AA Intelligence Index score) has a ludicrous 94% hallucination score on the AA-Omniscience benchmark, meaning on questions that it couldn’t figure out, it only stated that it didn’t know around 6% of the time, and the rest it confidently hallucinated an answer. GLM-5.2 scored a 28% hallucination rate, Opus 4.8 was 36%, Fable 5 was 48%, and GPT-5.5 was 86%.

https://arrowtsx.dev/bigger-models/
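To make the arithmetic in that quote concrete, here is a minimal Python sketch. It assumes the hallucination rate is computed the way the quote describes it: among the questions a model did not answer correctly, the share where it gave a confident wrong answer instead of saying it didn't know. The `Outcomes` class, its field names, and the counts are hypothetical, picked only to reproduce the 94% figure. This is not the benchmark's actual scoring code.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Hypothetical per-run answer counts; field names are illustrative only."""
    correct: int     # questions answered correctly
    incorrect: int   # confident wrong answers (hallucinations)
    abstained: int   # "I don't know" responses


def hallucination_rate(o: Outcomes) -> float:
    """Share of non-correct questions where the model guessed instead of abstaining.

    Assumption: follows the quoted description, not AA's published scoring code.
    """
    not_correct = o.incorrect + o.abstained
    if not_correct == 0:
        return 0.0  # nothing was missed, so there was nothing to hallucinate on
    return o.incorrect / not_correct


# Hypothetical counts: 600 missed questions, 564 of them answered confidently anyway.
run = Outcomes(correct=400, incorrect=564, abstained=36)
rate = hallucination_rate(run)
print(f"hallucination rate: {rate:.0%}")      # 564 / 600
print(f"said 'I don't know': {1 - rate:.0%}")
```

Because `correct` never enters the formula, this rate says nothing about how much a model knows, only how it behaves on the questions it gets wrong: a 94% rate leaves about 6% for "I don't know."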

I think even a 5% hallucination rate would be terrible for a teacher, who should generally be comfortable with saying "I don't know off the top of my head but here is how to find resources to answer your question".

---

So, just to drive the point home: according to this index, https://benchlm.ai/models/gpt-5-3-codex, Codex has an 86.9% hallucination rate on the AA-Omniscience benchmark. If you ask it something that wasn't sufficiently covered in its training data, it will confidently make up an answer nearly 87% of the time.

You might think it's happy to correct you when you're wrong, but you can't know that for sure, because you don't know when you're wrong. Codex may have been happily agreeing with you about things you had completely backwards.

HKH2 · yesterday at 12:52 AM

> Gemini does this weird thing where it tells you you're a genius and then immediately starts correcting everything you said.

That's a great way to get you to listen, because your guard is down. Imagine if it told you you were an idiot and then corrected you.