Hacker News

scrollop, yesterday at 8:00 PM

Also

https://artificialanalysis.ai/evaluations/omniscience

Prepare to be amazed


Replies

albumen, yesterday at 9:57 PM

I’m amazed by how much Gemini 3 Flash hallucinates; it performs poorly on that metric (along with lots of other models). In the Hallucination Rate vs. AA-Omniscience Index chart, it’s not in the most desirable quadrant; GPT-5.1 (high), Opus 4.5 and 4.5 Haiku are.

Can someone explain how Gemini 3 Pro/Flash then do so well in the overall Omniscience: Knowledge and Hallucination Benchmark?
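A rough sketch of how the two numbers could diverge, assuming (not confirmed in this thread) that the index is scored as correct minus incorrect over all questions, and that the hallucination rate is the share of wrong answers among the questions the model did not get right. The function names and the two sets of counts are made up for illustration; they are not benchmark results.

```python
# Assumed scoring rules (hypothetical, for illustration only):
#   index              = 100 * (correct - incorrect) / total
#   hallucination rate = incorrect / (incorrect + abstained)

def omniscience_index(correct, incorrect, abstained):
    """Net score: right answers add, wrong answers subtract, abstentions are neutral."""
    total = correct + incorrect + abstained
    return 100 * (correct - incorrect) / total

def hallucination_rate(correct, incorrect, abstained):
    """Of the questions not answered correctly, how often the model guessed wrong."""
    not_correct = incorrect + abstained
    return incorrect / not_correct if not_correct else 0.0

# Invented counts out of 100 questions, not real model data.
knows_a_lot_rarely_abstains = dict(correct=55, incorrect=40, abstained=5)
knows_less_abstains_often = dict(correct=30, incorrect=20, abstained=50)

for name, counts in [("A", knows_a_lot_rarely_abstains),
                     ("B", knows_less_abstains_often)]:
    print(name,
          "index:", omniscience_index(**counts),
          "hallucination rate:", round(hallucination_rate(**counts), 2))
```

Under those assumptions, model A almost never abstains, so nearly all of its misses count as hallucinations, yet it still ends up with the higher index because it simply knows more answers. That would be one way a model lands outside the desirable quadrant while still ranking well overall.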
