Also | alt Hacker News

scrollop • yesterday at 8:00 PM • 1 reply

Also

https://artificialanalysis.ai/evaluations/omniscience

Prepare to be amazed

Replies

I’m amazed by how much Gemini 3 Flash hallucinates; it performs poorly on that metric (along with lots of other models). In the Hallucination Rate vs. AA-Omniscience Index chart, it’s not in the most desirable quadrant; GPT-5.1 (high), Opus 4.5, and Haiku 4.5 are.

Can someone explain how Gemini 3 Pro/Flash still do so well in the overall Omniscience: Knowledge and Hallucination Benchmark?
