
anonym29 yesterday at 6:17 PM

Per AA's Omniscience Index benchmark, the "non-hallucination rate" subcomponent (1 - hallucination rate) is 4% for DS4F vs 66% for M2.7.

https://artificialanalysis.ai/leaderboards/models?weights=op...


Replies

antirez yesterday at 6:59 PM

On the same page, DS4F scores much better on Omniscience Accuracy. I would take those numbers with a grain of salt. For instance, I ran different benchmarks against Qwen 3.6 27B and DS4F quantized to 2-bit, and DS4F's hallucination rate was much lower. In general I find the artificialanalysis benchmarks poorly aligned with what I see in the field, and in this specific case, where I ran many tests, the mismatch is even larger.