anonym29 • yesterday at 6:17 PM

Per AA's Omniscience Index benchmark, the "non-hallucination rate" subcomponent (1 - hallucination rate) is 4% for DS4F vs 66% for M2.7.

https://artificialanalysis.ai/leaderboards/models?weights=op...

Replies

antirez • yesterday at 6:59 PM

On the same page DS4F scores much better on Omniscience Accuracy. I would take those numbers with a grain of salt. For instance, I ran different benchmarks against Qwen 3.6 27B and DS4F quantized at 2-bit. DS4F's hallucination rate is much lower. In general I find artificialanalysis benchmarks not very aligned with what I see in the field, but in this specific case I did many tests and the mismatch is even larger.
