Still a huge hallucination rate, unfortunately, at 86%. For comparison, Opus sits at 36%.
Source: https://artificialanalysis.ai/models?omniscience=omniscience...
There's something off with this because Haiku should not be that good.
This suggests the behavior is intentional. They know the person asking the question probably doesn't fully understand the problem (or why would they be asking?), so they'd rather ship a confident response, regardless of outcomes. The point is to sell the perception of the technology's competence, not its actual capabilities, to a bunch of people who have no clue what they're talking about.
LLMs will ruin your product. Have fun trusting a billionaire's thinking machine they swear can replace your employees if you just pay them 75% of your labor budget.
Grok is at 17%? And that's the lowest; most models are 80%+?
Meanwhile, real-world hallucination is probably closer to 100%, depending on the question. This benchmark makes no sense.
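For what it's worth, spreads like 17% vs 86% are at least arithmetically possible if the score only counts questions a model actually attempts. The toy sketch below is my own assumption about how such a metric could work, not Artificial Analysis's published methodology, and `hallucination_rate` is a hypothetical helper: two models with identical knowledge get very different rates purely from how often they abstain.

```python
# Toy sketch of one plausible scoring scheme (an assumption, NOT Artificial
# Analysis's published method): hallucination rate counts only wrong answers
# among questions the model actually attempted, so answering "I don't know"
# lowers the rate without the model knowing anything more.

def hallucination_rate(correct: int, wrong: int, abstained: int) -> float:
    """Wrong answers as a share of attempted answers; abstentions excluded."""
    attempted = correct + wrong
    return wrong / attempted if attempted else 0.0

# Two hypothetical models that each know 40 of 100 answers:
guesser = hallucination_rate(correct=40, wrong=60, abstained=0)   # 0.60
hedger = hallucination_rate(correct=40, wrong=10, abstained=50)   # 0.20
print(f"always guesses: {guesser:.0%}, often abstains: {hedger:.0%}")
```

Under that reading, the number tracks refusal behavior as much as actual knowledge, which would be consistent with the skepticism above.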