Hacker News

silvertaza · yesterday at 7:57 PM · 3 replies

Still a huge hallucination rate, unfortunately: 86%. For comparison, Opus sits at 36%.

Source: https://artificialanalysis.ai/models?omniscience=omniscience...
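(The exact formula behind that leaderboard isn't given in the thread; a common definition, and a plausible assumption here, is the fraction of *attempted* answers that are wrong, so abstaining on hard questions lowers the score. A minimal sketch of that assumed metric:)

```python
# Hypothetical sketch: hallucination rate as the share of attempted answers
# that are incorrect. Abstentions don't count against the model, which is
# one way a cautious model could score far below an always-answering one.
def hallucination_rate(correct: int, incorrect: int, abstained: int) -> float:
    attempted = correct + incorrect
    return incorrect / attempted if attempted else 0.0

# Always answers, often wrong:
print(hallucination_rate(correct=140, incorrect=860, abstained=0))    # 0.86
# Abstains when unsure:
print(hallucination_rate(correct=400, incorrect=100, abstained=500))  # 0.2
```

Under this definition, two models with similar accuracy can have very different hallucination rates depending on how willing they are to say "I don't know."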


Replies

dubcanada · yesterday at 8:42 PM

Grok is 17%? And that's the lowest; most models are like 80%+?

Meanwhile, actual hallucination is probably closer to 100% depending on the question. This benchmark makes no sense.

simianwords · yesterday at 8:23 PM

There's something off with this because Haiku should not be that good.

dakolli · yesterday at 8:30 PM

This indicates they want this behavior. They know the person asking the question probably doesn't understand the problem entirely (or why would they be asking?), so they'd prefer a confident response regardless of outcome, because the point is to sell the perception of the technology's competency, not its actual capabilities, to people who have no clue what they're talking about.

LLMs will ruin your product. Have fun trusting a billionaire's thinking machine that they swear can replace your employees if you just pay them 75% of your labor budget.
