
ACCount37 - today at 2:38 PM

Are they now?

OpenAI's o3 was SOTA, and valued by its users for its high performance on hard tasks - while also being an absolute hallucination monster, thanks to one of OpenAI's RLVR oopsies. You'd never know whether it was being brilliant or completely full of shit at any given moment. People still used o3 because it was well worth it.

So clearly, hallucinations do not stop AI usage - or even necessarily undermine AI performance.

And if the bar you have to clear is "human performance", rather than something like "SQL database reliability", then the bar isn't that high. See: the notorious unreliability of eyewitness testimony.

Humans avoid hallucinations better than LLMs do - not because they're fundamentally superior, but because they get a lot of meta-knowledge "for free" as part of their training process.

LLMs get very little meta-knowledge in pre-training, and have little skill in using what meta-knowledge they do get. That doesn't mean you can't train them to be more reliable - there are training pipelines for that already. It just makes reliability harder to achieve.