
simianwords, yesterday at 9:23 PM

Specifically in the case where it can use tools - no it doesn't hallucinate. Which is why you are struggling to find counterexamples.


Replies

camgunz, yesterday at 9:56 PM

> Specifically in the case where it can use tools - no it doesn't hallucinate.

OpenAI's own system card says it does. Hallucination rates in GPT-5 with browsing enabled:

- 0.7% in LongFact-Concepts
- 0.8% in LongFact-Objects
- 1.0% in FActScore

> Which is why you are struggling to find counterexamples.

Hey look, over 500 counterexamples: [1].

GPT-5.4's hallucination rate on AA-Omniscience is 89% [0], which is atrocious. The questions are tiny too, like "In which year did Uber first expand internationally beyond the United States as part of its broader rollout (i.e., beyond an initial single‑city debut)?" It's a bullshit machine. 89%!

At some point you gotta face the music, right?

[0]: https://artificialanalysis.ai/evaluations/omniscience?model-...

[1]: https://huggingface.co/datasets/ArtificialAnalysis/AA-Omnisc...
