Hacker News

energy123 01/16/2025

We need a hallucination benchmark.

In my experience, o1 is very good at avoiding hallucinations, and I trust it more, but o1-mini and 4o are awful.


Replies

sdesol 01/16/2025

Well, given the price ($15.00 / 1M input tokens and $60.00 / 1M output tokens), I would hope so. At that price, I think it is fair to say it is doing a lot of checks in the background.
