Hacker News

energy123, last Thursday at 10:18 PM

We need a hallucination benchmark.

In my experience, o1 is very good at avoiding hallucinations and I trust it more, but o1-mini and 4o are awful.


Replies

sdesol, last Thursday at 11:35 PM

Well, given the price of $15.00 / 1M input tokens and $60.00 / 1M output tokens, I would hope so. At that price, I think it is fair to say it is doing a lot of checks in the background.
