energy123 • 01/16/2025

We need a hallucination benchmark.

In my experience, o1 is very good at avoiding hallucinations and I trust it more, but o1-mini and 4o are awful.

Replies

sdesol • 01/16/2025

Well, given the price of $15.00 / 1M input tokens and $60.00 / 1M output tokens, I would hope so. At that price, I think it is fair to say it is doing a lot of checks in the background.
