We need a hallucination benchmark.
In my experience, o1 is very good at avoiding hallucinations and I trust it more, but o1-mini and 4o are awful.
Well, given the price ($15.00 / 1M input tokens and $60.00 / 1M output tokens), I would hope so. At that price, I think it is fair to say it is doing a lot of checks in the background.