There are many metrics for “better” and “worse”. It is entirely possible for an AI system to be better in the sense of hallucinating less while also being less useful overall. An arrogant prick who’s always correct isn’t always a good person to have on your team, right?