
shaftway, last Wednesday at 6:50 PM

https://lmarena.ai/leaderboard lets you do your own blind A/B testing, and aggregates user choices into its rankings.

Looks like Google is in first place on the vast majority of metrics, not far behind on the rest, and ahead of OpenAI in every category.


Replies

credit_guy, last Wednesday at 7:53 PM

It's a bit unfair. ChatGPT has a version so expensive that it appears nobody on that leaderboard used it [1]. It's called ChatGPT 5 Pro and it's priced at $120/1M tokens. Claude Opus 4.5 is priced at $25/1M tokens [2], Gemini 3 Pro at $18/1M tokens (assuming more than 200k tokens) [3], and Sonnet 4.5 at $22.5/1M tokens (same assumption). I would expect ChatGPT 5 Pro to be better than any of these other models, but I have no way of testing that.

The next most expensive OpenAI model is ChatGPT 5.1, which costs only $10/1M tokens, significantly cheaper than all of its competitors. Given that, it seems fair to me for this model to come in 3rd or 4th place.

[1] https://openai.com/api/pricing/

[2] https://www.claude.com/pricing#api

[3] https://ai.google.dev/gemini-api/docs/pricing
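For a rough sense of scale, here's a minimal Python sketch using the flat $/1M-token figures quoted above. Treat it as illustrative only: real API pricing distinguishes input from output tokens and has volume/tier thresholds, and the 5M-token monthly workload is an assumed figure, not from any of the linked pages.

    # Rough monthly-cost comparison using the per-token prices quoted above.
    # Assumption: flat $/1M-token rates; real pricing splits input/output
    # tokens and has tier thresholds (e.g. Gemini's >200k-token tier).

    PRICE_PER_MTOK = {
        "ChatGPT 5 Pro": 120.0,
        "ChatGPT 5.1": 10.0,
        "Claude Opus 4.5": 25.0,
        "Claude Sonnet 4.5": 22.5,
        "Gemini 3 Pro": 18.0,  # >200k-token tier, per the comment
    }

    def cost(model: str, tokens: int) -> float:
        """Dollar cost of `tokens` tokens at the quoted flat rate."""
        return PRICE_PER_MTOK[model] * tokens / 1_000_000

    # Hypothetical workload: 5M tokens/month.
    for model, rate in sorted(PRICE_PER_MTOK.items(), key=lambda kv: kv[1]):
        print(f"{model:>18}: ${cost(model, 5_000_000):7.2f}/mo")

At that workload the gap is stark: ChatGPT 5.1 comes to $50/mo while ChatGPT 5 Pro comes to $600/mo, which is consistent with the point that the Pro model is priced out of casual leaderboard use.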

solumunus, yesterday at 7:57 AM

Having marginally better models is not winning the race. Their models are good, but their products are bad, or at least not the best. They aren't winning on adoption, and this is currently a market-share battle.

I could argue that Firefox is marginally better than Chrome, but that doesn't mean Firefox is winning the race, does it?