Hacker News

woadwarrior01 yesterday at 11:20 AM (5 replies)

> Opus 4.5 is absolutely a state of the art model.

> See: https://artificialanalysis.ai

The field moves fast. Per artificialanalysis, Opus 4.5 is currently behind GPT-5.2 (x-high) and Gemini 3 Pro. Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.


Replies

MrOrelliOReilly yesterday at 2:26 PM

Totally. However, OP's point was that Claude had to compensate for deficiencies versus a state-of-the-art model like ChatGPT 5.2. I don't think that's correct. Whether or not Opus 4.5 is actually #1 on these benchmarks, it is clearly very competitive with the other top-tier models. I didn't take "state of the art" here to narrowly mean #1 on a given benchmark, but rather to mean near or at the frontier of current capabilities.

gessha yesterday at 3:30 PM

One thing to remember when comparing ML models of any kind is that single-value metrics obscure a lot of nuance, and you really have to go through the model results one by one to see how each model performs. This is true for vision, NLP, and other modalities.

dr_dshiv yesterday at 1:36 PM

https://lmarena.ai/leaderboard/webdev

LM Arena shows Claude Opus 4.5 on top

ramoz yesterday at 2:06 PM

https://x.com/giansegato/status/2002203155262812529/photo/1

https://x.com/METR_Evals/status/2002203627377574113

> Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.

What an insane take for anybody who uses these models daily.

fzzzy yesterday at 4:09 PM

Is x-high fast enough to use as a coding agent?
