Even with the 1m context window, it looks like these models drop off significantly at about 256k. Hopefully improving that is a high priority for 2026.
$30/M input and $180/M output tokens is nuts. Ridiculously expensive for a not-that-great bump in intelligence compared to other models.
No doubt this was released early to ease the bad press
Quick: let's release something new that gives the appearance that we're still relevant
Honestly at this point I just want to know if it follows complex instructions better than 5.1. The benchmark numbers stopped meaning much to me a while ago - real usage always feels different.
Remember when everyone was predicting that GPT-5 would take over the planet?
Not a single comparison between 5.4 and Gemini or Claude. OpenAI continues to fall further behind.
How much of LLM improvement comes from regular ChatGPT usage these days?
Benchmarks barely improved, it seems.
Is it any good at coding?
Now with more and improved domestic espionage capabilities
Does this model autonomously kill people without human approval or perform domestic surveillance of US citizens?
Anyone else getting artifacts when using this model in Cursor?
numerusformassistant to=functions.ReadFile մեկնաբանություն 天天爱彩票网站json {"path":
Is it just me, or is the price for 5.4 Pro just insane?
What is with the absurdity of skipping "5.3 Thinking"?
We'll have to wait a day or two, maybe a week or two, to determine if this is more capable in coding than 5.3, which seems to be the economically valuable capability at this time.
In terms of writing and research even Gemini, with a good prompt, is close to useable. That's likely not a differentiator.
Everyone is mindblown in 3...2...1
I wouldn't trust any of these benchmarks unless they are accompanied by some sort of proof other than "trust me bro". Also not including the parameters the models were run at (especially the other models) makes it hard to form fair comparisons. They need to publish, at minimum, the code and runner used to complete the benchmarks and logs.
Not including the Chinese models is also obviously done to make it appear like they aren't as cooked as they really are.
More discussion here on the blog post announcement which has been confusingly penalized by Hacker News's algorithm: https://news.ycombinator.com/item?id=47265005
some sloppy improvements
Wow insane improvements in targeting systems for military targets over children
I was just testing this with my Unity automation tool, and the performance uplift from 5.2 seems to be substantial.