yanis_t • yesterday at 6:42 PM • 12 replies

These releases are lacking something. Yes, they optimised for benchmarks, but it’s just not all that impressive anymore. It is time for a product, not for a marginally improved model.

Replies

ipsum2 • yesterday at 6:50 PM

The model was released less than an hour ago, and somehow you've been able to form such a strong opinion about it. Impressive!

tgarrett • yesterday at 7:46 PM

Plasma physicist here. I haven't tried 5.4 yet, but in general I am very impressed with the recent upgrades that started arriving in the fall of 2025. For tasks like manipulating analytic systems of equations, quickly developing new features for simulation codes, and interpreting and designing experiments (with pictures), the models have become much stronger. I've been asking them questions and probing them for several years now out of curiosity, and they have suddenly developed deep understanding (Gemini 2.5 <<< Gemini 3.1) and become very useful. I totally get the current SV vibes, and am becoming a lot more ambitious in my future plans.

softwaredoug • yesterday at 6:55 PM

The products are the harnesses, and IMO that’s where the innovation happens. We’ve gotten better at getting good, verifiable work out of dumb LLMs.

mindwok • yesterday at 8:59 PM

They don't need to be impressive to be worthwhile. I like incremental improvements; they make a difference in the day-to-day work I do writing software with these models.

iterateoften • yesterday at 7:02 PM

The product is putting the skills / harness behind the API, instead of in the agent running locally on your computer, and iterating on that between model updates. Close off the garden.

Not that I want it, just where I imagine it going.

Gigachad • yesterday at 11:01 PM

They have a product now. Mass surveillance and fully automated killing machines.

wahnfrieden • yesterday at 6:50 PM

5.3 Codex was a huge leap over 5.2 for agentic work in practice. Have you been using both of those, or paying more attention to benchmark news and the ChatGPT experience?

esafak • yesterday at 6:45 PM

That's for you to build; they provide the brains. Do you really want one company to build everything? There wouldn't be a software industry to speak of if that happened.

varispeed • yesterday at 7:32 PM

The scores increase and as new versions are released they feel more and more dumbed down.

jascha_eng • yesterday at 7:19 PM

When did they stop putting competitor models on the comparison table, btw? And yeah, the benchmark improvements are meh. The context window and the lack of real memory are still issues.

metalliqaz • yesterday at 7:12 PM

They need something that POPS:

    The new GPT -- SkyNet for _real_

throwaway613746 • yesterday at 8:04 PM

[dead]
