Hacker News

yanis_t yesterday at 6:42 PM (12 replies)

These releases are lacking something. Yes, they optimised for benchmarks, but it’s just not all that impressive anymore. It is time for a product, not for a marginally improved model.


Replies

ipsum2 yesterday at 6:50 PM

The model was released less than an hour ago, and somehow you've been able to form such a strong opinion about it. Impressive!

tgarrett yesterday at 7:46 PM

Plasma physicist here. I haven't tried 5.4 yet, but in general I am very impressed with the recent upgrades that started arriving in the fall of 2025: for tasks like manipulating analytic systems of equations, quickly developing new features for simulation codes, and interpreting and designing experiments (with pictures), the models have become much stronger. I've been asking questions and probing them for several years now out of curiosity, and they have suddenly developed deep understanding (Gemini 2.5 <<< Gemini 3.1) and become very useful. I totally get the current SV vibes, and am becoming a lot more ambitious in my future plans.

softwaredoug yesterday at 6:55 PM

The products are the harnesses, and IMO that’s where the innovation happens. We’ve gotten better at getting good, verifiable work out of dumb LLMs.

mindwok yesterday at 8:59 PM

They don't need to be impressive to be worthwhile. I like incremental improvements; they make a difference in the day-to-day work I do writing software with these.

iterateoften yesterday at 7:02 PM

The product is putting the skills / harness behind the API, instead of in the agent running locally on your computer, and iterating on that between model updates. Close off the garden.

Not that I want it, just where I imagine it going.

Gigachad yesterday at 11:01 PM

They have a product now. Mass surveillance and fully automated killing machines.

wahnfrieden yesterday at 6:50 PM

5.3 Codex was a huge leap over 5.2 for agentic work in practice. Have you been using both of those, or paying more attention to benchmark news and the ChatGPT experience?

esafak yesterday at 6:45 PM

That's for you to build; they provide the brains. Do you really want one company to build everything? There wouldn't be a software industry to speak of if that happened.

varispeed yesterday at 7:32 PM

The scores increase, yet as new versions are released the models feel more and more dumbed down.

jascha_eng yesterday at 7:19 PM

When did they stop putting competitor models on the comparison table, btw? And yeah, I mean the benchmark improvements are meh. The context window and the lack of real memory are still issues.

metalliqaz yesterday at 7:12 PM

They need something that POPS:

    The new GPT -- SkyNet for _real_

throwaway613746 yesterday at 8:04 PM

[dead]