Hacker News

blutoot, today at 12:55 AM

I would have been in the skeptics' camp 3-4 months ago. Opus 4.5 and GPT-5.2 have changed my mind. I'm not talking about mere code completion. I am talking about these models AND the corresponding agents playing a really, really capable software engineer + tester + SRE/Ops role.

The caveat is that, as things stand today, we have to be fairly good at steering them in the right direction. It is exhausting to do it the right way.


Replies

verdverm, today at 1:52 AM

I agree the latest generation of models, Opus 4.5 and Gemini 3, are more capable. 5.2 is OpenAI squeezing as much as they can out of 4, because they haven't had a successful pre-training run since Ilya left.

I disagree that they are really, really capable engineers et al. They have moments where they shine like one. They also have moments where they perform worse than a new grad/hire. This is not what a really, really capable engineer looks like. I don't see this fundamentally changing, even with all the improvements we are seeing. The problem is lower level and more core than adding more layers on top can resolve; layering only addresses it as best it can.