ChatGPT 3.5 was almost 40 months ago, not 24. GPT 4.5 was supposed to be 5 but was not noticeably better than 4o. GPT 5 was a flop. Remember the hype around Gemini 3? What happened to that? Go back and read the blog posts from November when Opus 4.5 came out; even the biggest boosters weren't hyping it up as much as they are now.
It's pretty obvious the change of pace is slowing down and there isn't a lot of evidence that shipping a better harness and post-training on using said harness is going to get us to the magical place where all SWE is automated that all these CEOs have promised.
Fair enough, I guess I'm misremembering the timeline, but saying "It's taken 3 years, not 2!" doesn't really change the point I'm making very much. The road from what ChatGPT 3.5 could do to what Codex 5.3 can do represents an amazing pace of change.
I am not claiming it's perfect, or even particularly good at some tasks (pelicans on bicycles for example), but anyone claiming it isn't a mind-blowing achievement in a staggeringly short time is just kidding themselves. It is.
Wait, you're completely skipping the emergence of reasoning models, though? 4.5 was slower and moderately better than 4o, o3 was dramatically stronger than 4o and GPT5 was basically a light iteration on that.
What's happening now is training models for long-running tasks that use tools, taking hours at a time. The latest models like 4.6 and 5.3 are starting to make good on this. If you're not using models that are wired into tools and allowed to iterate for a while, then you're not getting to see the current frontier of abilities.
(EG if you're just using models to do general knowledge Q&A, then sure, there's only so much better you can get at that and models tapered off there long ago. But the vision is to use agents to perform a substantial fraction of white-collar work, there are well-defined research programmes to get there, and there is stead progress.)