The most annoying thing in the LLM space is that people write articles and research with grand pronouncements based upon old models. This article has no mention of Sonnet 4.5, nor does it use any of the actual OpenAI coding models (GPT-5-Codex, GPT-5.1 Codex, etc.), and based upon that, even the Opus data is likely from an older version.
This then leads to a million posts where on one side people say "yeah see they're crap" and on the other side people say "why did you use a model from 6 months ago for your 'test' and write-up in Jan 2026?".
You might as well ignore all of the articles and pronouncements and stick to your own lived experience.
The change in quality between 2024 and 2025 is gigantic. The change between early 2025 and late 2025 is _even_ larger.
The newer models DO let you know when something is impossible or unlikely to solve your problem.
Ultimately, they are designed to obey. If you authoritatively request bad design, they're going to write bad code.
I don't think this is a "you're holding it wrong" argument. I think it's "you're complaining about iOS 6 and we're on iOS 12."