That analysis is pretty brutal. It's very disconcerting that they can sell access to a high quality model then just stealthily degrade it over time, effectively pulling the rug from under their customers.
Disconcerting for sure, but from a business point of view you can understand where they're at; afaiui they're still losing money on basically every query and simultaneously under huge pressure to show that they can (a) deliver this product sustainably at (b) a price point that will be affordable to basically everyone (eg, similar market penetration to smartphones).
The constraints of (b) limit them from raising the price, so that means meeting (a) by making it worse, and maybe eventually doing a price discrimination play with premium tiers that are faster and smarter for 10x the cost. But anything done now that erodes the market's trust in their delivery makes that eventual premium tier a harder sell.
ChatGPT has been doing the same consistently for years. Model starts out smooth, takes a while, and produces good (relatively) results. Within a few weeks, responses start happening much more quickly, at a poorer quality.
It's disconcerting. But in 2026 it's not very surprising.
It seems likely to me they are moving compute power to the new models they are creating,
Seems like the logical conclusion, no matter what.
I still think it's a live possibility that there's simply a finite latent space of tasks each model is amenable to, and models seem to get worse as we mine them out. (The source link claims this is associated with "the rollout of thinking content redaction", but also that observable symptoms began before that rollout, so I wouldn't particularly trust its diagnosis even without the LLM psychosis bit at the end.)
> effectively pulling the rug from under their customers.
This is the whole point of AI. Its a black box that they can completely control.
[dead]
If you think that’s brutal, wait until you hear about how fiat currency works
Stealthily degrade the model or stealthily constrain the model with a tighter harness? These coding tools like Claude Code were created to overcome the shortcomings of last year's models. Models have gotten better but the harnesses have not been rebuilt from scratch to reflect improved planning and tool use inherent to newer models.
I do wonder how much all the engineering put into these coding tools may actually in some cases degrade coding performance relative to simpler instructions and terminal access. Not to mention that the monthly subscription pricing structure incentivizes building the harness to reduce token use. How much of that token efficiency is to the benefit of the user? Someone needs to be doing research comparing e.g. Claude Code vs generic code assist via API access with some minimal tooling and instructions.