I still think it's a live possibility that each model has a finite latent space of tasks it's amenable to, and that models seem to get worse as that space gets mined out. (The source link attributes this to "the rollout of thinking content redaction", but also says the observable symptoms began before that rollout, so I wouldn't particularly trust its diagnosis even without the LLM psychosis bit at the end.)