I think what they're actually struggling with is costs. And I think they're all behind the scenes quantizing models to manage load here and there, and they're all giving inconsistent results.
I noticed huge improvement from Sonnet 4.5 to Opus 4.5 when it became unthrottled a couple weeks ago. I wasn't going to sign back up with Anthropic but I did. But two weeks in it's already starting to seem to be inconsistent. And when I go back to Sonnet it feels like they did something to lobotomize it.
Meanwhile I can fire up DeepSeek 3.2 or GLM 4.6 for a fraction of the cost and get almost as good as results.