This is almost certainly a software issue, though. Even if it's due to scaling, they still built a system that failed catastrophically rather than degrading gracefully.
> failed catastrophically rather than degrading gracefully
You mean like returning 529s and operating with reduced QoS?
Right. If this were truly a pure scaling issue, I’d expect the interface would offer an archive.is-esque “Claude is at capacity; your prompt is #XXX/YYY in the queue; estimated time remaining: ZZZ seconds”
Instead, the whole system just shits the bed, catastrophically.
Sure. But could it be k8s config? Could it be Nvidia Bright Cluster? Could it be load balancing?
I'm not saying Anthropic isn't to blame for a system that is literally approaching one-nine uptime; they certainly are. I am saying that jumping to the "it must be vibe coding's fault" is an emotional confirmation-bias belief, not an evidence-based belief.