> doesn't really justify the decision to use the same deployment system responsible for the 11/18 outage
There’s no other deployment system available. There’s a single system for config deployment and it’s all that was available as they haven’t yet done the progressive roll out implementation yet.
Ok. Sure But shouldn't they have some beta/staging/test area they could deploy to, run tests for an hour then do the global blast?
> There’s no other deployment system available.
Hindsight is always 20/20, but I don't know how that sort of oversight could happen in an organization whose business model rides on reliability. Small shops understand the importance of safeguards such as progressive deployments or one-box-style deployments with a baking period, so why not the likes of Cloudflare? Don't they have anyone on their payroll who warns about the risks of global deployments without safeguards?