Hacker News

JohnMakin today at 5:09 PM

Large scale infrastructure changes are often by nature completely untestable. The system is too large and has too many moving parts to replicate with any kind of sane testing, so you often do find out in prod, which is why robust and fast rollback procedures are usually desirable and implemented.


Replies

lapcat today at 5:24 PM

> Large scale infrastructure changes are often by nature completely untestable.

You're changing the subject here and shifting focus from the specific to the vague. The two postmortems after the recent major Cloudflare outages both listed straightforward errors in source code that could have been tested and detected.

Theoretical outages could theoretically have other causes, but these two specific outages had specific causes that we know.

> which is why robust and fast rollback procedures are usually desirable and implemented.

Yes, nobody is arguing against that. It's a red herring with regard to my point about source code testing.
