
secondcoming, yesterday at 8:33 PM

It becomes fun when you narrow down to the solution. Before that it's hell.

I don't think I'd be allowed to spend weeks debugging something like this. Credit to Cloudflare's PMs.


Replies

maples37, yesterday at 10:09 PM

Apparently they have had an "unexplained crashes must have an explanation determined" policy ever since a run of uninvestigated, unexplained crashes turned out to be canaries in the coal mine for a security issue.

https://blog.cloudflare.com/however-improbable-the-story-of-...

> But [the Cloudbleed sensitive information disclosure security incident] wasn’t the only consequence of the bug. Sometimes it could lead to an invalid memory read, causing the NGINX process to crash, and we had metrics showing these crashes in the weeks leading up to the discovery of Cloudbleed. So one of the measures we took to prevent such a problem happening again was to require that every crash be investigated in detail.

Since then, they have had a "no crashes go uninvestigated" policy, which seems pretty impressive at the scale Cloudflare operates.
