Hacker News

syncsynchalt, today at 2:47 AM

When your scale is large enough, you move to "what monitoring methodology will find this?"

When you're doing enough transactions, you start to see a noise floor of, e.g., bit flips from cosmic rays. Looking for issues then involves correlating and categorizing failures so you can distinguish possible software bugs from misbehaving hardware.
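
To make the correlating step concrete, here is a rough sketch of one way it could look. Everything in it is an assumption for illustration: the event fields (`host`, `version`, `failed`), the function names, and the thresholds are hypothetical, and using the fleet-wide failure rate as the "noise floor" is a crude stand-in for a real baseline.

```python
from collections import Counter


def failure_rates(events, key):
    """Return {group: (failure_rate, total_events)} for events grouped by `key`."""
    totals, failures = Counter(), Counter()
    for event in events:
        totals[event[key]] += 1
        if event["failed"]:
            failures[event[key]] += 1
    return {g: (failures[g] / totals[g], totals[g]) for g in totals}


def outliers(events, key, factor=3.0, min_events=100):
    """Groups whose failure rate exceeds `factor` times the fleet-wide rate.

    The fleet-wide rate is used here as a rough noise floor (an assumption).
    Groups with fewer than `min_events` events are skipped as too small to judge.
    """
    if not events:
        return []
    fleet_rate = sum(1 for e in events if e["failed"]) / len(events)
    return sorted(
        group
        for group, (rate, total) in failure_rates(events, key).items()
        if total >= min_events and rate > factor * fleet_rate
    )
```

Calling `outliers(events, "host")` and `outliers(events, "version")` on the same events gives a first hint: failures that cluster on one host but spread evenly across software versions look more like hardware, and the reverse looks more like software. It is only a hint, not a diagnosis.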