Hacker News

ImPostingOnHN last Saturday at 5:07 PM

That is a lagging indicator. By the time you're alerted, you've already failed by letting users experience an issue.


Replies

raldi last Saturday at 6:16 PM

What alternative would you propose? Page the oncall whenever there's a single query timeout?

danaris last Saturday at 7:13 PM

Well, yes. If the cable falls out of the server (or there's a power outage, or a major DDoS attack, or whatever), your users are going to experience that before you are aware of it. Especially if it's in the middle of the night and you don't have an active night shift.

Expecting arbitrary services to be able to deal with absolutely any kind of failure in such a way that users never notice is deeply unrealistic.
