jpollock • yesterday at 9:40 PM

Measurement and alerting are usually done on business metrics, not on the causes. That way you catch whole classes of problems instead of one failure mode at a time.

Not sure about expected loss. Is that a decay rate?

But stuck jobs get caught via the number of tasks being processed and the average latency.
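
A rough sketch of what I mean, in Python. Everything here is made up for illustration: the `WindowStats` shape, the `check_window` name, and the thresholds are assumptions, not anyone's real alerting config. The point is only that the check looks at throughput and average latency, so it fires no matter what caused the jobs to get stuck.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Hypothetical per-window counters for a job queue."""
    tasks_processed: int     # tasks completed during the window
    total_latency_s: float   # sum of per-task latencies, in seconds


def check_window(stats, min_tasks=10, max_avg_latency_s=30.0):
    """Return symptom-level alerts for one window.

    Thresholds are placeholder values; real ones depend on the workload.
    """
    alerts = []
    # Too few completed tasks: something is stuck or starved, whatever the cause.
    if stats.tasks_processed < min_tasks:
        alerts.append("low_throughput")
    # Average latency is only defined when at least one task completed.
    if stats.tasks_processed > 0:
        avg_latency_s = stats.total_latency_s / stats.tasks_processed
        if avg_latency_s > max_avg_latency_s:
            alerts.append("high_latency")
    return alerts
```

A window with zero completed tasks only trips the throughput alert, which is why you want both signals: latency alone says nothing when nothing finishes.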
