decimalenough • yesterday at 9:01 PM • 3 replies

> Assume that Datadog cuts the number of outages by half, by preventing them with early monitoring. That would mean that without Datadog, we’d look at 24 hours’ worth of downtime, not 12. Let’s also assume that using Datadog results in mitigating outages 50% faster than without - thanks to being able to connect health metrics with logs, debug faster, pinpoint the root cause and mitigate faster. In that case, without Datadog, we could be looking at 36 hours worth of total downtime, versus the 12 hours with Datadog. To put it in numbers: the company would make around $9M in revenue it would otherwise lose. Now that $10M/year fee practically pays for itself!

Those are some pretty heroic assumptions. In particular, they assume the only options are Datadog or nothing, when there are far cheaper alternatives like the Prometheus/Grafana/Clickhouse stack mentioned in the article itself.

Replies

passivepinetree • yesterday at 9:37 PM

Another assumption that bothers me here is that the $9M in revenue would be completely lost during an outage. I imagine many customers would simply wait until the outage was resolved before performing their intended transactions, meaning far less than $9M would be lost.
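
For what it's worth, here is a rough sketch of that arithmetic in Python. The hours, the ~$9M and the $10M/year fee come straight from the quoted passage; `deferred_fraction` is a hypothetical knob (not from the article) for the share of outage-time revenue that customers bring back once service is restored.

```python
# Figures taken from the quoted passage.
HOURS_WITH_DATADOG = 12
REVENUE_SAVED = 9_000_000    # "around $9M" the article says would otherwise be lost
ANNUAL_FEE = 10_000_000      # the $10M/year Datadog bill


def hours_without_datadog():
    # Twice as many outages, each taking 1.5x as long to mitigate.
    return HOURS_WITH_DATADOG * 2 * 1.5


def implied_revenue_per_hour():
    # Revenue per hour of downtime implied by the article's own numbers.
    avoided_hours = hours_without_datadog() - HOURS_WITH_DATADOG
    return REVENUE_SAVED / avoided_hours


def net_benefit(deferred_fraction):
    # deferred_fraction is an ASSUMPTION: the share of outage-time revenue
    # that is only delayed, not lost, because customers come back later.
    truly_lost = REVENUE_SAVED * (1 - deferred_fraction)
    return truly_lost - ANNUAL_FEE


if __name__ == "__main__":
    print(hours_without_datadog())     # total downtime without Datadog
    print(implied_revenue_per_hour())  # implied $/hour of downtime
    for fraction in (0.0, 0.25, 0.5):
        print(fraction, net_benefit(fraction))
```

Even with nothing deferred, the article's own figures leave the fee about $1M short of paying for itself, and any deferred revenue only widens that gap.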

vjvjvjvjghv • yesterday at 11:07 PM

I bet they would get much better results if they spent a fraction of that money on understanding their systems and designing them better, rather than spending millions on Datadog.

secondcoming • yesterday at 10:38 PM

We are moving from Datadog to Prometheus/Grafana and it's really not all a bed of roses. You'll need monitoring on your monitoring.
