Hacker News

decimalenough · yesterday at 9:01 PM

> Assume that Datadog cuts the number of outages by half, by preventing them with early monitoring. That would mean that without Datadog, we’d look at 24 hours’ worth of downtime, not 12. Let’s also assume that using Datadog results in mitigating outages 50% faster than without - thanks to being able to connect health metrics with logs, debug faster, pinpoint the root cause and mitigate faster. In that case, without Datadog, we could be looking at 36 hours’ worth of total downtime, versus the 12 hours with Datadog. To put it in numbers: the company would make around $9M in revenue it would otherwise lose. Now that $10M/year fee practically pays for itself!

Those are some pretty heroic assumptions. In particular, they assume the only options are Datadog or nothing, when there are far cheaper alternatives like the Prometheus/Grafana/Clickhouse stack mentioned in the article itself.
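To see how much each assumption carries, here is a minimal Python sketch that reproduces the quoted arithmetic. It assumes "50% faster" means outages last 1.5x as long without Datadog, which is the reading that produces the article's 36 hours. The per-hour revenue is derived from the $9M figure; the article does not state it. All names are illustrative.

```python
# Reproduces the downtime arithmetic from the quoted article.
# Inputs come from the quote; the hourly revenue rate is derived, not stated.

BASE_DOWNTIME_H = 12.0             # yearly downtime with Datadog, per the quote
CLAIMED_SAVED_REVENUE = 9_000_000  # "around $9M"
ANNUAL_FEE = 10_000_000            # "$10M/year fee"


def downtime_without(base_h: float, outage_factor: float, duration_factor: float) -> float:
    """Downtime without Datadog: (more outages) x (longer outages)."""
    return base_h * outage_factor * duration_factor


# Article's scenario: twice as many outages, each lasting 1.5x as long.
full = downtime_without(BASE_DOWNTIME_H, 2.0, 1.5)
extra_hours = full - BASE_DOWNTIME_H
revenue_per_hour = CLAIMED_SAVED_REVENUE / extra_hours  # implied, not stated

print(f"downtime without Datadog: {full:.0f} h")               # 36 h
print(f"implied revenue per hour: ${revenue_per_hour:,.0f}")  # $375,000
print(f"saved revenue minus fee: ${CLAIMED_SAVED_REVENUE - ANNUAL_FEE:,.0f}")

# Drop one assumption at a time to see how much each one contributes.
scenarios = {
    "both assumptions": (2.0, 1.5),
    "fewer outages only": (2.0, 1.0),
    "faster mitigation only": (1.0, 1.5),
}
for name, (outages, duration) in scenarios.items():
    avoided = downtime_without(BASE_DOWNTIME_H, outages, duration) - BASE_DOWNTIME_H
    print(f"{name}: {avoided:.0f} h avoided, ${avoided * revenue_per_hour:,.0f} saved")
```

With only one of the two assumptions in place, the avoided loss falls to $4.5M or $2.25M, well short of the $10M fee.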


Replies

passivepinetree · yesterday at 9:37 PM

Another assumption that bothers me here is that the $9M in revenue would be completely lost during an outage. I imagine many customers would simply wait until the outage was resolved before performing their intended transactions, meaning far less than $9M would be lost.
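This point can be expressed as a small adjustment to the article's figure. In the sketch below, `deferred_fraction` is a hypothetical parameter for the share of revenue that customers simply complete after the outage ends. Nobody in the article or the thread estimates it, so the values in the loop are illustrative only.

```python
# Hypothetical model: some customers defer their transactions until the
# outage ends instead of abandoning them. The deferred fractions below are
# illustrative, not data from the article or the thread.

NOMINAL_LOSS = 9_000_000  # the article's "around $9M"


def actual_lost_revenue(nominal_loss: float, deferred_fraction: float) -> float:
    """Revenue truly lost when deferred_fraction of it is recovered after the outage."""
    if not 0.0 <= deferred_fraction <= 1.0:
        raise ValueError("deferred_fraction must be between 0 and 1")
    return nominal_loss * (1.0 - deferred_fraction)


for fraction in (0.0, 0.5, 0.8):
    lost = actual_lost_revenue(NOMINAL_LOSS, fraction)
    print(f"{fraction:.0%} deferred -> ${lost:,.0f} lost")
```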

vjvjvjvjghv · yesterday at 11:07 PM

I bet they would get much better results if they spent a fraction of the money on understanding their systems and designing them better, rather than spending millions on Datadog.

secondcoming · yesterday at 10:38 PM

We are moving from Datadog to Prometheus/Grafana and it's really not all a bed of roses. You'll need monitoring on your monitoring.