Hacker News

INTPenis, 10/11/2024 (1 reply)

I did something similar years ago when I was working on observability. Prometheus Alertmanager constantly fires a special alert that calls a Lambda (or any webhook). When Alertmanager dies or is unable to alert, those calls stop, and the Lambda sends an alert over a third-party push service to notify ops that Alertmanager is down.

We called it a dead man's switch but it was really just a way to monitor alertmanager.
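
For anyone who wants to see the shape of it, here is a minimal sketch of the receiving side in Python. It is not the original Lambda: the class name, the `timeout_s` value, and the `notify` callback (standing in for the third-party push service) are all hypothetical, and the clock is injectable only so the logic can be exercised without waiting.

```python
import time
from typing import Callable


class DeadMansSwitch:
    """Raises the alarm when expected heartbeats stop arriving."""

    def __init__(
        self,
        timeout_s: float,
        notify: Callable[[str], None],  # hypothetical hook for a push service
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_s = timeout_s
        self.notify = notify
        self.clock = clock
        self.last_beat = clock()  # treat startup as the first heartbeat
        self.alarmed = False

    def heartbeat(self) -> None:
        """Call from the webhook handler whenever the always-firing alert arrives."""
        self.last_beat = self.clock()
        self.alarmed = False  # a fresh heartbeat ends the outage

    def check(self) -> bool:
        """Call on a schedule; returns True while the heartbeat is overdue."""
        overdue = self.clock() - self.last_beat > self.timeout_s
        if overdue and not self.alarmed:
            self.notify("alertmanager heartbeat missing")
            self.alarmed = True  # notify once per outage, not on every check
        return overdue
```

The webhook handler would call `heartbeat()` and a separate timer would call `check()`. The key design point is that the checker has to run somewhere that does not depend on the thing being monitored, otherwise both fail silently together.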


Replies

adamdecaf, 10/11/2024

Yeah, I’ve set up two Alertmanagers that check each other before. It’s useful for multi-site deployments.