Hacker News

rbranson yesterday at 4:51 PM (5 replies)

Biggest thing to watch out for with this approach is that you will inevitably have some failure or bug that will 10x, 100x, or 1000x the rate of dead messages, and that will overload your DLQ database. You need a circuit breaker or rate limit on it.
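As a rough illustration of the rate-limit idea, here is a minimal fixed-window limiter that could sit in front of DLQ writes. The class name `DlqRateLimiter`, its parameters, and the injectable `clock` are all hypothetical, chosen for this sketch; nothing here comes from a specific library or from the article under discussion.

```python
import time


class DlqRateLimiter:
    """Hypothetical fixed-window limiter guarding writes to a DLQ table."""

    def __init__(self, max_per_window, window_seconds, clock=time.monotonic):
        self.max_per_window = max_per_window  # writes allowed per window
        self.window_seconds = window_seconds  # window length in seconds
        self.clock = clock                    # injectable so tests can fake time
        self.window_start = clock()
        self.count = 0

    def allow(self):
        """Return True if one more DLQ write fits in the current window."""
        now = self.clock()
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count >= self.max_per_window:
            return False
        self.count += 1
        return True
```

What to do when `allow()` returns `False` is a separate decision: pausing the consumer, sampling the failures, or logging and dropping are all options with different trade-offs.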


Replies

rr808 yesterday at 6:11 PM

I worked on an app that sent an internal email with a stack trace whenever an unhandled exception occurred. Worked great until the day there was an OOM in a tight loop on a box in Asia that sent a few hundred emails per second and saturated the company WAN backbone and the mailboxes of the whole team. Good times.

with yesterday at 10:52 PM

This is the same risk with any DLQ.

The idea behind a DLQ is that it will retry (with some backoff) eventually, and if a message fails enough times, it will stay there. You need monitoring to observe the messages that can't escape the DLQ. Ideally, nothing should ever stay in the DLQ, and if it does, it's something that should be fixed.
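A minimal sketch of that retry-with-backoff policy, assuming capped exponential backoff and a fixed attempt limit. The function name `next_retry_delay` and its default values are hypothetical, picked only for illustration.

```python
def next_retry_delay(attempt, base_seconds=1.0, cap_seconds=300.0, max_attempts=5):
    """Return seconds to wait before the next retry, or None to leave the
    message parked in the DLQ for a human to look at.

    `attempt` is the number of retries already made (0 for the first retry).
    """
    if attempt >= max_attempts:
        return None  # retries exhausted: the message stays in the DLQ
    # The delay doubles with each attempt, but never exceeds the cap.
    return min(cap_seconds, base_seconds * 2 ** attempt)
```

Messages for which this returns `None` are the ones the monitoring should alert on.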

shayonj yesterday at 5:11 PM

This! Only thing worse than your main queue backing off is you dropping items from going into the DLQ because it can’t stay up.

pletnes yesterday at 5:13 PM

If you can’t deliver to the DLQ, then what? Then you’re missing messages either way. Who cares if it’s down this way or the other?


j45 yesterday at 9:00 PM

It will happen eventually in any system.

No need to look down on PG because it makes this more approachable and it is no longer a specialized skill.