From my read of the article, this isn't a problem "in DNS." OP runs an uptime monitor...

quuxplusone • today at 1:21 AM • 1 reply • view on HN

From my read of the article, this isn't a problem "in DNS." OP runs an uptime monitoring service that purports to check whether DNS can resolve your domain — but today OP learned that because he's hitting his ISP's recursive resolver, he doesn't notice downtime until the TTL of the previous response expires.

Solution (which everyone else does, and OP has now implemented): don't use your ISP's recursive resolver! Run your own instance of bind9 or whatever, with the cache disabled. Or it seems like `dig +trace` would probably work, too.

"Cached resources remain visible for their whole TTL, even if the original becomes unreachable or changes" seems like one of the very first principles someone should learn when deciding to go into business selling an uptime monitoring service.

It's not a "ghost domain," it's a Time-To-Live field!

Replies

whatthesmack • today at 2:05 AM

I get where you're coming from, but I think the part you're missing is that caching is not optional (per the "with the cache disabled" comment). If you're running a service like theirs, you need to do caching to some degree.

My understanding of their implementation is that it does caching on a per-worker basis with lower TTLs, so it balances caching with accuracy, but doesn't "fix" visibility to the problem either. However, it narrows the window such that their method of monitoring will very likely expose a pulled domain sooner.

➕ show 1 reply

alt Hacker News

Replies