
necovek, yesterday at 5:30 AM

The examples you cite (e.g. the 2021 Facebook outage) have nothing to do with DNS being used for internal infrastructure.

In the other example (the Amazon DynamoDB issue), the problem is with dynamically choosing from a large, changing pool of IP addresses for a service; DNS is but one mechanism to do that. If it wasn't DNS, something else doing that job could have been the thing that broke. Even /etc/hosts, if it was updated with an empty record.

What I am saying is that your analysis does not define exactly the problem you want solved, your examples do not back up your proposal or analysis, and you are ignoring all the things DNS does for both public and private infrastructure. You seem to have some intuition about this adding complexity and thus being a risk (which is true), but you need to do a better job of connecting and analysing the real risks and the proposed solutions (and their comparative performance).


Replies

louwrentius, yesterday at 6:09 AM

I do state in the article that in the examples DNS isn't the root cause, but the blast radius is very significant. Regardless of the topic of external/internal services, isn't it remarkable that a group of very smart and well-paid people create such circular dependencies?

Yet, I'm not arguing for Facebook or similarly sized companies to ditch DNS internally. I'm making the argument for much smaller organisations to pause and think about where their own risks lie and whether it would make sense to cut out DNS to reduce risk. Whatever process your organisation used to update DNS safely, you would still use with the alternative solution; that doesn't change.

That said, even a broken update to /etc/hosts is probably easier and faster to recover from than a broken DNS service that everything is tied to and that, due to TTL caching, can take much longer to resolve.
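
The TTL point can be sketched in a few lines. This is only an illustration under stated assumptions, not a real resolver: `TTLCache`, `FakeClock`, `resolve`, the `db.internal` name, the addresses and the 300-second TTL are all hypothetical. It shows a client that cached a bad record continuing to return it after the source has been fixed, until the TTL runs out.

```python
class FakeClock:
    """Hypothetical manual clock so the sketch needs no real waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class TTLCache:
    """Hypothetical client-side cache that honours a per-record TTL."""

    def __init__(self, clock):
        self._clock = clock
        self._entries = {}  # name -> (address, expires_at)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Record expired: drop it so the next lookup goes to the source.
            del self._entries[name]
            return None
        return address


def resolve(name, cache, source, ttl):
    """Return the cached address if still valid, else ask the source."""
    cached = cache.get(name)
    if cached is not None:
        return cached
    address = source[name]
    cache.put(name, address, ttl)
    return address


if __name__ == "__main__":
    clock = FakeClock()
    cache = TTLCache(clock)
    zone = {"db.internal": "10.0.0.99"}  # assumed broken record

    print(resolve("db.internal", cache, zone, ttl=300))  # bad record is cached
    zone["db.internal"] = "10.0.0.5"                     # operator fixes the source
    print(resolve("db.internal", cache, zone, ttl=300))  # client still sees the old record
    clock.advance(300)                                   # TTL elapses
    print(resolve("db.internal", cache, zone, ttl=300))  # fix finally visible
```

A hosts file that is read on every lookup has no equivalent of the middle step, which is the recovery-time difference being claimed here; whether that holds in practice depends on how a given application caches lookups.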
