>Fortunately, we still have /etc/hosts, which we can easily provision
This is the kind of thing you read in a post-mortem and wonder how they designed something so fiendishly wonderful.
At 2:00am our MySQL master failed and failed over successfully to our secondary server. As part of post-failover ops, our Ansible playbook proceeded to log in to 1000 instances to update the hosts file for the new master. This caused traffic amplification, which caused our etcd nodes to believe they were down. As the etcd nodes failed over, our Ansible playbook then proceeded to log in to 1000 instances to update the hosts file...
Honestly, whatever system you built is just doing the exact same job as DNS, just with extra steps. If you squint really hard, /etc/hosts is your local DNS cache and Ansible is your resolver. I think this kind of "simplification fetishization" is dangerously attractive to people who have only managed relatively simple setups. I don't think anyone who has ever had to deal with high-availability failover would consider Ansible a good solution.
The problem that so many people hit with DNS isn't specific to DNS the protocol - it's the problem of service discovery. This architecture doesn't eliminate service discovery, it just moves it to a far more brittle configuration.
One does not even need to squint. The first page of RFC 882 explains outright that the DNS came about in the first place because the mechanisms for updating a HOSTS.TXT file and publishing it to loads of places did not scale.
That's still just as true for the intranets of the 2020s, with thousands of machines all downloading a HOSTS file several times a day (or even every hour or minute), as it was for the Internet of July 1983, with around 500 hosts and a file that everyone merely downloaded a couple of times per week. The fact that a file can be copied faster now is counterbalanced by the fact that tying this to real-time failover means that it needs to be updated and distributed several orders of magnitude more quickly than it was in 1983 too. And that's ignoring the linear nature of a HOSTS file lookup contrasted with even the stupidest DNS implementation.
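To make the point about linear lookups concrete, here is a minimal Python sketch. The hosts entries, hostnames, and function names are made up for illustration, and this is not how any particular resolver library is implemented. It contrasts scanning the file on every query with parsing it once into a hash-based index:

```python
# Hypothetical hosts-file content, for illustration only.
HOSTS_TEXT = """\
127.0.0.1   localhost
10.0.0.5    db-master.internal db-master   # current MySQL master
10.0.0.6    db-replica.internal
"""

def lookup_linear(hosts_text, name):
    """Scan the file top to bottom on every query: O(n) in the number of lines."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and name in fields[1:]:
            return fields[0]
    return None

def build_index(hosts_text):
    """Parse once into a dict so later queries are average-case O(1) hash lookups."""
    index = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        for alias in fields[1:]:
            # First entry wins, matching the behaviour of the linear scan.
            index.setdefault(alias, fields[0])
    return index

index = build_index(HOSTS_TEXT)
print(lookup_linear(HOSTS_TEXT, "db-master"))
print(index.get("db-master"))
```

Note that the index only helps with the lookup cost. It does nothing about the distribution problem: every machine still has to receive and re-parse the whole file whenever a single entry changes.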
Those who think that HOSTS is a fallback for any sort of dynamic operation (into and out of service) of even hundreds of machines have not learned the history of why the DNS even exists.
I had an ISP customer years ago that had an AAA system designed by people who didn't understand DNS, DHCP or RADIUS. They also had no idea about NetFlow or SNMP.
The application would log into every router in the network and run a massive, on-the-fly script to manually create a bunch of PPPoE services, set up shaping targets for those connections, update firewall rules, etc.
It would also run manual MikroTik bandwidth tests across every logical link it was aware of.
The application developers were adamant that this was the best way of doing things, and any disagreement would have them point at their dozen or so customers and boast that they surely wouldn't have been able to hoodwink that many people if they were doing it wrong.
Anyway, we took a packet capture of the script updates that ran every 10 minutes and showed the customer that they consumed a whole-number percentage of the bandwidth to certain smaller sites. We were also able to show them that the "My internet goes out every 10 minutes" complaints stopped once we turned off the automatic MikroTik bandwidth tests, which also ran every 10 minutes.
To keep their customer, the application developers agreed to implement SNMP and RADIUS, but they never did. IIRC their fee was a flat 15% of all profits generated by the customer, which was just staggering. And the fee could rise if they asked for support.