Nice analysis. Boy I can’t imagine having to work at Cloudflare on this stuff. A month to get your “small in code” change out only to find some bums somewhere have written code that will make it not work.
It was glibc's resolver that failed - not exactly obscure. It wasn't properly tested or rolled out, plain and simple.
Or — hot take — to find out that you made some silly misinterpretation of the RFC that you then felt the need to retrospectively justify.
Or, when working on massive infrastructure like this, you write plenty of tests that would have saved you a month's worth of work.
They write the reordering change, push it, the glibc test fires and fails, and you quickly discover: "Crap, tests are failing and a dependency (glibc) doesn't work the way I thought it would."
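Here is a rough sketch of the kind of compatibility test I mean, in Python. To be clear, `strict_resolve` is a made-up toy model of a client that reads the answer section in one pass and only follows a CNAME after it has seen it. It is not glibc's actual code, and the names and addresses are placeholders. It only shows how an order-sensitive client could be caught before rollout.

```python
# Toy model of an order-sensitive stub resolver (hypothetical, NOT glibc's code).
# Each answer record is a (owner_name, record_type, data) tuple.

def strict_resolve(qname, answers):
    """Walk the answer section once, following CNAMEs only as they appear."""
    expected = qname
    addresses = []
    for name, rtype, data in answers:
        if name != expected:
            continue  # record belongs to a name we are not looking for yet
        if rtype == "CNAME":
            expected = data  # from here on, look for the alias target
        elif rtype == "A":
            addresses.append(data)
    return addresses


def failing_orderings(qname, orderings, want):
    """Return the labels of the orderings the client cannot resolve."""
    return [
        label
        for label, answers in orderings.items()
        if strict_resolve(qname, answers) != want
    ]


# Placeholder records using documentation-reserved names and addresses.
CNAME_FIRST = [
    ("www.example.com", "CNAME", "cdn.example.net"),
    ("cdn.example.net", "A", "192.0.2.1"),
]
CNAME_LAST = list(reversed(CNAME_FIRST))

broken = failing_orderings(
    "www.example.com",
    {"cname_first": CNAME_FIRST, "cname_last": CNAME_LAST},
    want=["192.0.2.1"],
)
print(broken)
```

Run something like this in CI against every client you care about, and a non-empty list is the "crap, tests are failing" moment, just a month earlier.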