I would expect, that dns servers like 1.1.1.1 at this scale have integration tests running real resolvers, like the one in glibc. How come this issue was discovered only in production?
This case would only happen if a CNAME chain first expired from the cache in the wrong order and then subsequently was queried via glibc. Theirs tests may test both that glibc resolving works and that re-querying expired records works, but not the combination of the two.
This case would only happen if a CNAME chain first expired from the cache in the wrong order and then subsequently was queried via glibc. Theirs tests may test both that glibc resolving works and that re-querying expired records works, but not the combination of the two.