logoalt Hacker News

What came first: the CNAME or the A record?

209 pointsby linolevantoday at 5:13 PM78 commentsview on HN

Comments

steve1977today at 6:46 PM

I don't find the wording in the RFC to be that ambiguous actually.

> The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.

The "possibly preface" (sic!) to me is obviously to be understood as "if there are any CNAME RRs, the answer to the query is to be prefaced by those CNAME RRs" and not "you can preface the query with the CNAME RRs or you can place them wherever you want".

show 7 replies
bwblabstoday at 10:00 PM

I will hijack this post to point out CloudFlare really doesn't understand RFC1034, their DNS authoritative interface only blocks A and AAAA if there is a CNAME defined, e.g. see this:

  $ echo "A AAAA CAA CNAME DS HTTPS LOC MX NS TXT" | sed -r 's/ /\n/g' | sed -r 's/^/rfc1034.wlbd.nl /g' | xargs dig +norec +noall +question +answer +authority @coco.ns.cloudflare.com
  ;rfc1034.wlbd.nl.  IN A
  rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
  ;rfc1034.wlbd.nl.  IN AAAA
  rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
  ;rfc1034.wlbd.nl.  IN CAA
  rfc1034.wlbd.nl. 300 IN CAA 0 issue "really"
  ;rfc1034.wlbd.nl.  IN CNAME
  rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
  ;rfc1034.wlbd.nl.  IN DS
  rfc1034.wlbd.nl. 300 IN DS 0 13 2 21A21D53B97D44AD49676B9476F312BA3CEDB11DDC3EC8D9C7AC6BAC A84271AE
  ;rfc1034.wlbd.nl.  IN HTTPS
  rfc1034.wlbd.nl. 300 IN HTTPS 1 . alpn="h3"
  ;rfc1034.wlbd.nl.  IN LOC
  rfc1034.wlbd.nl. 300 IN LOC 0 0 0.000 N 0 0 0.000 E 0.00m 0.00m 0.00m 0.00m
  ;rfc1034.wlbd.nl.  IN MX
  rfc1034.wlbd.nl. 300 IN MX 0 .
  ;rfc1034.wlbd.nl.  IN NS
  rfc1034.wlbd.nl. 300 IN NS rfc1034.wlbd.nl.
  ;rfc1034.wlbd.nl.  IN TXT
  rfc1034.wlbd.nl. 300 IN TXT "Check my cool label serving TXT and a CNAME, in violation with RFC1034"
The result is DNS resolvers (including CloudFlare Public DNS) will have a cache dependent result if you query e.g. a TXT record (depending if it has the CNAME cached). At internet.nl (https://github.com/internetstandards/) we found out because some people claimed to have some TXT DMARC record, while also CNAMEing this record (which results in cache dependent results, and since internet.nl uses RFC 9156 QName Minimisation, if first resolves A, and therefor caches the CNAME and will never see the TXT). People configure things similar to https://mxtoolbox.com/dmarc/dmarc-setup-cname instructions (which I find in conflict with RFC1034).
show 1 reply
patrickmaytoday at 6:09 PM

A great example of Hyrum's Law:

"With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."

combined with failure to follow Postel's Law:

"Be conservative in what you send, be liberal in what you accept."

show 4 replies
NelsonMinartoday at 6:52 PM

It's remarkable that the ordinary DNS lookup function in glibc doesn't work if the records aren't in the right order. It's amazing to me we went 20+ years without that causing more problems. My guess is most people publishing DNS records just sort of knew that the order mattered in practice, maybe figuring it out in early testing.

show 2 replies
seiferterictoday at 7:29 PM

Now that I have seemingly taken on managing DNS at my current company I have seen several inadequacies of DNS that I was not aware of before. Main one being that if an upstream DNS server returns SERVFAIL, there is no distinction really between if the server you are querying is failed, or the actual authoritative server upstream is broken (I am aware of EDEs but doesn't really solve this). So clients querying a broken domain will retry each of their configured DNS servers, and our caching layer (Unbound) will also retry each of their upstreams etc... Results in a bunch of pointless upstream queries like an amplification attack. Also have issue with the search path doing stupid queries with NXDOMAIN like badname.company.com, badname.company.othername.com... etc..

show 1 reply
forintitoday at 6:47 PM

> While in our interpretation the RFCs do not require CNAMEs to appear in any particular order, it’s clear that at least some widely-deployed DNS clients rely on it. As some systems using these clients might be updated infrequently, or never updated at all, we believe it’s best to require CNAME records to appear in-order before any other records.

That's the only reasonable conclusion, really.

show 1 reply
m3047today at 9:03 PM

DNS is a wire protocol, payload specification, and application protocol. For all of that, I personally wonder whether its enduring success isn't that it's remarkably underspecified when you get to the corner cases.

There's also so much of it, and it mostly works, most of the time. This creates a hysteresis loop in human judgement of efficacy: even a blind chicken gets corn if it's standing in it. Cisco bought cisco., but (a decade ago, when I had access to the firehose) on any given day belkin. would be in the top 10 TLDs if you looked at the NXDOMAIN traffic. Clients don't opportunistically try TCP (which they shouldn't, according to the specification...), but we have DoT (...but should in practice). My ISPs reverse DNS implementation is so bad that qname minimization breaks... but "nobody should be using qname minimization for reverse DNS", and "Spamhaus is breaking the law by casting shades at qname minimization".

"4096 ought to be enough for anybody" (no, frags are bad. see TCP above). There is only ever one request in a TCP connection... hey, what are these two bytes which are in front of the payload in my TCP connection? People who want to believe that their proprietary headers will be preserved if they forward an application protocol through an arbitrary number of intermediate proxy / forwarders (because that's way easier than running real DNS at the segment edge and logging client information at the application level).

Tangential, but: "But there's more to it, because people doing these things typically describe how it works for them (not how it doesn't work) and onlookers who don't pay close attention conclude "it works"." http://consulting.m3047.net/dubai-letters/dnstap-vs-pcap.htm...

sebastianmestretoday at 6:48 PM

I kind of wish they start sending records in randomized order to take out all the broken implementations that depend on such a fragile property

show 2 replies
wolttamtoday at 9:27 PM

My take is quite cynical on this.. This post reads to me like a post-justification of some strange newly introduced behaviour.

Please order the answer in the order the resolutions were performed to arrive at the final answer (regardless of cache timings). Anything else makes little sense, especially not in the name of some micro-optimization (which could likely be approached in other ways that don’t alter behaviour).

show 1 reply
mdavid626today at 7:56 PM

I would expect, that dns servers like 1.1.1.1 at this scale have integration tests running real resolvers, like the one in glibc. How come this issue was discovered only in production?

show 1 reply
tuetuopaytoday at 7:22 PM

Many rightfully interpret the RFC as "CNAME have to be before A", but the issue persists inbetween CNAMEs in the chain as noted in the article. If a record in the middle of the chain expires, glibc would still fail if the "middle" record was to be inserted between CNAMEs and A records.

It’s always DNS.

danepowelltoday at 8:05 PM

Doesn't the precipitating change optimize memory on the DNS server at the expense of additional memory usage across millions of clients that now need to parse an unordered response?

show 2 replies
runningmiketoday at 8:11 PM

The end of this blog is …. “ To learn more about our mission to help build a better Internet,”

Reminds me of https://news.ycombinator.com/item?id=37962674 or see https://tech.tiq.cc/2016/01/why-you-shouldnt-use-cloudflare/

ShroudedNighttoday at 6:27 PM

I'm not an IETF process expert. Would this be worth filing errata against the original RFC in addition to their new proposed update?

Also, what's the right mental framework behind deciding when to release a patch RFC vs obsoleting the old standard for a comprehensive update?

show 2 replies
kaysontoday at 6:16 PM

> However, we did not have any tests asserting the behavior remains consistent due to the ambiguous language in the RFC.

Maybe I'm being overly-cynical but I have a hard time believing that they deliberately omitted a test specifically because they reviewed the RFC and found the ambiguous language. I would've expected to see some dialog with IETF beforehand if that were the case. Or some review of the behavior of common DNS clients.

It seems like an oversight, and that's totally fine.

show 2 replies
1vuio0pswjnm7today at 9:11 PM

"One such implementation that broke is the getaddrinfo function in glibc, which is commonly used on Linux for DNS resolution. When looking at its getanswer_r implementation, we can indeed see it expects to find the CNAME records before any answers:"

Wherever possible I compile with gethostbyname instead of getaddrinfo. I use musl instead of glibc

Nothing against IPv6 but I do not use it on the computers and networks I control

show 1 reply
paulddrapertoday at 6:49 PM

> RFC 1034, published in 1987, defines much of the behavior of the DNS protocol, and should give us an answer on whether the order of CNAME records matters. Section 4.3.1 contains the following text:

> If recursive service is requested and available, the recursive response to a query will be one of the following:

> - The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.

> While "possibly preface" can be interpreted as a requirement for CNAME records to appear before everything else, it does not use normative key words, such as MUST and SHOULD that modern RFCs use to express requirements. This isn’t a flaw in RFC 1034, but simply a result of its age. RFC 2119, which standardized these key words, was published in 1997, 10 years after RFC 1034.

It's pretty clear that CNAME is at the beginning.

The "possibly" does not refer to the order but rather to the presence.

If they are present, they are are first.

show 1 reply
urbandw311ertoday at 9:56 PM

I feel like they fucked it up then, when writing the post-mortem, went hunting for facts to retrospectively justify their previous decisions.

frumplestlatztoday at 6:15 PM

Given my years of experience with Cisco "quality", I'm not surprised by this:

> Another notable affected implementation was the DNSC process in three models of Cisco ethernet switches. In the case where switches had been configured to use 1.1.1.1 these switches experienced spontaneous reboot loops when they received a response containing the reordered CNAMEs.

... but I am surprised by this:

> One such implementation that broke is the getaddrinfo function in glibc, which is commonly used on Linux for DNS resolution.

Not that glibc did anything wrong -- I'm just surprised that anyone is implementing an internet-scale caching resolver without a comprehensive test suite that includes one of the most common client implementations on the planet.

thereintoday at 6:42 PM

After the release got reverted, it took an 1hr28min for the deployment to propagate. You'd think that would be a very long time for CloudFlare infrastructure.

show 3 replies
renewiltordtoday at 6:21 PM

Nice analysis. Boy I can’t imagine having to work at Cloudflare on this stuff. A month to get your “small in code” change out only to find some bums somewhere have written code that will make it not work.

show 3 replies
charcircuittoday at 6:06 PM

Random DNS servers and clients being broken in weird ways is such a common problem and will probably never go away unless DNS is abandoned altogether.

It's surprising how something so simple can be so broken.