If the DNS server is actually local, like it's supposed to be, it should have a ping of just a few ms. Quadrupling that just once is no big deal. The user won't even notice, since every OS does lots of background DNS activity before the user even opens an app or browser.
Saying that 2x RTT is a deal-breaker is like saying TCP in general is a deal-breaker.
State per client is pretty simple. Use a bloom filter to decide if a client IP is ok for UDP, and slowly set bits to zero at random to force gradual eviction. If the hash is keyed with a secret nonce per server, the attacker can't engineer collisions except by controlling lots of IPs. For IPv6, just treat every address within a block of a certain size (e.g. a /48) as the same client.
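
To make that concrete, here is a rough Python sketch of the filter. It's an illustration, not code from any real resolver: the class and method names (`ClientFilter`, `allow`, `is_allowed`, `decay`) are made up, keyed BLAKE2b is just one reasonable choice for the secret-nonce hash, and I'm assuming a client gets marked ok after it completes a query over TCP.

```python
import hashlib
import os
import random
import ipaddress


class ClientFilter:
    """Bloom filter of client prefixes that are ok for UDP (hypothetical sketch)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4, secret=None):
        if not 1 <= num_hashes <= 16:
            raise ValueError("num_hashes must be between 1 and 16")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Per-server secret, so outsiders can't predict which bits an IP maps to.
        self.secret = secret if secret is not None else os.urandom(16)
        self.bits = bytearray((num_bits + 7) // 8)

    @staticmethod
    def _client_key(ip):
        addr = ipaddress.ip_address(ip)
        if addr.version == 6:
            # Keep only the first 48 bits: a whole /48 counts as one client.
            return b"6" + addr.packed[:6]
        return b"4" + addr.packed

    def _positions(self, ip):
        # One keyed digest, sliced into num_hashes 32-bit bit indexes.
        digest = hashlib.blake2b(
            self._client_key(ip),
            key=self.secret,
            digest_size=4 * self.num_hashes,
        ).digest()
        return [
            int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits
            for i in range(self.num_hashes)
        ]

    def allow(self, ip):
        """Mark a client as ok for UDP (assumed: after a successful TCP query)."""
        for p in self._positions(ip):
            self.bits[p >> 3] |= 1 << (p & 7)

    def is_allowed(self, ip):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(ip))

    def decay(self, count):
        """Clear `count` randomly chosen bits; call periodically for gradual eviction."""
        for _ in range(count):
            p = random.randrange(self.num_bits)
            self.bits[p >> 3] &= ~(1 << (p & 7)) & 0xFF
```

A resolver would check `is_allowed(src_ip)` on each UDP query and push unknown clients to TCP, call `allow(src_ip)` once a TCP query succeeds, and run `decay(n)` on a timer. The filter size, hash count, and decay rate here are placeholders that would need tuning for real traffic.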
And again, this should be the default. Someone who is seriously trying to run an open resolver should have their own fork of the source code and adjust this as they need. The small-time operators who accidentally make their resolvers open won't notice a bloom filter or a slow initial lookup.