simoncion • today at 4:32 AM

> Managing multiple network upstreams (e.g. for network failover or load balancing) is a common example ... that IPv6 cannot offer without using NPTv6 or NAT66.

I don't think that's true. I haven't had reason to do edge router failover, but I am familiar with the concepts and also with anycast/multihoming... so do make sure to cross-check what I'm saying here with known-good information.

My claim is that the scenario you describe works better in the non-NATted IPv6 world than in the NATted IPv4 world. Let's consider that scenario in the IPv4-only world. Assume you're providing a typical "one global IP shared with a number of LAN hosts via IPv4 NAT" setup. When one uplink dies, the following happens:

* You fail over to your backup link

* This changes your external IP address

* Because you're doing NAT, and DHCP generally has no way to talk back to hosts after the initial negotiation, you have no way to alert hosts of the change in external IP address

* Depending on your NAT box configuration, existing client connections either die a slow and painful death, or -ideally- they get abruptly RESET and the hosts reestablish them

Now consider the situation with IPv6. When one uplink dies:

* You fail over to your backup link

* This changes your external prefix

* Your router announces the prefix change by announcing the new prefix and also that the now-dead one's valid lifetime is 0 seconds [0]

* Hosts react to the change by reconfiguring via SLAAC and/or DHCPv6, depending on the settings in the RA

* Existing client connections are still dead, [1] but the host gets to know that their global IP address has changed and has a chance to take action, rather than being entirely unaware
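To make the RA step a bit more concrete, here is a minimal sketch of the Prefix Information option a router would put in its Router Advertisement, using the layout from RFC 4861 section 4.6.2. The helper names (`build_pio`, `failover_options`), the documentation prefixes, and the lifetimes for the new prefix are all assumptions for illustration; this is not taken from any particular router implementation, and it only builds the option bytes rather than sending a full RA.

```python
import ipaddress
import struct

PIO_TYPE = 3             # Prefix Information option type (RFC 4861, section 4.6.2)
PIO_LEN_UNITS = 4        # option length in units of 8 octets (32 bytes total)
FLAG_ON_LINK = 0x80      # L flag: prefix is on-link
FLAG_AUTONOMOUS = 0x40   # A flag: prefix may be used for SLAAC


def build_pio(prefix: str, valid_lifetime: int, preferred_lifetime: int) -> bytes:
    """Pack one Prefix Information option (hypothetical helper)."""
    net = ipaddress.IPv6Network(prefix)
    return struct.pack(
        "!BBBBIII16s",
        PIO_TYPE,
        PIO_LEN_UNITS,
        net.prefixlen,
        FLAG_ON_LINK | FLAG_AUTONOMOUS,
        valid_lifetime,        # seconds
        preferred_lifetime,    # seconds
        0,                     # Reserved2
        net.network_address.packed,
    )


def failover_options(dead_prefix: str, live_prefix: str) -> list[bytes]:
    """Options for the RA sent right after failover (illustrative values)."""
    return [
        # Withdraw the prefix of the dead uplink: both lifetimes set to 0.
        build_pio(dead_prefix, valid_lifetime=0, preferred_lifetime=0),
        # Announce the backup uplink's prefix; lifetimes here are arbitrary.
        build_pio(live_prefix, valid_lifetime=86400, preferred_lifetime=14400),
    ]


if __name__ == "__main__":
    for opt in failover_options("2001:db8:a::/64", "2001:db8:b::/64"):
        print(opt.hex())
```

One caveat worth cross-checking: as far as I understand RFC 4862, hosts are not required to honor a very short valid lifetime from an unauthenticated RA, so the preferred lifetime of 0 is the part most likely to take effect right away.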

Assuming that I haven't screwed up any of the details, I think that's what happens. Of course, if you have provider-independent addresses [2] assigned to your site, then maybe none of that matters and you "just" fail over without much trouble?

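[0] I think this is known as "deprecating" the prefix (strictly speaking, a preferred lifetime of 0 deprecates it, while a valid lifetime of 0 invalidates it)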

[1] I think whether they die slow or fast depends on how the router is configured

[2] ...whether IPv4 or IPv6...

Replies

jcgl • today at 7:49 AM

> * Hosts react to the change by reconfiguring via SLAAC and/or DHCPv6, depending on the settings in the RA

This is the linchpin of the workflow you've outlined. Anecdotal experience in this area suggests it's not broadly effective enough in practice, not least because of this:

> * Existing client connections are still dead, [1] but the host gets to know that their global IP address has changed and has a chance to take action, rather than being entirely unaware

The old IP addresses (afaiu/ime) will not be removed before any dependent connections are removed. In other words, the application (not the host/OS) is driving just as much as the OS is. Imo, this is one of the core problems with the scenario, that the OS APIs for this stuff just aren't descriptive enough to describe the network reconfiguration event. Because of that, things will ~always be leaky.

> [1] I think whether they die slow or fast depends on how the router is configured

Yeah, and that configuration will presumably be sensitive to what caused the failover. This could manifest differently based on whether upstream A simply has some bad packet loss or whether it went down altogether (e.g. a physical fault).

In any case, this vision of the world misses at least two things, in my view:

1. Administrative load balancing (e.g. lightly utilizing upstream B even when upstream A is still up)

2. The long tail of devices that don't respond well to the flow you outlined. It's not enough to think of well-behaved servers that one has total control over; one also needs to think of random devices with network stacks of...varying quality (e.g. IoT devices)
