This is a lot of basically sharpshooting, but I will address your last point:
> There was no way to move from 32-bits to >32-bits without every network stack of every device element (host, gateway, firewall, application, etc) getting new code. Anything that changed the type and size of sockaddr->sa_family (plus things like new DNS resource record types: A is 32-bit only; see addrinfo->ai_family) would require new code.
That is simply not true. We had one bit left (the reserved/"evil" bit) in IPv4 headers that could have been used to flag that the first N bytes of the payload were an additional IPv4.1 header indicating additional routing information. Packets would continue to transit existing networks and "4.1" capable boxes at edges could read the additional information to make further routing decisions inside of a network. It would have effectively used IPv4 as the core transport network and each connected network (think ASN) having a handful of routed /32s.
Overlay networks are widely deployed and have very minor technical issues.
But that would have only addressed the numbering exhaustion issues. Engineers often get caught in the "well if I am changing this code anyway" trap.
But v6 did do what you're describing here?
They didn't use the reserved bit, because there's a field that's already meant for this purpose: the next protocol field. Set that to 0x29 and it indicates that the first bytes of the payload contain a v6 address. Every v4 address has a /48 of v6 space tunnelled to it using this mechanism, and any two v4 addresses can talk v6 between them (including to the entire networks behind those addresses) via it.
If doing basically exactly what you suggested isn't enough to stop you from complaining about v6's designers, how could they possibly have done any better?
> That is simply not true. We had one bit left (the reserved/"evil" bit) in IPv4 headers […]
Great, there's an extra bit in the IPv4 packet header.
I was talking about the data structures in operating systems: are there any extra bits in the sockaddr structure to signal things to applications? If not, an entirely new struct needs to be deployed.
And that doesn't even get into having to deploy new DNS code everywhere.
An explicit goal of IPv6 considered as important as the address expansion was the simplification of the packet header, by having fewer fields and which are correctly aligned, not like in the IPv4 header, in order to enable faster hardware routing.
The scheme described by you fails to achieve this goal.