> That is a 43% reduction, and it is free: no source change, just a compiler flag.
It's not entirely free: the cost is that the resulting binary will no longer run on processors that lack the instruction. Admittedly, that only rules out CPUs from ≈2007 or older. But still! I have a 2012 CPU still in service, and as much as I'd love to retire it, *gestures at the price of RAM these days*.
… I'd add that a 2012 CPU is also surprisingly competitive with today's tech. The gap between 2012 and 2026 is nothing compared to the equally long gap between 1998 and 2012: 1998 is something like a 500 MHz single-core, 32-bit chip; 2012 is 4 cores, 8 hyperthreads, 64-bit, 3.5 GHz. (Perhaps more remarkably, my next-oldest machine, a 2017 laptop, runs at only 2.8 GHz with the same 4 cores / 8 threads. It does use about half the power, but that's mostly the "laptop" part.)
(That same 2012 CPU also doesn't support "v3".)
My main problem was that our hosting company offered cheap Linux servers, but on a shared CPU that didn't even support v2. We pay more now, but it's a problem you could still run into.
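For the shared-CPU case, one rough way to check a Linux host before deploying a v2 build is to compare its `/proc/cpuinfo` flags against the features the x86-64 psABI lists for x86-64-v2 (CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3). Here's a minimal Python sketch, assuming Linux's flag spellings (SSE3 shows up as `pni`); the function names are made up for illustration, and this only looks at v2, not v3:

```python
from pathlib import Path

# Linux /proc/cpuinfo spellings of the features the x86-64 psABI requires
# for the x86-64-v2 level. Note: SSE3 is reported as "pni" by the kernel.
X86_64_V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match "flags" exactly, so lines like "vmx flags" are skipped.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_v2_flags(flags: set[str]) -> set[str]:
    """Return the v2 features not advertised; an empty set means v2 looks OK."""
    return X86_64_V2_FLAGS - flags


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if not cpuinfo.exists():
        print("no /proc/cpuinfo (not Linux?)")
    else:
        missing = missing_v2_flags(cpu_flags(cpuinfo.read_text()))
        if missing:
            print("not x86-64-v2 capable; missing:", ", ".join(sorted(missing)))
        else:
            print("x86-64-v2 capable")
```

On a VM this reports what the hypervisor exposes to the guest, which is what actually decides whether a v2 binary will run there.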