For something this short that is pure math, why not just hand-write asm for the most popular platforms? That keeps a future compiler version from deoptimizing it.
Keep this algorithm as the fallback for all other platforms.
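To make the suggestion concrete, here is a rough sketch of what "asm on popular platforms, portable fallback elsewhere" could look like. It does not use the algorithm from the post. As a stand-in it uses a hypothetical `mul_hi` (high 64 bits of a 64x64-bit product), and it assumes GCC/Clang extended asm on x86-64; every name here is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Portable fallback (hypothetical stand-in): high 64 bits of a * b,
// computed from 32-bit halves so it works on any conforming compiler.
inline std::uint64_t mul_hi_portable(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    const std::uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    const std::uint64_t lo_lo = a_lo * b_lo;
    const std::uint64_t hi_lo = a_hi * b_lo;
    const std::uint64_t lo_hi = a_lo * b_hi;
    const std::uint64_t hi_hi = a_hi * b_hi;
    // Sum the middle partial products; this cannot overflow 64 bits.
    const std::uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Hand-written path: MUL leaves the 128-bit product in RDX:RAX.
inline std::uint64_t mul_hi(std::uint64_t a, std::uint64_t b) {
    std::uint64_t lo, hi;
    __asm__("mulq %3" : "=a"(lo), "=d"(hi) : "a"(a), "rm"(b) : "cc");
    (void)lo;  // only the high half is needed here
    return hi;
}
#else
// Every other platform gets the portable version.
inline std::uint64_t mul_hi(std::uint64_t a, std::uint64_t b) {
    return mul_hi_portable(a, b);
}
#endif

// Cross-check the selected path against the fallback on one input.
inline void mul_hi_self_check() {
    const std::uint64_t a = 0x0123456789ABCDEFull, b = 0xFEDCBA9876543210ull;
    assert(mul_hi(a, b) == mul_hi_portable(a, b));
}
```

The cost is that each asm path has to be kept in sync with the fallback, and an opaque asm block also stops the compiler from constant-folding or vectorizing through it.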
This pretty much is assembly written as C++... there's not much the compiler can ruin.
Because that isn’t portable?