I'm wondering how the compiler optimised add_v3() and add_v4() though.
Was it through "idiom detection", i.e. by recognising those specific patterns, or did the compiler deduce the answers through some more involved analysis?