I don't think native C++, even when combined with OpenMP, goes far enough.
In my experience, ISPC and Google's Highway project lead to better results in practice, mostly due to their dynamic dispatching features.
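To make "dynamic dispatch" concrete, here is a rough hand-rolled sketch of the general idea: build several variants of one kernel, check the CPU once at runtime, and route every call through the chosen variant. This is not how ISPC or Highway implement it. All names (`sum_scalar`, `sum_avx2_standin`, `choose_sum`, `sum`) are hypothetical, the "AVX2" variant is plain C++ standing in for a kernel that would really be compiled for that target, and the feature check assumes GCC or Clang on x86, falling back to the baseline elsewhere.

```cpp
#include <cstddef>

// Baseline variant: runs on any CPU.
static float sum_scalar(const float* x, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

// Stand-in for a variant compiled for AVX2 (assumption: in a real
// project this would use intrinsics or a separately compiled target).
// It keeps 8 partial sums to mimic an 8-lane float vector.
static float sum_avx2_standin(const float* x, std::size_t n) {
    float acc[8] = {0.0f};
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        for (std::size_t k = 0; k < 8; ++k) acc[k] += x[i + k];
    float s = 0.0f;
    for (std::size_t k = 0; k < 8; ++k) s += acc[k];
    for (; i < n; ++i) s += x[i];  // leftover elements
    return s;
}

using SumFn = float (*)(const float*, std::size_t);

// Pick a variant based on what the running CPU reports.
static SumFn choose_sum() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2")) return sum_avx2_standin;
#endif
    return sum_scalar;
}

// Public entry point: the choice is made once, on first call.
float sum(const float* x, std::size_t n) {
    static const SumFn fn = choose_sum();
    return fn(x, n);
}
```

The point of the pattern is that one binary can ship both variants and still use the widest instructions the machine it lands on supports, instead of being compiled for a single fixed target.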
Could you elaborate on the dynamic dispatching features a bit more? Is that for portability?