This code is not equivalent to the C++ version. You can compare the array directly with `*x == [0_u32; SIZE]`, and the two generate different code. (That the iterator version doesn't produce optimal code is still an issue, though.)
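For anyone following along, here is a minimal sketch of the two variants being compared. The original snippet isn't quoted in this comment, so `SIZE`, `is_zero_iter` and `is_zero_eq` are made-up names, and the iterator form is only a guess at what the post used. Both functions return the same boolean; the point is just that the compiler may emit different assembly for them.

```rust
// Hypothetical reconstruction: SIZE and both function names are illustrative,
// not taken from the original post.
pub const SIZE: usize = 4;

/// Iterator-based check: true if every element is zero.
/// `all` is short-circuiting, so it may stop at the first nonzero element.
pub fn is_zero_iter(x: &[u32; SIZE]) -> bool {
    x.iter().all(|&v| v == 0)
}

/// Direct comparison against an all-zero array of the same length,
/// using the `PartialEq` impl for fixed-size arrays.
pub fn is_zero_eq(x: &[u32; SIZE]) -> bool {
    *x == [0_u32; SIZE]
}
```

Changing `SIZE` and recompiling with optimizations on is probably the easiest way to see where the output switches between scalar and SIMD code.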
Very good point! Thanks!
With the correction, interestingly enough, it also produces the good behavior at size=2, and it delays SIMD until size=5. But then, bizarrely, it stops using SIMD again after size=64.
https://godbolt.org/z/P979nY4nf
The iterator version stays SIMD-y past size=64 as well, but it too stops at some point. What?! I don't know enough to understand what's going on. Anyone?