No mention of branches, which is a complementary concept. If you unwind your loop, you can get part of the way to SIMD performance by keeping the CPU pipeline filled.