The crazy part about this is that (auto) vectorization in Rust looks something like this: iter.chunks(32).map(vectorized)
Where the vectorized function checks if the chunk has length 32, if yes run the algorithm, else run the algorithm.
The compiler knows that the chunk has a fixed size at compile time in the first block, which means it can now attempt to vectorize the algorithm with a SIMD size of 32. The else block handles the scalar case, where the chunk is smaller than 32.
The crazy part about this is that (auto) vectorization in Rust looks something like this: iter.chunks(32).map(vectorized)
Where the vectorized function checks if the chunk has length 32, if yes run the algorithm, else run the algorithm.
The compiler knows that the chunk has a fixed size at compile time in the first block, which means it can now attempt to vectorize the algorithm with a SIMD size of 32. The else block handles the scalar case, where the chunk is smaller than 32.