This was the entry-level project in a hardware optimization course I took maybe 15 years ago, using SIMD instructions. Lots of things can be naively optimized by unrolling loops like this one. Compilers do some of this themselves.
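To illustrate the unrolling idea, here is a minimal sketch in C. It is not the course project itself: the array-sum task and the names `sum_simple` and `sum_unrolled4` are assumptions picked for the example. The unrolled version handles four elements per iteration with separate accumulators, which is roughly the shape a SIMD version would take with one vector register holding the four partial sums.

```c
#include <stddef.h>

/* Baseline: one add per iteration, each depending on the previous result. */
long sum_simple(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Hypothetical example: the same sum, unrolled by 4.
 * The four accumulators do not depend on each other, so the adds in one
 * iteration can overlap instead of waiting on a single running total. */
long sum_unrolled4(const int *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main loop: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    long total = s0 + s1 + s2 + s3;

    /* Tail loop: the 0 to 3 elements left when n is not a multiple of 4. */
    for (; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Whether this beats the simple loop depends on the compiler and flags. With optimization turned on, a compiler may already unroll or vectorize `sum_simple` on its own, so it is worth measuring before keeping the hand-unrolled version.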