The OP should try compiling with -march=native so the compiler can use whatever vector instructions the build machine's CPU supports.
Slightly off-topic, but I like this way of testing whether a block of memory is all zeroes: https://rusty.ozlabs.org/2015/10/20/ccanmems-memeqzero-itera... (see "epiphany #2" at the bottom of the page). I really wish there were a standard libc function for it.
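For anyone who doesn't want to click through, here is a rough sketch of the trick as I understand it: check the first 16 bytes by hand, then memcmp the buffer against itself shifted by 16 bytes. If the first 16 bytes are zero and every byte equals the byte 16 positions before it, the whole buffer is zero. The name memeqzero_sketch and the 16-byte prefix are my own choices for illustration; this is a reconstruction from memory, not a copy of the code in the post.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name; a sketch of the "memcmp against yourself" idea. */
bool memeqzero_sketch(const void *data, size_t length)
{
    const unsigned char *p = data;
    size_t i;

    /* Check up to the first 16 bytes one at a time. */
    for (i = 0; i < 16; i++) {
        if (length == 0)
            return true;   /* ran out of buffer, everything was zero */
        if (*p != 0)
            return false;
        p++;
        length--;
    }

    /*
     * The first 16 bytes are zero. Compare the buffer with itself
     * offset by 16: byte i must equal byte i + 16, so by induction
     * every remaining byte is zero too. memcmp only reads, so the
     * overlap between the two regions is fine.
     */
    return memcmp(data, p, length) == 0;
}
```

The appeal is that the bulk of the work lands in memcmp, which libc implementations usually optimize heavily, so you get the fast path without writing any SIMD yourself.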