The author published the bandwidth of their algorithm: it's about one fifth of typical memory bandwidth. (Obviously this benchmark can't go faster than memory, since we're assuming the data is not in cache.)
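To make the comparison concrete, here is a minimal sketch of the arithmetic. The function name and all the numbers are made up for illustration (they are not the author's figures); plug in the published throughput and your own machine's measured memory bandwidth.

```python
def bandwidth_fraction(bytes_processed, seconds, peak_bytes_per_s):
    """Achieved throughput as a fraction of peak memory bandwidth."""
    achieved = bytes_processed / seconds  # bytes per second
    return achieved / peak_bytes_per_s


def min_seconds(bytes_processed, peak_bytes_per_s):
    """Lower bound on runtime when every byte has to come from RAM."""
    return bytes_processed / peak_bytes_per_s


# Hypothetical example: 4 GB processed in 1 s on a machine
# assumed to sustain 20 GB/s from main memory.
fraction = bandwidth_fraction(4e9, 1.0, 20e9)
print(f"{fraction:.2f} of memory bandwidth")
print(f"best possible time: {min_seconds(4e9, 20e9):.2f} s")
```

With cold data the fraction can't exceed 1.0, so a result around 0.2 means there is at most a 5x speedup left before the memory bus becomes the limit.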