The algorithm description was a bit confusing for me.
The SIMD part only comes in at the last step, where it uses SIMD to search the final block of 16 elements.
The "quad" part is that it checks 3 points per step to choose among 4 paths, and it's searching for the right block, not just the right key.
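To make that concrete, here's a rough scalar sketch in Python of how I read the description. It's not the author's code: the block size of 16, the function names, and the plain loop standing in for the SIMD compare are all my assumptions.

```python
BLOCK = 16  # assumed size of the final block, scanned in one SIMD step


def quad_block_search(keys, target):
    """Return an index of target in the sorted list keys, or -1 if absent."""
    n_blocks = (len(keys) + BLOCK - 1) // BLOCK

    def last(b):
        # Last key of block b; the final block may hold fewer than BLOCK keys.
        return keys[min((b + 1) * BLOCK, len(keys)) - 1]

    # Find the first block whose last key is >= target. Invariant: that
    # block lies in [lo, hi], where hi == n_blocks means "past the end".
    lo, hi = 0, n_blocks
    while hi - lo >= 4:
        q = (hi - lo) // 4
        p1, p2, p3 = lo + q, lo + 2 * q, lo + 3 * q
        # Three probes -> four paths. k counts pivots whose last key is still
        # below target (always a prefix of p1, p2, p3 since keys is sorted).
        k = (last(p1) < target) + (last(p2) < target) + (last(p3) < target)
        lo, hi = ((lo, p1), (p1 + 1, p2), (p2 + 1, p3), (p3 + 1, hi))[k]
    # Fewer than 4 candidate blocks left: step through them one at a time.
    while lo < hi and last(lo) < target:
        lo += 1
    if lo == n_blocks:
        return -1  # target is larger than every key

    # Final step: the real thing would compare all 16 keys at once with SIMD;
    # a plain scan stands in for that here.
    start = lo * BLOCK
    for i in range(start, min(start + BLOCK, len(keys))):
        if keys[i] == target:
            return i
    return -1
```

For example, `quad_block_search(list(range(0, 200, 2)), 100)` returns `50`. The key point is that the quad loop never looks inside a block; it only narrows down which block to hand to the final scan.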
The details are interesting: the author uses the last element of each block as its key for the quad search. I'm curious how the algorithm would change if you used the first element of each block instead, or even an arbitrary element.
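Playing with the first-element idea a bit, here's a hypothetical Python sketch (my own names, assuming 16-key blocks and a plain scan standing in for the SIMD step). The search becomes "find the last block whose first key is <= target", which is the same 3-probe, 4-way partition with the comparison flipped.

```python
BLOCK = 16  # assumed block size for the final (SIMD) scan


def partition_point4(n, pred):
    """Smallest i in [0, n] with pred(i) False, assuming pred is True on a prefix.

    Each round probes 3 pivots and keeps one of the 4 resulting ranges.
    """
    lo, hi = 0, n
    while hi - lo >= 4:
        q = (hi - lo) // 4
        p1, p2, p3 = lo + q, lo + 2 * q, lo + 3 * q
        k = pred(p1) + pred(p2) + pred(p3)  # pivots still in the True prefix
        lo, hi = ((lo, p1), (p1 + 1, p2), (p2 + 1, p3), (p3 + 1, hi))[k]
    # Fewer than 4 candidates left: step through them one at a time.
    while lo < hi and pred(lo):
        lo += 1
    return lo


def search_by_first(keys, target):
    """Hypothetical variant: represent each block by its FIRST key."""
    n_blocks = (len(keys) + BLOCK - 1) // BLOCK
    # Number of blocks whose first key is <= target.
    count = partition_point4(n_blocks, lambda b: keys[b * BLOCK] <= target)
    if count == 0:
        return -1  # target is smaller than every key
    start = (count - 1) * BLOCK  # last block that starts at or below target
    for i in range(start, min(start + BLOCK, len(keys))):  # stand-in for SIMD
        if keys[i] == target:
            return i
    return -1
```

With distinct keys it finds the same index as the last-element version. One difference does show up with duplicates that straddle a block boundary: for `[1] * 15 + [5, 5] + [9] * 15`, searching for `5` lands on the second block and returns `16`, whereas picking the first block whose last key is >= target lands on the block holding the leftmost `5` (index `15`). That might be one reason to prefer the last element, if you care about leftmost matches.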