Hacker News

drob518 yesterday at 3:13 PM

Isn't "quaternary" just sort of unrolling the binary search loop by one level? I mean, to find the partition in which the item is located, you still do roughly the same number of comparisons. You're just taking them 4 at a time, not 2 at a time. Seems like loop unrolling would give you the same result.


Replies

nkurz yesterday at 3:59 PM

It's trickier than that. Modern processors are speculative, which means that they guess at the result of a comparison and keep going along one side of a branch as far as they can, until they are told they guessed wrong or hit some internal limit. If they guessed wrong, they throw away the speculative work, take a penalty of a handful of cycles, and do the same thing again from a different starting point.

Essentially, this means that all loops are already unrolled from the processor's point of view, minus a tiny bit of overhead for the loop itself that can often be ignored. Since in binary search the main cost is grabbing data from memory (or from cache in the "warm cache" examples), the real game is getting the processor to issue the requests for the data you will eventually need as far in advance as possible, so you don't have to wait as long for it to arrive.
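One way to issue requests early, as a minimal sketch: a plain binary search that prefetches the midpoints of *both* halves before the current comparison resolves, so whichever way the branch goes, the next probe is already in flight. This assumes GCC/Clang's `__builtin_prefetch`; the function name and layout are mine, not from the thread.

```c
/* Binary search over a sorted int array, with software prefetch of
 * both candidate next-level midpoints. Illustrative sketch only. */
static long bsearch_prefetch(const int *a, long lo, long hi, int key)
{
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        /* Prefetch the midpoint of each half; one of these is the
         * element the next iteration will compare against. Prefetch
         * is only a hint, so a slightly off address is harmless. */
        __builtin_prefetch(&a[lo + (mid - 1 - lo) / 2]);
        __builtin_prefetch(&a[mid + 1 + (hi - mid - 1) / 2]);
        if (a[mid] == key) return mid;
        if (a[mid] < key)  lo = mid + 1;
        else               hi = mid - 1;
    }
    return -1; /* not found */
}
```

On a warm cache the prefetches are close to free; the payoff only shows up when the array is large enough that probes miss cache.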

The difference in algorithm for quad search (or anything higher than binary) is that instead of taking one side of each branch (and thus prefetching deeply in one direction), you prefetch all the possible cases, but with less depth. This way you are guaranteed to have successfully issued the prefetch you will eventually need, and are spending slightly less of your bandwidth budget on data that will never be used in the actual execution path.
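A hedged sketch of that idea: probe three pivots per iteration (the midpoint plus the midpoint of each half, i.e. binary search's two possible *next* probes), narrow to one quarter of the range, and prefetch the candidate pivots of all four quarters before the comparisons resolve. Again assuming GCC/Clang's `__builtin_prefetch`; names and index arithmetic are mine.

```c
/* Quaternary-style search: three pivots per level, prefetching every
 * quarter's next-level candidate so the needed load is always issued. */
static long qsearch_prefetch(const int *a, long lo, long hi, int key)
{
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        long q1  = lo  + (mid - lo) / 2;      /* midpoint of left half  */
        long q3  = mid + (hi - mid + 1) / 2;  /* midpoint of right half */
        /* Prefetch one candidate pivot in each of the four quarters. */
        __builtin_prefetch(&a[lo  + (q1 - lo)  / 2]);
        __builtin_prefetch(&a[q1  + (mid - q1) / 2]);
        __builtin_prefetch(&a[mid + (q3 - mid) / 2]);
        __builtin_prefetch(&a[q3  + (hi - q3)  / 2]);
        if (a[mid] == key) return mid;
        if (a[q1]  == key) return q1;
        if (a[q3]  == key) return q3;
        if      (key < a[q1])  { hi = q1 - 1; }               /* quarter 1 */
        else if (key < a[mid]) { lo = q1 + 1; hi = mid - 1; } /* quarter 2 */
        else if (key < a[q3])  { lo = mid + 1; hi = q3 - 1; } /* quarter 3 */
        else                   { lo = q3 + 1; }               /* quarter 4 */
    }
    return -1; /* not found */
}
```

Per level this does up to three comparisons instead of one, but it halves the number of dependent memory rounds, which is the point of the comment above.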

As others are pointing out, "number of comparisons" is an almost useless metric when comparing search algorithms if your goal is predicting real-world performance. The limiting factor is almost never the number of comparisons you can do. Instead, the potential for speedup depends on making maximal use of memory and cache bandwidth. So yes, you can view this as loop unrolling, but only if you consider how branching on modern processors works under the hood.

wtallis yesterday at 3:23 PM

Yes, this can be seen as unrolling the loop a bit. It improves performance not by significantly reducing the number of instructions or memory reads, but by relaxing the dependencies between operations so that the search doesn't have to be executed purely serially. You could also look at it as akin to speculatively executing both sides of the branch.

mayoff yesterday at 3:28 PM

Quaternary search effectively performs both of the next loop iteration’s possible comparisons simultaneously with the current iteration’s comparison. This is a little more complex than simple loop unrolling.

Regardless, both kinds of search are O(log N) with different constants. The constants don’t matter so much in algorithms class but in the real world they can matter a lot.

loeg yesterday at 3:50 PM

Sort of, yes, but you're also removing a data dependency between the unrolled stages.

pfortuny yesterday at 4:14 PM

It is because processors do not do what one might naively think they do.