logoalt Hacker News

amlutolast Tuesday at 5:45 PM1 replyview on HN

To be fair to quickselect, I can imagine a lazy data processing framework having a concept of a lazily sorted data column where the actual data has been materialized but it’s not in sorted order yet. Then someone does “LIMIT k” to it, and the framework can go to town with quickselect.

As noted a couple times in this thread, there are all kinds of tradeoffs here, and I can’t imagine quickselect being even close to competitive for k that is small enough to fit in cache. Quickselect will, in general, scan a large input approximately twice. For k = 3, the answer fits in general-purpose registers or even in a single SIMD register, and a single scan with brute force accumulation of the answer will beat quickselect handily and will also beat any sort of log-time heap.

(In general, more advanced and asymptotically better algorithms often lose to simpler brute force algorithms when the parameters in question are smallish.)


Replies

senderistalast Tuesday at 5:50 PM

Yeah, obviously I wouldn't bother with a heap for k=3. A heap has good compactness but poor locality, so I guess it wouldn't perform well out of (some level of) cache.

show 1 reply