Unrelated to the new materialization option, this caught my eye:
"this query sorts all 150 million values in the helpful_votes column (which isn’t part of the table’s sort key) and returns the top 3, in just 70 milliseconds cold (with the OS filesystem cache cleared beforehand) and a processing throughput of 2.15 billion rows/s"
I clearly need to update my mental model of what might be a slow query on modern hardware and software. Looks like it’s so fast because a columnar database only has to load the 150-million-value column being sorted, and nothing else. I guess sorting 150 million integers in 70ms shouldn’t be surprising.
(Also "Peak memory usage: 3.59 MiB" for that? Nice.)
This is a really great article - very clearly explained, good diagrams, I learned a bunch from it.
Let's do a back-of-the-envelope calculation. 150M u32 integers are 600MB. A modern SSD can do 14,000MB/s of sequential reads [1], so reading 600MB takes about 600MB / 14,000MB/s = 43ms.
Memory like DDR4 can do 25GB/s [2], so it can stream through 600MB in 600MB / 25,000MB/s = 24ms.
L1/L2 cache can do 1TB/s [3]. There are 32 cores, so that's roughly 32TB/s of aggregate L1/L2 bandwidth, which gets through 600MB in 0.018ms. With the ~3ms budget left after the SSD read and the memory pass (70 - 43 - 24), the cores could process the 600MB of data 166 times.
Rank-selection algorithms like QuickSelect and Floyd-Rivest have O(n) complexity, so it's entirely possible to process 600MB in 70ms.
[1] https://www.tomshardware.com/features/ssd-benchmarks-hierarc...
[2] https://www.transcend-info.com/Support/FAQ-292
[3] https://www.intel.com/content/www/us/en/developer/articles/t...
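If you want to sanity-check the memory number on your own machine, here's a minimal single-threaded sketch (my own code, nothing from the article): it times one sequential pass over 150M u32 values and reports the effective bandwidth. Expect a fraction of the aggregate multi-core figure; compile with -O2.

    // Minimal sketch: time one sequential read pass over 150M u32
    // (~600MB) and report effective bandwidth. Single-threaded, so
    // this measures one core's streaming rate, not a 32-core total.
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<uint32_t> v(150'000'000, 1);   // ~600MB, initialized once
        auto t0 = std::chrono::steady_clock::now();
        uint64_t sum = 0;
        for (uint32_t x : v) sum += x;             // one streaming pass
        auto t1 = std::chrono::steady_clock::now();
        double s = std::chrono::duration<double>(t1 - t0).count();
        // print the sum so the compiler can't drop the loop
        std::printf("sum=%llu  %.1f ms  %.1f GB/s\n",
                    (unsigned long long)sum, s * 1e3, 0.6 / s);
    }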
Slow VMs on overprovisioned cloud hosts, which cost as much per month as a dedicated box costs per year, have broken a generation of engineers.
You could host so much from your MacBook. The average HN startup could be hosted on a $200 mini PC in a closet for the first couple of years, if not more - and I'm calling that expensive to cover the extra RAM you want so you don't have to restart every hour when you have a memory leak.
Strong and up to date intuition on "slow vs. fast" queries is an underrated software engineering skill. Reading blogs like this one is worth it just for that alone.
> I guess sorting 150 million integers in 70ms shouldn't be surprising.
I find sorting 150M integers at all to be surprising. The query asks for the top 3 elements, returned sorted. That can be done trivially by keeping the best three found so far while scanning the list, which should run at nearly the speed of memory and use effectively zero additional storage. I don’t know whether ClickHouse does this optimization, but I didn’t see it mentioned.
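For what it’s worth, the trivial scan looks something like this (a sketch, not a claim about what ClickHouse actually does; `top3` is my own naming, and zero-initialization works as a sentinel because the values are unsigned):

    // Single-pass top-3: keep the best three seen so far (a >= b >= c).
    // Most elements fail the x <= c check and cost one comparison.
    #include <cstdint>
    #include <vector>

    struct Top3 { uint32_t a = 0, b = 0, c = 0; };  // invariant: a >= b >= c

    Top3 top3(const std::vector<uint32_t>& xs) {
        Top3 t;
        for (uint32_t x : xs) {
            if (x <= t.c) continue;        // common case: not in the top 3
            if (x > t.a)      { t.c = t.b; t.b = t.a; t.a = x; }
            else if (x > t.b) { t.c = t.b; t.b = x; }
            else              { t.c = x; }
        }
        return t;
    }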
Generically, one can find the kth best of n elements in time O(n):
https://en.m.wikipedia.org/wiki/Selection_algorithm
And one can scan again to find the top k, plus some extra if the kth best wasn’t unique, but that issue is manageable and, I think, adds at most a factor of 2 overhead if one is careful (collect up to k elements that compare equal to the kth best and collect up to k that are better than it). Total complexity is O(n) if you don’t need the result sorted or O(n + k log k) if you do.
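In standard-library terms the mutating version is nearly a two-liner. Here’s a sketch (my naming) of the O(n + k log k) route: std::nth_element does the average-O(n) partition and only the k survivors get sorted. Since this selects k values with ties resolved arbitrarily by the partition, it sidesteps the tie bookkeeping above:

    // Top-k, sorted descending, in O(n + k log k) average time.
    // Takes its argument by value because nth_element mutates it,
    // per the caveat below about not being allowed to touch the input.
    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <vector>

    std::vector<uint32_t> top_k_sorted(std::vector<uint32_t> xs, size_t k) {
        k = std::min(k, xs.size());
        std::nth_element(xs.begin(), xs.begin() + k, xs.end(),
                         std::greater<uint32_t>());      // O(n) average
        std::sort(xs.begin(), xs.begin() + k,
                  std::greater<uint32_t>());             // O(k log k)
        xs.resize(k);
        return xs;
    }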
If you’re not allowed to mutate the input (which probably applies to ClickHouse-style massive streaming reads), you can collect the top k in a separate data structure, and straightforward implementations are O(n log k). I wouldn’t be surprised if using a fancy heap or taking advantage of the data being integers with smallish numbers of bits does better, but I haven’t tried to find a solution or disprove the existence of one.
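The straightforward O(n log k) version, sketched with a size-k min-heap (again my own naming, not ClickHouse’s code): the smallest of the current top k sits on top, most elements fail the cheap top() comparison in O(1), and the O(log k) heap update is only paid on improvements:

    // Non-mutating streaming top-k in O(n log k), returned largest first.
    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <queue>
    #include <vector>

    std::vector<uint32_t> top_k_streaming(const std::vector<uint32_t>& xs,
                                          size_t k) {
        // min-heap: std::greater puts the smallest of the top k on top
        std::priority_queue<uint32_t, std::vector<uint32_t>,
                            std::greater<uint32_t>> heap;
        for (uint32_t x : xs) {
            if (heap.size() < k) heap.push(x);
            else if (x > heap.top()) { heap.pop(); heap.push(x); }
        }
        std::vector<uint32_t> out;
        while (!heap.empty()) { out.push_back(heap.top()); heap.pop(); }
        std::reverse(out.begin(), out.end());  // ascending pops -> descending
        return out;
    }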