logoalt Hacker News

koverstreetyesterday at 6:51 PM0 repliesview on HN

For on disk data structures, yes.

LSM-trees do really badly at multithreaded update workloads, and compaction overhead is really problematic when there isn't much update locality.

On the other hand, having most of your index be constant lets you use better data structures. Binary search is really bad.

For pure in memory indexes, according to the numbers I've seen it's actually really hard to beat a pure (heavily optimized) b-tree; for in-memory you use a much smaller node size than on disk (I've seen 64 bytes, I'd try 256 if I was writing one).

For on disk, you need to use a bigger node size, and then binary search is a problem. And 4k-8k as is still commonly used is much too small; you can do a lockless or mostly lockless in-memory b-tree, but not if it's persistent, so locking overhead, cache lookups, all become painful for persistent b-trees at smaller node sizes, not to mention access time on cache miss.

So the reason bcachefs's (and bcache's) btree is so fast is that we use much bigger nodes, and we're actually a hybrid compacting data structure. So we get the benefits of LSM-trees (better data structures to avoid binary search for most of a lookup) without the downsides, and having the individual nodes be (small, simple) compacting data structures is what makes big btree nodes (avoiding locking overhead, access time on node traversal) practical.

B-epsilon btrees are dumb, that's just taking the downsides of both - updating interior nodes in fastpaths kills multithreaded performance.