logoalt Hacker News

Consistently faster and smaller compressed bitmaps with Roaring (2016)

53 pointsby hambandit11/01/20247 commentsview on HN

Comments

o11c11/08/2024

Title is confusing: this is not about the original "Roaring", but an extension of it called "Roaring+Run".

Here, "bitmap" = "set of sometimes-compact integers". The "uncompressed" and several "rle" implementations are obvious. Hm, except this only seems to be talking about a particularly-naive RLE approach (suitable for storage but not computation)? If you're doing computation I expect you to use absolute offsets rather than relative ones which means you can just do binary search (the only downside of this is that you can't use variable-length integers now).

Roaring is just a fixed 2-level trie, where the outer node is always an array of pointers and where the inner nodes can be either uncompressed bitvectors (if dense) or an array of low-half integers (if sparse). Also, it only works for 32-bit integers at a fundamental level; significant changes are needed for 64-bit integers.

This paper adds a third representation for the inner node, the bsearch'able absolute RLE method I mentioned earlier (before even reading the paper beyond the abstract).

Overall there's neither anything novel nor anything particularly exciting about this paper, but if you ignore all the self-congratulations it might work as a decent intro to the subject? Except maybe not since there are a lot of things it fails to mention (the ping-pong problem, deduplicated tries, the approach of using a separate set for sparse values in the same range, the "overall sparse but locally semi-dense" representation that uses fixed-size single-word bitsets, ...)

show 1 reply
softwaredoug11/08/2024

Roaring bitmaps are really useful for doing phrase search in search engines.

Basically you can find cases where one term is immediately before another by left shifting the right terms roaring encoded positions in all documents and bitwise-anding the similarly roaring-encoded payload of the preceding term. All with a highly compressed representation of term positions.

With something like numpy you can do this in a handful of logical operations in python.

https://softwaredoug.com/blog/2024/01/21/search-array-phrase...

pncnmnp11/08/2024

Hey everyone, if you're looking for a more approachable guide on bitmap compression, I wrote a blog post on it this year: https://pncnmnp.github.io/blogs/roaring-bitmaps.html. It covers run-length encoding, BBC, WAH, Concise, and even Roaring Bitmaps.

show 1 reply