adamzwasserman • 11/04/2025

The "no sharing between filters" insight clicked for me on a different problem.

I needed to filter items by tags. A bloom filter per item seemed clever: quick membership checks. But with thousands of items sharing the same few dozen tags, every filter re-encodes the same vocabulary. Pure waste.

Switched to an inverted index (tag → item list) with bloom filters per chunk of the index. Now the tag vocabulary is shared, and bloom filters just speed up chunk-skipping when the index grows large.
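
Rough sketch of the shape, to make it concrete. This is not my actual code. The names (BloomFilter, Chunk, ChunkedTagIndex, items_per_chunk) are made up for illustration, and it assumes chunks are cut by item count, with each chunk's filter covering the tags that appear in that chunk.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter: k hash positions over an m-bit integer bitmap."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive k positions by salting one hash with the probe number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True means "maybe present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Chunk:
    """One slice of the index: tag -> item ids, plus a filter over its tags."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.tag_filter = BloomFilter()

    def add(self, item_id, tags):
        for tag in tags:
            self.postings[tag].append(item_id)
            self.tag_filter.add(tag)


class ChunkedTagIndex:
    """Inverted index split into chunks (hypothetical policy: N items each)."""

    def __init__(self, items_per_chunk=2):
        self.items_per_chunk = items_per_chunk
        self.chunks = []
        self._items_in_last = 0

    def add(self, item_id, tags):
        # Start a new chunk once the current one holds enough items.
        if not self.chunks or self._items_in_last >= self.items_per_chunk:
            self.chunks.append(Chunk())
            self._items_in_last = 0
        self.chunks[-1].add(item_id, tags)
        self._items_in_last += 1

    def lookup(self, tag):
        hits = []
        for chunk in self.chunks:
            if not chunk.tag_filter.might_contain(tag):
                continue  # skip: tag is definitely not in this chunk
            # A false positive just costs one lookup that finds nothing.
            hits.extend(chunk.postings.get(tag, []))
        return hits
```

In memory the filter buys nothing over the dict lookup it guards. The skip only pays off when a chunk's postings live somewhere expensive to read (disk, network) and the filter is small enough to stay resident.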

TFA's mistake is using bloom filters -instead- of an inverted index rather than on top of one. The amortization patterns stack; they don't compete.

Replies

hinkley • 11/04/2025

Why do these “inverted indexes” just look like indexes to me? Too much time with databases perhaps?
