logoalt Hacker News

pkoirdtoday at 4:14 AM2 repliesview on HN

Clever. My first impression was that surely this saturates the filter too fast as we're setting more bits at once but looks like the maths checks out. It's one of those non-intuitive things that I am glad I learned today.


Replies

FreakLegiontoday at 8:21 AM

It works because the original filter has suboptimal settings. An optimal filter of that size and number of items would set 5 bits per item and have about a quarter of the false positive rate. The 2 bits per item in the blocked filter is still suboptimal, but it's also saving them from saturating a bunch of 32-bit blocks, at the cost of a much higher overall false positive rate.

lemageduragetoday at 4:30 AM

True, I had the same feeling. The article does go off 256K elements in a bloom filter of 2M. After 1M elements, using 2 bits actually increases false positive rate, but at that point the false positive rate is higher than 50% already.