logoalt Hacker News

xnorswaplast Friday at 3:45 PM3 repliesview on HN

This is time efficient* but rather wasteful of space.

The best way to save space is to use a Bloom Filter.

If we capture all the even numbers, that would sadly only give us "Definitely not Even" or "Maybe Even".

But for just the cost of doubling our space, we can use two Bloom filters!

So we can construct one bloom filter capturing even numbers, and another bloom filter capturing odd numbers.

Now we have "Definitely not Even" and "Maybe Even" but also "Definitely not Odd" and "Maybe Odd".

In this manner, we can use the "evens" filter to find the odd numbers and the "odds" filter to find the even numbers.

Having done this, we'll be left with just a handful of unlucky numbers that are recorded as both "Maybe even" and "Maybe odd". These will surely be few enough in number that we can special case these in our if/else block.

The filters as a first-pass will save gigabytes of memory!


Replies

gopalvlast Friday at 4:22 PM

> But for just the cost of doubling our space, we can use two Bloom filters!

We can optimize the hash function to make it more space efficient.

Instead of using remainders to locate filter positions, we can use a mersenne prime number mask (like say 31), but in this case I have a feeling the best hash function to use would be to mask with (2^1)-1.

show 1 reply
alexjurkiewiczlast Saturday at 1:56 AM

You are absolutely write! Let me write a Go program to implement this idea. The bloom filters will take approximately 5gb (for a 1% error rate) and take a few minutes to populate on a modern Macbook Pro.

https://gist.github.com/alexjurkiewicz/1abf05f16fd98aabf380c...

nixpulvislast Friday at 4:10 PM

How is this time efficient at all? It takes upwards of 40 seconds to compute on large 32bit values.

It's a joke post with some interesting bits and details.

show 5 replies