logoalt Hacker News

h33t-l4x0rtoday at 5:16 AM4 repliesview on HN

Hmm, Bloom filters seem important. I'm wondering why my CS education never even touched on them and it's tbh triggering my imposter syndrome.


Replies

on_the_traintoday at 10:14 AM

Unpopular opinion: They're one of these things that are popular because of their cool name. For most purposes, they're not a good fit

benmannstoday at 5:51 AM

They were only touched on (and just barely) in my CS education, so don’t feel too left out. Spend an evening or two on the Wiki for Probabilistic data structures[0]. With a CS education you should have the baseline knowledge to find them really fascinating. Enjoy!

Oh, and I don’t find myself actually implementing any of these very often or knowing that they are in use. I occasionally use things like APPROX_COUNT_DISTINCT in Snowflake[1], which is a HyperLogLog (linked in the Wiki).

[0]: https://en.wikipedia.org/wiki/Category:Probabilistic_data_st...

[1]: https://docs.snowflake.com/en/sql-reference/functions/approx...

vyrtoday at 7:26 AM

they're common in databases and performance instrumentation of various kinds (as are other forms of data structure "sketch" like count sketches) but not as common outside those realms.

i've gotten interview questions best solved with them a few times; a Microsoft version involved spell-checking in extremely limited memory, and the interviewer told me that they'd actually been used for that back in the PDP era.

dheeratoday at 7:03 AM

My education didn't touch upon it but I've been grilled on it multiple times in interviews.

I learned about them after the first time I got grilled and rejected. Sucks to be the first company that grilled me about it, thanks for the tip though, you just didn't stick around long enough to see how fast I learn