Hacker News new | past | comments | ask | show | jobs | submit login

This is kind of a fundamental part of bloom filters. They're probabilistic data structures, which is the trade you make for space efficiency.

If you get a false back, you can guarantee that some item is not in the filter. If you get a true back, you can be fairly certain it is. Once you have a degree of certainty, you can do a more thorough check for membership while cutting out large numbers of negatives.




I get the concept - you're trading space for certainty. It's more the bit where it says:

    "The more hash functions you have, the slower your bloom filter, and the quicker it fills up. *If you have too few, however, you may suffer too many false positives.*"
Just made me wonder - couldn't badly chosen hash functions actually give you more false positives?




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: