
The way you'd use a bloom filter when you need zero chance of false positives is to filter out impossible options before falling back to a slower method that doesn't give false positives. If the slower method takes 10 ms, the bloom filter takes 1 ms, and you have a 1% false positive rate, the average lookup costs about 1.1 ms instead of 10 ms.
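That pattern is easy to sketch. Here's a minimal, illustrative bloom filter in Python (the class, parameter names, and the SHA-256-based hashing scheme are my own choices, not anything from the thread); `lookup` consults the cheap filter first and only pays for the exact check on a "maybe":

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: m bits, up to 8 hash positions derived from SHA-256."""
    def __init__(self, m_bits, k_hashes):
        assert 1 <= k_hashes <= 8  # 32-byte digest -> at most eight 4-byte slices
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # integer used as a bitmask

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        # Derive k bit positions from successive 4-byte slices of the digest.
        for i in range(self.k):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "maybe present" (could be a false positive);
        # False is a definite negative.
        return all(self.bits & (1 << pos) for pos in self._positions(item))

def lookup(word, bloom, exact_set):
    """Cheap filter first; only fall back to the slow exact store on a maybe."""
    if not bloom.might_contain(word):
        return False          # definite negative: no fallback cost paid
    return word in exact_set  # fallback catches the filter's false positives
```

The average-cost arithmetic above falls out of `lookup`: every query pays the filter cost, but only true hits plus the ~1% of false positives pay the fallback cost.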



Rather than the fallback method being the full list of all good words, how feasible would it be to use a bloom filter and then have the fallback be a membership check against a (much smaller) list of all known false positives?

Would there be some more enlightened way of computing such a list than just crunching through every combination of letters up to length n? That is to say, could one devise a bloom filter with hash functions such that given the configured filter, a list of all possible matching elements could be produced?


As far as I can see, finding all false positives for a bloom filter is effectively impossible, since a false positive is simply a hash collision.


Okay, so... it's definitely possible, just really expensive. Depending on the lengths of words you cover, you have to crunch through a few trillion combinations of letters, looking for the false positives manually, like computing a rainbow table, but without punctuation, mixed case, etc.
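The brute-force approach can be sketched directly. This toy version (the one-hash filter, the `hash_word` polynomial hash, and the tiny alphabet are all my own stand-ins to keep it runnable; a real run would use the actual filter and a full alphabet) crunches through every string up to a given length and keeps the filter hits that aren't real words:

```python
from itertools import product

def hash_word(w):
    # Simple deterministic polynomial hash (illustrative, not cryptographic).
    h = 0
    for c in w:
        h = h * 31 + ord(c)
    return h

def make_filter(words, m=64):
    """Toy one-hash bloom filter: set one bit per word in an m-bit mask."""
    bits = 0
    for w in words:
        bits |= 1 << (hash_word(w) % m)
    return bits, m

def might_contain(filt, w):
    bits, m = filt
    return bool(bits & (1 << (hash_word(w) % m)))

def enumerate_false_positives(filt, true_words, max_len, alphabet="ab"):
    """Crunch through every string up to max_len, keeping filter hits that
    aren't real words -- the brute-force, rainbow-table-style enumeration."""
    fps = []
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            cand = "".join(combo)
            if might_contain(filt, cand) and cand not in true_words:
                fps.append(cand)
    return fps
```

The cost is exactly what the comment says: |alphabet|^1 + |alphabet|^2 + ... + |alphabet|^max_len candidates, which for 26 letters and realistic word lengths runs into the trillions.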

What I'm wondering, though, is if one could select a hashing function where the collisions were easily discoverable. For a "hash" function that's just m mod 8, the presence of a 1 bit means the input had to have been 1, 9, 17, 25, 33, etc., up to the maximum word length. This may be a problem that's just as hard or harder than the rainbow table approach, but I'd be interested to explore its feasibility.
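For a hash that trivial, the preimages really are enumerable by construction rather than by brute force. A sketch of that idea (function name and parameters are mine, and it assumes the toy h(m) = m mod modulus hash from the comment with 0 <= bit_position < modulus):

```python
def preimages_mod(bit_position, modulus, max_value):
    """For the toy hash h(m) = m mod modulus, list every input up to
    max_value that would set the given bit. No search needed: the
    collisions form an arithmetic progression with step = modulus."""
    return list(range(bit_position, max_value + 1, modulus))

# e.g. bit 1 of an 8-bucket filter, inputs up to 33:
# preimages_mod(1, 8, 33) -> [1, 9, 17, 25, 33]
```

The hard part, of course, is that a hash this invertible is also a terrible bloom filter hash: the same structure that makes collisions enumerable makes them common and clustered, which is presumably why the comment suspects the trade-off may be no easier than the rainbow table approach.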





