Brilliant! I never anticipated Bloom filters would be so simple. What comes to mind is that for a "forgetful" Bloom filter, one could remove a hash from the table, erasing all equivalent entries from memory along with it. Is this a useful trade-off in practice?
The problem here is that each removal would effectively invalidate arbitrary other hashes that happen to share the same "row" (an address in the bitmap) with the one being removed.
Suppose we have a table in which "foo", "bar", and "baz" all have overlapping entries -- if we attempt to remove any one of them, we'd effectively remove the others as well.
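To make that concrete, here's a toy sketch (my own code, with arbitrary sizes and hash choices): a plain bit-array Bloom filter where a naive "delete" clears bits that other keys still rely on.

```python
# Toy sketch (my own code, arbitrary sizes/hashes): a plain bit-array
# Bloom filter where a naive "delete" clears bits that other keys share.
import hashlib

M, K = 32, 3          # tiny bitmap on purpose, so collisions are likely
bits = [0] * M

def positions(item):
    # K bit positions derived from K salted hashes of the item.
    return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % M
            for i in range(K)]

def add(item):
    for p in positions(item):
        bits[p] = 1

def query(item):
    return all(bits[p] for p in positions(item))

def naive_delete(item):
    # The "forgetful" delete: clears every bit the item maps to,
    # including bits shared with other items.
    for p in positions(item):
        bits[p] = 0

for word in ("foo", "bar", "baz"):
    add(word)

naive_delete("foo")
# Any key that shared a bit with "foo" (likely at this size) now reports
# False -- a false negative, which a Bloom filter must never produce.
print(query("bar"), query("baz"))
```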
That said, you might be interested in a counting filter instead, which supports delete operations.
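A rough sketch of the idea (toy code of my own, not from any particular library): each bit becomes a small counter, so a delete decrements rather than clearing positions other keys still need.

```python
# Rough counting-filter sketch (my own naming, not any specific library).
import hashlib

M, K = 64, 3
counters = [0] * M

def positions(item):
    return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % M
            for i in range(K)]

def add(item):
    for p in positions(item):
        counters[p] += 1

def query(item):
    return all(counters[p] > 0 for p in positions(item))

def delete(item):
    # Only valid if the item really was added -- see the caveat below
    # about deleting false positives.
    for p in positions(item):
        counters[p] -= 1

add("foo")
add("bar")
delete("foo")
print(query("bar"))   # True: shared counters drop from 2 to 1, not to 0
```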
(To the other person who replied: in order to delete from a counting Bloom filter, a cuckoo filter, or any other data structure that doesn't store the original item, you have to know that the item you're deleting was actually present. Otherwise, you have the problem you noted.)
Thanks. That's what I meant by deleting "equivalent entries"; I was wondering whether there are cases where that would still be a net gain. But it makes sense that there are more specialized variants for such things.
PS: I don't understand why my comment was downvoted; is there something wrong with asking such questions here?
> I don't understand why my comment was downvoted; is there something wrong with asking such questions here?
Not AFAIK, but sometimes people downvote questions that are based on misunderstandings or false premises. Give it some time and you'll probably be voted back up to where you started.
I don't see how those could work. Since Bloom filters can have false positives, a counting filter could end up letting you remove an item that was never added, corrupting the entries for other items in the process.
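That failure mode, as a toy sketch (my own code, with sizes chosen so a false positive is easy to find):

```python
# Toy demo of deleting a false positive from a counting filter
# (my own code; tiny table so a false positive is easy to find).
import hashlib

M, K = 8, 2
counters = [0] * M

def positions(item):
    return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % M
            for i in range(K)]

def add(item):
    for p in positions(item):
        counters[p] += 1

def query(item):
    return all(counters[p] > 0 for p in positions(item))

def delete(item):
    for p in positions(item):
        counters[p] -= 1

for word in ("a", "b", "c", "d"):
    add(word)

# Find a key that was never added but that the filter reports as present.
ghost = next(s for s in (f"x{i}" for i in range(10_000)) if query(s))
delete(ghost)   # looks legal by the filter's own answer...
# ...but real keys may now be missing: their counters were decremented
# even though `ghost` was never actually inserted.
print([w for w in ("a", "b", "c", "d") if not query(w)])
```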
You're right. However, in cases where we know the maximum number of items we'd be storing, we can adjust the "bucket size" of each row. Quoting the above wiki link:
> Because the counting Bloom filter table cannot be expanded, the maximal number of keys to be stored simultaneously in the filter must be known in advance. Once the designed capacity of the table is exceeded, the false positive rate will grow rapidly as more keys are inserted.
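For reference, the usual back-of-the-envelope sizing (these are the standard Bloom filter formulas, which apply to the slot count of a counting filter as well; the code and names below are my own):

```python
# Standard Bloom-filter sizing: slot count m and hash count k for an
# expected n keys at a target false-positive rate p.
import math

def size_filter(n, p):
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))   # slots (bits/counters)
    k = max(1, round((m / n) * math.log(2)))                # hash functions
    return m, k

print(size_filter(1_000_000, 0.01))   # ~9.6 million slots, 7 hash functions
```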