Here's my attempt at summarizing the paper with the hope that someone who understands it better can correct me.
Cuckoo hashes are a type of closed hash table where any given item has a fixed number of places where it can be located. Unlike other closed hashes, if there is not room for an item in one of its possible locations, an earlier item is evicted (in the style of a cuckoo kicking an egg out of strangers nest and quickly laying its own) and sent to its alternate spot. This repeats until an empty slot is found, or some large number (500) attempts have been made. If no slot is found, the table can be rehashed to try again, or enlarged to make more space available.
Initially cuckoo hashes were proposed to have 2 slots for each item, corresponding to 2 different hashes. Over time, for greater space efficiency (load) it's evolved to have a variable number of hashes >= 2 and a variable number of slots per bucket >= 1. This paper suggests that using 2 hashes and 4 slots per bucket is a good default (a 2,4 cuckoo hash). One of the innovations in this paper is that instead of using 2 independent hashes (the theoretical best case) they use the output of a single hash function and a shorter saved "signature" that can be hashed and xor'ed to find the 'alternate' slot. In this manner, the signature can be used to move an item to its alternate location without actually knowing or saving the original value.
This 'cuckoo filter' can be queried (LOOKUP) to see if an item is already in the hash (INSERT). If an item is present, a 'yes' answer is guaranteed. But if an item is not present, there is a low percentage chance that a false positive 'yes' answer will be returned. The number of bits required to store each signature is comparable to the number of bits required for a Bloom filter with a comparable rate of false positives. Through various optimizations (such as storing the items in a bucket in a known order) the number of bits per item stored drops to slightly below that of a Bloom filter.
Unlike a Bloom filter, deletions are made possible by storing multiple copies of the same signature. But since there only a set number of places that a given value can be stored, only a limited number of copies (minus the number of collisions) can be stored. In this case, that number is 2 * 4 = 8. If more inserts than this are attempted, either the insert must fail or the delete count will be out of sync.
Performance wise, at the same level of space efficiency, insert speeds are better than a standard Bloom filter when the hash is mostly empty (low load), but significantly worse than a Bloom filter at high load when the hash gets filled and many 'evictions' are necessary for each insert. But if a tradeoff for space is acceptable (to stay at ~50% load), the Cuckoo Hash inserts much faster than its Bloom counterpart. Lookup speeds for existing keys are 2-3x the speed of a standard Bloom filter, and about 1.5x than the "Blocked" Bloom filter variant. Negative lookups (key not present) are about 2x faster than Bloom at high load, but about 3/4 the speed at low load.
Overall, it's a nice technique that should probably be used more often.
Cuckoo hashes are a type of closed hash table where any given item has a fixed number of places where it can be located. Unlike other closed hashes, if there is not room for an item in one of its possible locations, an earlier item is evicted (in the style of a cuckoo kicking an egg out of strangers nest and quickly laying its own) and sent to its alternate spot. This repeats until an empty slot is found, or some large number (500) attempts have been made. If no slot is found, the table can be rehashed to try again, or enlarged to make more space available.
Initially cuckoo hashes were proposed to have 2 slots for each item, corresponding to 2 different hashes. Over time, for greater space efficiency (load) it's evolved to have a variable number of hashes >= 2 and a variable number of slots per bucket >= 1. This paper suggests that using 2 hashes and 4 slots per bucket is a good default (a 2,4 cuckoo hash). One of the innovations in this paper is that instead of using 2 independent hashes (the theoretical best case) they use the output of a single hash function and a shorter saved "signature" that can be hashed and xor'ed to find the 'alternate' slot. In this manner, the signature can be used to move an item to its alternate location without actually knowing or saving the original value.
This 'cuckoo filter' can be queried (LOOKUP) to see if an item is already in the hash (INSERT). If an item is present, a 'yes' answer is guaranteed. But if an item is not present, there is a low percentage chance that a false positive 'yes' answer will be returned. The number of bits required to store each signature is comparable to the number of bits required for a Bloom filter with a comparable rate of false positives. Through various optimizations (such as storing the items in a bucket in a known order) the number of bits per item stored drops to slightly below that of a Bloom filter.
Unlike a Bloom filter, deletions are made possible by storing multiple copies of the same signature. But since there only a set number of places that a given value can be stored, only a limited number of copies (minus the number of collisions) can be stored. In this case, that number is 2 * 4 = 8. If more inserts than this are attempted, either the insert must fail or the delete count will be out of sync.
Performance wise, at the same level of space efficiency, insert speeds are better than a standard Bloom filter when the hash is mostly empty (low load), but significantly worse than a Bloom filter at high load when the hash gets filled and many 'evictions' are necessary for each insert. But if a tradeoff for space is acceptable (to stay at ~50% load), the Cuckoo Hash inserts much faster than its Bloom counterpart. Lookup speeds for existing keys are 2-3x the speed of a standard Bloom filter, and about 1.5x than the "Blocked" Bloom filter variant. Negative lookups (key not present) are about 2x faster than Bloom at high load, but about 3/4 the speed at low load.
Overall, it's a nice technique that should probably be used more often.