> I'm really suspicious about whether this would really work out in most workloads.
What kinds of workloads are those? Workloads where the Bloom filter itself fits in cache?
> On modern architectures, it's the random memory reference that kills performance
Yeah, the whole point of the article is that moving from a traditional Bloom filter to a block Bloom filter lets you improve from N random accesses per query/update to 1 random access per query/update.
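For concreteness, here's a minimal sketch of the two query paths (mine, not the article's; K, BLOCK_BITS, and the hash-mixing scheme are placeholder choices, and a real implementation would use power-of-two sizing and better hash splitting):

    #include <cstdint>
    #include <vector>

    constexpr int K = 8;                // probes per key (placeholder)
    constexpr size_t BLOCK_BITS = 512;  // one 64-byte cache line

    // Traditional Bloom filter: each of the K probes lands at an independent
    // random offset, so one query can touch K different cache lines.
    bool query_traditional(const std::vector<uint64_t>& bits,
                           uint64_t h1, uint64_t h2) {
        uint64_t m = bits.size() * 64;  // total bits in the filter
        for (int i = 0; i < K; ++i) {
            uint64_t bit = (h1 + i * h2) % m;
            if (!((bits[bit / 64] >> (bit % 64)) & 1)) return false;
        }
        return true;
    }

    // Block Bloom filter: one hash selects a single cache-line-sized block
    // and all K probes land inside it -- one random memory access per query.
    bool query_block(const std::vector<uint64_t>& bits,
                     uint64_t h1, uint64_t h2) {
        size_t num_blocks = bits.size() * 64 / BLOCK_BITS;
        size_t base = (h1 % num_blocks) * (BLOCK_BITS / 64);  // first word of block
        for (int i = 0; i < K; ++i) {
            uint64_t bit = (h2 + i * h1) % BLOCK_BITS;        // offset within block
            if (!((bits[base + bit / 64] >> (bit % 64)) & 1)) return false;
        }
        return true;
    }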
> Ultimately it depends on cache pressure and how over-subscribed the cache is
The unspoken assumption in the article is that the Bloom filter is much bigger than cache.
If your Bloom filter fits in cache, or you're writing for something with superfast RAM like a GPU or a Xeon Phi or something, block Bloom filters are wasteful for you.
But I'm pretty sure the common case is writing normal applications running on reasonable general-purpose CPUs with Bloom filter sizes in the hundreds of MB or more. [1] [2]
[1] If the Bloom filter's smaller than hundreds of MB, your data's probably small enough that you decide Bloom filters aren't worth the hassle and just use a HashMap or your language's equivalent.
[2] If you're using a Bloom filter to do some sort of network stuff, like figuring out the set difference of two peers' objects in some p2p protocol, you probably don't care too much about block Bloom filters, because in this application memory access time is probably going to be dominated by network latency.
> to improve from N random accesses per query/update to 1 random access per query/update.
I think their concern is that N (nondependent) random accesses can happen in parallel not much more slowly than 1 random access (assuming good speculation and sufficient memory bandwidth).
The accesses don't need to happen in parallel for the block filter to win: a block-filter lookup is faster whenever it takes place, because it needs only one block resident in cache to execute from cache, rather than the equivalent of k > 1 blocks resident together. That makes it unconditionally better, even without assumptions about hot and cold keys.
If you had many lookups to do, an approach where a coroutine issues a prefetch instruction and then yields to other lookups while the data is fetched from memory might be effective.
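As a minimal sketch of that idea (mine, not the commenter's): real coroutine-based interleaving is more involved, but a simple two-pass batched loop with GCC/Clang's __builtin_prefetch gets the same overlap, issuing all the cache-line fetches for a batch before consuming any of them. BlockBloomFilter, block_addr, and probe_block are hypothetical names:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical block Bloom filter -- names are illustrative, not from the thread.
    struct BlockBloomFilter {
        std::vector<uint64_t> bits;  // size must be a multiple of 8 words
        size_t num_blocks;           // blocks of 8 words (64 bytes) each
        const uint64_t* block_addr(uint64_t h) const {
            return bits.data() + (h % num_blocks) * 8;
        }
        // 8 probes, all within one 512-bit block; the probe derivation is a
        // cheap placeholder for illustration.
        bool probe_block(const uint64_t* blk, uint64_t h) const {
            for (int i = 0; i < 8; ++i) {
                uint64_t bit = (h >> (i * 6)) & 511;
                if (!((blk[bit / 64] >> (bit % 64)) & 1)) return false;
            }
            return true;
        }
    };

    // Prefetch every key's block for a batch, then probe. By the time the
    // second loop touches a block, its memory fetch has (hopefully) finished,
    // so the batch's cache misses overlap instead of serializing.
    void lookup_batch(const BlockBloomFilter& f,
                      const uint64_t* hashes, bool* out, size_t n) {
        constexpr size_t B = 32;  // enough in-flight fetches to hide latency
        for (size_t base = 0; base < n; base += B) {
            size_t end = base + B < n ? base + B : n;
            for (size_t i = base; i < end; ++i)
                __builtin_prefetch(f.block_addr(hashes[i]), 0 /*read*/, 0 /*non-temporal*/);
            for (size_t i = base; i < end; ++i)
                out[i] = f.probe_block(f.block_addr(hashes[i]), hashes[i]);
        }
    }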