Hacker News new | past | comments | ask | show | jobs | submit login

One basic difference between Count-Min Sketches and HyperLogLog is that Count-Min Sketches represent the cardinality of items in multisets while HyperLogLog represent the cardinality of sets.

For example, Put 3 of the item 'A' (where each 'A' is indistinguishable from another 'A') and similarly 5 of item 'B' into a Count-Min Sketch and a HyperLogLog.

When you query the Count-Min Sketch for 'A', you can probably get back 3. You can also query for 'B' and get back 5.

With you query the HyperLogLog you query how many unique/indistinguishable items are in the set, in this case 2. The HyperLogLog would return a number near 2. You could query the HyperLogLog for whether or not 'A' is in the set, but HyperLogLog are not designed to be accurate for that type of query. It would return true in this case and in general there are no false negatives for set membership queries in the HyperLogLog. I believe it would be less accurate than a Bloom Filter for set membership queries.




If you made the number of registers (m) extremely large and the register width 1 then a bloom filter and HLL are almost exactly the same (except bloom filters have multiple hashes for the same input and in HLL you only split one hash into two parts).

You can also use a bloom filter as a cardinality counter assuming it isn't too filled up yet: http://en.wikipedia.org/wiki/Bloom_filter#Approximating_the_...

HLLs are designed to be useful for cardinality counting past the point where a bloom filter would be over capacity. HLL would be useful for set membership but after some threshold the false positive rate will approach 100% at some super-linear rate.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: