A simpler and more concise explanation in https://github.com/aaw/hyperloglog-red... | Hacker News

Hacker News new | past | comments | ask | show | jobs | submit

login

dblock on Feb 4, 2015 | parent | context | favorite | on: How a HyperLogLog works

A simpler and more concise explanation in https://github.com/aaw/hyperloglog-redis.

"The basic idea of HyperLogLog (and its predecessors PCSA, LogLog, and others) is to apply a good hash function to each value observed in the stream and record the longest run of zeros seen as a prefix of any hashed value. If the hash function is good, the bits in any hashed value should be close to statistically independent, so seeing a value that starts with exactly X zeros should happen with probability close to 2 -(X + 1). So, if you've seen a run of 5 zeros in one of your hash values, you're likely to have around 2 6 = 64 values in the underlying set. The actual implementation and analysis are much more advanced than this, but that's the idea."

rudolf0 on Feb 4, 2015 [–]

Interesting to see Bitcoin use almost the same principle. Essentially it's "if there is a hash with enough leading zeroes, we can assume a lot of CPU time has been put into the network."

Of course that's just an artifact of what's really happening, which is pseudo-random values + laws of probability.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact