I wonder how many people using UUIDs still write a conditional to verify uniqueness against the previously generated ones.
Edit: not implying there aren't perfectly good reasons to make the check that are unrelated to the probabilities of a collision in the ideal case. Just an idle musing.
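For anyone curious about the actual numbers behind that musing, here's a back-of-envelope sketch in Python. It assumes standard version-4 UUIDs (122 effective random bits) and the usual birthday approximation, which is only meaningful while the resulting probability is small:

    # Rough birthday-bound estimate for random (version 4) UUID collisions.
    # Assumes 122 effective random bits per UUID; the approximation only
    # holds while the resulting probability is small.

    def uuid4_collision_probability(n: int) -> float:
        """Approximate probability of at least one collision among n UUIDs."""
        space = 2 ** 122                      # distinct random UUIDv4 values
        return n * (n - 1) / (2 * space)      # birthday approximation

    for n in (10**6, 10**9, 10**12):
        print(f"{n:>16,} UUIDs -> P(collision) ~ {uuid4_collision_probability(n):.2e}")

Even at a trillion UUIDs that works out to roughly 1e-13, which is why the uniqueness check rarely pays for itself on probability grounds alone.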
And, of course, given the known weaknesses of MD5, that also assumes no untrusted party had an opportunity to manipulate the messages (or that a collision couldn't possibly benefit them anyway).
Fantastic article. Probabilistic algorithm design is fascinating to me because it's all about making sure that the system behaves as expected, rather than every individual operation being perfect. Implementing most random algorithms in practice strongly reminds me of studies of populations & behaviours of organic life (which is much better than the Rube Goldberg/spaghetti factories that sometimes arise as "solutions" to complex problems).
Another interesting related field is Dave Ackley's robust-first computing model[1], which operates with every atomic computational cell having some elements of randomness built in.
For those who are interested, Berkeley has publicly available lecture notes for their Randomness & Computation course. They go over some randomized algorithms as well as the mathematical tools necessary to analyze them. Just a heads up, the content is fairly theoretical.
Could someone explain or point me to a resource as to why using multiple buckets in the HyperLogLog algorithm makes it more accurate than a single bucket?
I don't have an article handy, but my intuition is that it reduces variance: a single register's estimate rests on one noisy "max leading zeros" observation, whereas HyperLogLog combines many registers (via a harmonic mean), so getting "unlucky" with the hashes in one bucket gets smeared out across all of the buckets.
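To make that intuition a bit more concrete, here's a rough Python simulation. It is not real HyperLogLog: it uses the harmonic-mean raw estimate but skips the bias-correction constant and the small/large-range corrections, and the stream size, bucket counts, and trial count are arbitrary choices. It's just enough to show the spread of the estimate shrinking as buckets are added:

    # Toy demo: each register holds one noisy "max leading zeros" observation.
    # Combining many registers (HLL proper uses a bias-corrected harmonic mean)
    # tightens the spread of the final estimate.
    import random
    import statistics

    HASH_BITS = 64

    def rho(x: int, bits: int) -> int:
        """1-indexed position of the first set bit in a `bits`-bit value."""
        for i in range(1, bits + 1):
            if (x >> (bits - i)) & 1:
                return i
        return bits + 1

    def raw_estimate(n_distinct: int, m: int, rng: random.Random) -> float:
        """Harmonic-mean estimate of n_distinct using m = 2**p registers."""
        p = m.bit_length() - 1
        registers = [0] * m
        for _ in range(n_distinct):
            h = rng.getrandbits(HASH_BITS)              # stand-in for hashing a distinct item
            bucket = h >> (HASH_BITS - p) if p else 0   # top p bits pick the register
            rest = h & ((1 << (HASH_BITS - p)) - 1)     # remaining bits feed the zero count
            registers[bucket] = max(registers[bucket], rho(rest, HASH_BITS - p))
        return m * m / sum(2.0 ** -r for r in registers)

    def relative_spread(m: int, n_distinct: int = 10_000, trials: int = 100) -> float:
        rng = random.Random(m)  # fixed seed per m, just for repeatability
        estimates = [raw_estimate(n_distinct, m, rng) for _ in range(trials)]
        return statistics.stdev(estimates) / statistics.fmean(estimates)

    for m in (1, 16, 64, 256):
        print(f"m={m:>4} buckets: relative spread ~ {relative_spread(m):.3f}")

If I've set it up right, the spread falls off roughly like 1/sqrt(m); the HyperLogLog paper quotes a standard error of about 1.04/sqrt(m) for the full bias-corrected estimator, which is the "smearing the unlucky hashes across buckets" effect stated quantitatively.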
There is nothing wrong with this if the probability of failing is smaller than the chance of an asteroid landing on your head.