It Probably Works: Probabilistic algorithms are all around us (acm.org)
92 points by nkurz on Dec 12, 2015 | 11 comments



Everybody using md5sums (or equivalent) and assuming they don't clash is using probabilistic algorithms.

There is nothing wrong with this if the probability of failing is smaller than the chance of an asteroid landing on your head.
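For a back-of-the-envelope sense of scale (my own sketch, not from the article), the birthday bound for a 128-bit hash like MD5 works out to something like this:

    from math import expm1

    def collision_probability(num_items, hash_bits=128):
        # Birthday-bound approximation: P(collision) ~= 1 - exp(-n^2 / 2^(b+1))
        n = float(num_items)
        return -expm1(-(n * n) / (2.0 ** (hash_bits + 1)))

    # Even a billion distinct files gives roughly a 1.5e-21 chance of an
    # accidental collision -- comfortably below asteroid territory.
    print(collision_probability(10 ** 9))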


... And if the impact of failure is less than an asteroid landing on your head.


Landing, however unlikely, is probably ok.

Impacting (more, but still unlikely), is probably not.


nice one!


I wonder how many people using UUIDs still write a conditional to verify uniqueness against the previously generated ones.

Edit: not implying there aren't perfectly good reasons to make the check that are unrelated to the probabilities of a collision in the ideal case. Just an idle musing.
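For illustration, the kind of conditional I have in mind is roughly this (just a sketch; with 122 random bits in a v4 UUID, the retry branch is effectively dead code):

    import uuid

    seen = set()  # IDs already issued, e.g. loaded from a database

    def new_unique_id():
        # Re-draw on collision -- astronomically unlikely for random v4 UUIDs,
        # but harmless if the issued IDs are already being tracked anyway.
        while True:
            candidate = uuid.uuid4()
            if candidate not in seen:
                seen.add(candidate)
                return candidate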


And, of course, considering the weakness of MD5, assuming no untrusted party had an opportunity to manipulate the messages (unless a collision couldn't possibly benefit them).


Fantastic article. Probabilistic algorithm design is fascinating to me because it's all about making sure that the system behaves as expected, rather than every individual operation being perfect. Implementing most randomized algorithms in practice strongly reminds me of studies of populations & behaviours of organic life (which is much better than the Rube Goldberg/spaghetti factories which sometimes arise as "solutions" to complex problems).

Another interesting related field is Dave Ackley's Robustness-First Computing model[1], which operates with every atomic computational cell having some elements of randomness built in.

[1] https://www.youtube.com/watch?v=helScS3coAE


For those who are interested, Berkeley has publicly available lecture notes for their Randomness & Computation course. They go over some randomized algorithms as well as the mathematical tools necessary to analyze them. Just a heads up, the content is fairly theoretical.

http://www.cs.berkeley.edu/~sinclair/cs271/f11.html


Good example of a probabilistic algorithm/data structure: https://en.wikipedia.org/wiki/Skip_list
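A rough, self-contained sketch of the idea (my own minimal version, not production code): each node gets a random height from repeated coin flips, and that randomness alone is what gives expected O(log n) search without any rebalancing.

    import random

    class SkipNode:
        def __init__(self, value, level):
            self.value = value
            self.forward = [None] * (level + 1)  # one next-pointer per level

    class SkipList:
        def __init__(self, max_level=16, p=0.5):
            self.max_level = max_level
            self.p = p                  # chance of promoting a node one level up
            self.level = 0
            self.head = SkipNode(None, max_level)

        def _random_level(self):
            # Flip coins: each success raises the new node one level higher.
            level = 0
            while random.random() < self.p and level < self.max_level:
                level += 1
            return level

        def search(self, value):
            node = self.head
            for i in range(self.level, -1, -1):
                while node.forward[i] and node.forward[i].value < value:
                    node = node.forward[i]
            node = node.forward[0]
            return node is not None and node.value == value

        def insert(self, value):
            update = [self.head] * (self.max_level + 1)
            node = self.head
            for i in range(self.level, -1, -1):
                while node.forward[i] and node.forward[i].value < value:
                    node = node.forward[i]
                update[i] = node
            new_level = self._random_level()
            self.level = max(self.level, new_level)
            new_node = SkipNode(value, new_level)
            for i in range(new_level + 1):
                new_node.forward[i] = update[i].forward[i]
                update[i].forward[i] = new_node

    sl = SkipList()
    for v in (3, 7, 1, 9, 4):
        sl.insert(v)
    print(sl.search(7), sl.search(5))  # True False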


Could someone explain or point me to a resource as to why using multiple buckets in the HyperLogLog algorithm makes it more accurate than a single bucket?


I don't have an article but my intuition is that it reduces variance. It smears the probability of getting "unlucky" with the hashes across all of the buckets.
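A toy simulation makes the intuition visible (deliberately simplified -- no harmonic mean or bias correction like real HyperLogLog, so the absolute estimate is off, but the spread shrinks as the bucket count grows):

    import random
    import statistics

    def rho(x, width):
        # 1-based position of the leftmost 1-bit in a width-bit value.
        return width - x.bit_length() + 1 if x else width + 1

    def estimate(true_n, bucket_bits):
        # Top bucket_bits of the hash pick a bucket; the remaining bits feed rho.
        # Each bucket's 2**max is a crude estimate of the items it saw.
        m = 1 << bucket_bits
        width = 32 - bucket_bits
        maxima = [0] * m
        for _ in range(true_n):
            h = random.getrandbits(32)
            maxima[h >> width] = max(maxima[h >> width],
                                     rho(h & ((1 << width) - 1), width))
        return sum(2 ** r for r in maxima)

    random.seed(1)
    for bucket_bits in (0, 4, 8):            # 1, 16 and 256 buckets
        runs = [estimate(50_000, bucket_bits) for _ in range(20)]
        print(1 << bucket_bits, "buckets:", round(statistics.mean(runs)),
              "+/-", round(statistics.stdev(runs)))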



