Hacker News new | past | comments | ask | show | jobs | submit login

If you have a hash table of size S with N entries in it, then the probability of hitting a collision on the next insertion is N/S. This is why many hash tables will resize and rehash once the load factor (which is N/S) reaches some predetermined fraction. Book-keeping using this method is cheap.

It's important to distinguish "What is the probability of the next insertion causing a collision?" and looking at a chained hash table of occupancy N and asking "What is the probability that this hash table already contains a collision?" (the birthday paradox yields 50% for 23 birthdays in answer to this), and again "How many collisions do I expect to find in this hash table?", which is what the author is writing about, and is actually the cumulative probability of there having been a collison over the lifetime of all previous insertions.

The collision rate will depend on how long entries live in your hash table and the rate at which you're doing insertions.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: