Hacker News new | past | comments | ask | show | jobs | submit login
A Seven Dimensional Analysis of Hashing Methods [pdf] (vldb.org)
113 points by tjalfi on April 25, 2017 | hide | past | favorite | 5 comments



Really interesting analysis, but constraining their keys to 64-bit integers and using 'hash functions' that only accept 64-bit integers as input skews the results away from real-world use cases.

In practice I've found that a variant of Robin Hood hashing with cache-line-sized bins and a hard limit on how deep to search for movable keys when inserting new keys (max of 3 possible bin locations per key, max search depth of 3 moves per insert) gives very good performance even with high load factors.

-Austin, author of MurmurHash


The actual title is "A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing".


Recently I was reading about hash tables on wikipedia and I was surprised to learn about how much the birthday problem influenced their design.

"A real world example of a hash table that uses a self-balancing binary search tree for buckets is the HashMap class in Java version 8."

https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_w...


This is only to counter worst-case attack scenarios which do not happen in a realistic scenario. Like >100 collisions.

Robin Hood or its better cousin Hopscotch hashing, which was forgotten here, counter that scenario much better, with only ~1-3% performance loss on average in the best case, and much better perf. numbers in the average write-heavy case, where their "move to front" strategy pays off. And they have the worst case scenario also covered, unlike all others.


The birthday problem directly relates to Bloom and Cuckoo filters as well. The false positive rate is how often two hashes collide.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: