Bit late but ill try - perfect hashing is a variant where the set of keys is kno... | Hacker News

Hacker News new | past | comments | ask | show | jobs | submit

login

thealig on Dec 19, 2022 | parent | context | favorite | on: I almost failed to search a 37 GB text file in und...

Bit late but ill try - perfect hashing is a variant where the set of keys is known( say N) and the hash table size is set large enough (N^2) such that the no of collision is 0 in a nutshell (expected no of collisions to be precise which is less than 1)

So if i understand correctly the variance of the data would not really matter since we will choose a hash table of size of the total space of passwords thus absolutely no collisions will occur no matter which distribution of keys we pick and hash

Thats where the O(1) comes from since searching for a key in a hash slot would take constant time as there is just one key in each slot (due to no collisions)

As the other commentor said the SHA1 collision has such a low probability that theres no need for perfect hashing

nielsole on Dec 19, 2022 | [–]

Ah, ok that makes a lot of sense. But in the use case of the article I assumed we were under the constraint that output space would be exactly the size of input space as we have a non-sparse file on disk.

vlovich123 on Dec 20, 2022 | [–]

Pretty sure you get O(n) space usage

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact