Bit late but ill try - perfect hashing is a variant where the set of keys is known( say N) and the hash table size is set large enough (N^2) such that the no of collision is 0 in a nutshell (expected no of collisions to be precise which is less than 1)
So if i understand correctly the variance of the data would not really matter since we will choose a hash table of size of the total space of passwords thus absolutely no collisions will occur no matter which distribution of keys we pick and hash
Thats where the O(1) comes from since searching for a key in a hash slot would take constant time as there is just one key in each slot (due to no collisions)
As the other commentor said the SHA1 collision has such a low probability that theres no need for perfect hashing
Ah, ok that makes a lot of sense. But in the use case of the article I assumed we were under the constraint that output space would be exactly the size of input space as we have a non-sparse file on disk.
So if i understand correctly the variance of the data would not really matter since we will choose a hash table of size of the total space of passwords thus absolutely no collisions will occur no matter which distribution of keys we pick and hash
Thats where the O(1) comes from since searching for a key in a hash slot would take constant time as there is just one key in each slot (due to no collisions)
As the other commentor said the SHA1 collision has such a low probability that theres no need for perfect hashing