> I think you're confusing proving that the hash function is collision resistant with the other goal which is hashing speed. If you really need a collision resistant hash you need to use a cryptographic hash function.
I wish this misconception would die. There is a great theory of algorithmic probabilistic hash functions, completely distinct from cryptographic hash functions. If you are designing a hash table, or a different algorithm using a hash function, you nearly always want the former kind.
The idea is that `Pr[h(x) = h(y)]` is small _no matter the inputs x and y_.
Here the probability is over the random seed of h.
Lots of good hash functions, like UMASH (https://engineering.backtrace.io/2020-08-24-umash-fast-enoug...), have this guarantee.
Other fast hash functions, like MurmurHash, don't.
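UMASH itself is fancier, but here's a minimal Python sketch of the same kind of guarantee using the textbook family `((a*x + b) % p) % m` (the prime, the table size, and the test pair are just illustrative). Note that x and y are fixed before the seed is drawn:

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, larger than any key below

def make_hash(m):
    # Drawing (a, b) picks one member of the family; this is the "seed".
    a = random.randrange(1, P)
    b = random.randrange(P)
    return lambda x: ((a * x + b) % P) % m

# The adversary fixes distinct x and y FIRST; only then is the seed drawn.
x, y, m, trials = 12345, 67890, 1024, 100_000
collisions = 0
for _ in range(trials):
    h = make_hash(m)
    if h(x) == h(y):
        collisions += 1
print(collisions / trials)  # hovers around 1/m ~= 0.001, for ANY fixed pair
```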
When a function doesn't have this guarantee, it means I can find sets of values x1, x2, ... that will likely collide under _any_ or most seeds!
Sure, if your inputs are basically random, this probably won't happen, but people can still use this to DDoS your hash table, or whatever you are coding.
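Here's a toy sketch of that attack, with `bad_hash` standing in for any seedless mixing step (the multiplicative constant and the 16-bit bucket index are assumptions for the demo). Because the function is public and fixed, the colliding keys can be computed offline, once, and reused against everyone's tables:

```python
C = 2654435761  # Knuth's multiplicative constant; crucially, no seed anywhere

def bad_hash(x):
    return (x * C) & 0xFFFFFFFF

TABLE_BITS = 16  # suppose the table uses the low 16 bits as the bucket index

def bucket(x):
    return bad_hash(x) & ((1 << TABLE_BITS) - 1)

# Invert the public mixing step to manufacture keys that all land in bucket 0.
C_INV = pow(C, -1, 1 << 32)  # modular inverse of C mod 2^32 (C is odd)
evil_keys = [((i << 16) * C_INV) & 0xFFFFFFFF for i in range(1000)]

print({bucket(k) for k in evil_keys})  # {0}: every lookup scans one giant bucket
```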
Notice again, this has nothing to do with cryptography. It is all about probabilistic guarantees.
You can't just test the hash function on a fixed number of inputs and say it's good, since you may just have moved the "bad set" to somewhere else.
In this day and age there are super fast algorithmic hash functions with guaranteed low expected collisions. It's just silly to use one that you can break so easily.
> The idea is that `Pr[h(x) = h(y)]` is small _no matter the inputs x and y_.
That sounds like such a function is strongly collision resistant, which means it's also second preimage resistant. And that gets you most of the way to a cryptographic hash function.
Is the only difference that it doesn't have to be first preimage resistant? Compared to cryptographic hashes, does that expand the set of viable functions a lot, to allow first preimages while still not allowing second preimages?
> It is all about probabilistic guarantees
So are cryptographic hash functions.
When I search for `algorithmic probabilistic hash functions` I just get results about bloom filters.
Cryptographic hash functions like MD5, SHA-2, BLAKE2, etc. are deterministic functions, so it doesn't really make sense to talk about `Pr[h(x) = h(y)]`. Either they collide or they don't.
It's muddied a bit by the fact that cryptographers also use universal hashing (or probabilistic hashing, or what I called algorithmic hashing) for stuff like UMACs (https://en.m.wikipedia.org/wiki/UMAC#NH_and_the_RFC_UMAC), but they often have a lot of extra considerations on top of just collision resistance.
Some algorithms also need stronger probabilistic guarantees than just collision resistance (see e.g. https://en.m.wikipedia.org/wiki/K-independent_hashing#Indepe...). These properties are usually too hard to test for with an experimental testing suite like SMhasher, but if your hash function doesn't have them, people will be able to find inputs that break your algorithm.
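For reference, the textbook k-independent construction is just a random degree-(k-1) polynomial over a prime field. A rough Python sketch (the parameters are illustrative, and the final mod-m step makes the distribution only approximately uniform):

```python
import random

P = (1 << 61) - 1  # prime field size

def make_k_independent(k, m):
    # Random degree-(k-1) polynomial mod P: any k distinct keys receive
    # (close to) independent, (close to) uniform buckets in [0, m).
    coeffs = [random.randrange(P) for _ in range(k)]
    def h(x):
        acc = 0
        for c in coeffs:  # Horner's rule
            acc = (acc * x + c) % P
        return acc % m
    return h

h = make_k_independent(k=5, m=1024)  # 5-independence suffices for e.g. linear probing
print(h(42), h(43))
```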
> Cryptographic hash functions like MD5, SHA-2, BLAKE2, etc. are deterministic functions, so it doesn't really make sense to talk about `Pr[h(x) = h(y)]`. Either they collide or they don't.
Eh, that's how I usually see collision resistance described. The probability is over inputs generated however the attacker likes, i.e. by the most effective attack method available.
But I wouldn't say the hash you linked is nondeterministic just because it has a seed. You can seed MD5, SHA-2, and BLAKE2 by tossing bytes in as a prefix. It'll prevent the same attacks and you can give it the same analysis.
So I'm still not sure in what sense a hash like this is facing different requirements than a cryptographic hash.
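Concretely, something like this (a rough sketch with Python's hashlib; the seed size and the truncation to 8 bytes are arbitrary choices, and BLAKE2 even takes a key natively, so it doesn't need the prefix trick at all):

```python
import hashlib
import os

SEED = os.urandom(16)  # per-process random seed, unknown to the attacker

def seeded_sha256(data: bytes) -> int:
    # "Seed" a plain cryptographic hash by prefixing the message with it.
    digest = hashlib.sha256(SEED + data).digest()
    return int.from_bytes(digest[:8], "little")  # truncate to a table-index size

def keyed_blake2(data: bytes) -> int:
    # BLAKE2 accepts a key directly, so the prefix trick isn't even needed.
    digest = hashlib.blake2b(data, key=SEED, digest_size=8).digest()
    return int.from_bytes(digest, "little")

print(seeded_sha256(b"hello"), keyed_blake2(b"hello"))
```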
> You can seed MD5, SHA-2, and BLAKE2 by tossing bytes in as a prefix. It'll prevent the same attacks and you can give it the same analysis.
I'm curious if you can link to such an analysis. These functions are notoriously much harder to analyze than simple functions like `h(x) = ax+b mod p`, which is all you need for the probabilistic guarantee.
But even if you could analyze this, you would just end up with a universal hash function that's way slower than you need, because you didn't pick the right tool for the job.
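For contrast, the entire analysis of that simple family, in its bucketed form `((a*x + b) % p) % m` with a prime p, distinct x, y in Z_p, and a drawn uniformly from {1, ..., p-1}, fits in a few lines:

```latex
\[
  (ax + b) \equiv (ay + b) \pmod{p}
  \;\iff\; a(x - y) \equiv 0 \pmod{p}
  \;\iff\; a \equiv 0 \pmod{p}
\]
```

The last case is ruled out by a != 0, so the inner map is collision-free and any collision must come from the final reduction mod m; counting how many of the p residues share a bucket mod m gives `Pr[h(x) = h(y)] <= 1/m` over the random choice of (a, b).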
By definition, if they're secure then they should meet the requirements, right?
> But even if you could analyze this, you would just end up with a universal hash function that's way slower than you need, because you didn't pick the right tool for the job.
I understand that; I'm just trying to figure out in what sense a universal hash is easier to construct. But as you've gone through the descriptions here, I think I see how the required collision resistance is much, much simpler, and that there's an assumption that the attacker never sees the hash output.
> But I wouldn't say the hash you linked is nondeterministic just because it has a seed. You can seed MD5, SHA-2, and BLAKE2 by tossing bytes in as a prefix.
Yes, but the point is that hash functions used for hash tables are much, much faster than these cryptographic ones.