Lessons from the history of attacks on secure hash functions (2019)

userbinator · on Feb 7, 2020

I'm no expert but I did a bunch of reading around the time the MD5 collisions were found, and have what I think is a semi-intuitive notion of why preimage resistance is much easier than collision resistance. A hash function has an internal state that is altered by its input, and so each of the bits of its output is essentially a very large Boolean expression of all its input bits. A pre-image attack means that you're given the outputs and need to solve for the inputs, while a collision means setting two sets of these equations together and finding a differing set of inputs which will "cancel out" internally and result in the same output. The latter is easier, because you're not constrained by the outputs --- the final hash output can be anything, as long as you find two different inputs that give the same output.

This very interesting article about using "tunnels" to find collisions in MD5 is worth reading: http://eprint.iacr.org/2006/105.pdf

deerIRL · on Feb 6, 2020

Did anyone else find the tables completely unreadable? Embedding them in a scrolling iFrame without anchoring the headers to keep half the screen whitespace just seems insane to me.

abathur · on Feb 6, 2020

They could certainly be easier to use. Sticky left/top rows might help. A modal might help.

An interactive, filterable visualization for the timelines might help, too. The timeline could be greatly condensed if the exact change points were marked for a non-confusing number of functions.

Or perhaps a cheeky play on Minard's graph of Napoleon's Russia campaign, but with factors like the cost of compute instead of the temperature. Or on Munroe's flow graph of U.S. congress composition/ideology.

wolfgke · on Feb 6, 2020

Use Shift + 'scroll wheel' when there is a (mouse) focus on the scrolling iframe.

kerkeslager · on Feb 7, 2020

Given only the explanation, I think I'd have a hard time figuring out how collision resistance differs from second-preimage resistance. After all, if you generate a collision, you have two inputs (preimages) that hash to the same output (image) so isn't the second input a second pre-image of the first input?

I'm not an expert, but I'll try to explain how I think this works, and invite real experts to step in if I get it wrong.

I think the way this works is that, with collisions, the attacker gets to choose their inputs. They don't care what two inputs they find that generate to the same output, they only care that they do generate to the same output. With a second-preimage attack, one of the inputs is chosen by the hasher.

As it turns out, the ability to choose both your inputs is quite useful! You can view hash functions in reverse: for a given hash function, the inverse hash function is a function which takes the output of the hash function and returns the set of inputs to the hash function. For many hash functions, the inverse hash function IS NOT uniformly difficult to compute partial solutions to: some inverse hashes are much easier to compute than others. So if you can find an output for which the inverse hash is easy to partially compute, you can generate collisions at that point much more easily.

However, in the space of possible hash outputs, the number for which the inverse hash is easier to compute is probably very small, or the possibility of computing the inverse hash would have been noticed by the creators of the hash function. So it's highly improbable that the hasher will have chosen a hash input at a point where the hash input is one where the inverse hash is easy to compute.

captaincrowbar · on Feb 7, 2020

Collision attack: Find two different messages with the same hash.

[First] pre-image attack: Given a hash value, find a message with that hash.

Second pre-image attack: Given a message, find a different message with the same hash.

pabs3 · on Feb 7, 2020

Another post on the lifetime of hash functions:

http://valerieaurora.org/hash.html

dependenttypes · on Feb 7, 2020

> all others (RSA, DSA, ECDSA, Ed25519, etc.)

This table is incorrect. Ed25519 was explicitly designed with the intention of only needing preimage resistance, not collision resistance.

erwan · on Feb 7, 2020

The table is correct. First because design intent is irrelevant. Second because the author clearly states that the focus of his comparative study is on *-preimage resistance vs collision. There is more to the article than the table :)

dependenttypes · on Feb 8, 2020

I really do not understand what you are trying to say. Just like SPHINCS, Ed25519 only cares that the hash function used with it has *-preimage resistance and it would not be broken if a collision was found in it. See https://ed25519.cr.yp.to/ed25519-20110926.pdf

> There is more to the article than the table :)

I am aware but my issue was with the table...

willvarfar · on Feb 6, 2020

At the bottom the article mentions that the NSA said there was something wrong with SHA-0. Do we now know what that was?

bannable · on Feb 6, 2020

Boomerang attacks on SHA-0 have a complexity on the order of 2^33.

willvarfar · on Feb 7, 2020

The boomerang (against block ciphers) was coined in 1999. It was used against SHA-0 in 2008?

Apparently the NSA warned against SHA-0 in 1995.

So do we now know what they perceived the weakness to be? Or are we still guessing?