SHA1 collisions down to 2^52 [pdf]

tptacek · on April 30, 2009

Quick context:

A reasonable attack against SHA1 is cause for alarm. The "next step up" is SHA256, which is a function that designers actively avoid because it's expensive. Virtually everything that used MD5 had a trivial upgrade path to SHA1.

Most SSL certificates are signed with SHA1.

Interestingly, the SSL protocol itself uses both MD5 and SHA1, but is not vulnerable to these attacks, because Paul Kocher got them to use both MD5 and SHA1 (on interleaved portions of the handshake).

ars · on April 30, 2009

Using both MD5 and SHA1 helps very little. If you can find a collision in one, you can find a simultaneous collision for the other as well. (Since both are semi-broken.)

Basically if you use two you are as strong as the strongest one - but NOT stronger.

So it was a gamble of sorts on which one (MD5 or SHA1) would break first. But if both break (as has semi-happened) it doesn't help at all.

Daniel_Newby · on April 30, 2009

That assumes that the attack can find a collision for a chosen hash value, not merely two inputs that collide to some random value. Are the attacks that strong?

I am amused that the URL for a hash-is-broken paper is based on the file's MD5 hash.

ars · on April 30, 2009

What is it about hashes? Every single one gets broken eventually.

Encryption doesn't seem to suffer from this - the keyspace might be small enough to search, but they don't get broken.

2^52 isn't small, but it's a lot less than the expected 2^80.

cperciva · on April 30, 2009

Encryption doesn't seem to suffer from this - the keyspace might be small enough to search, but they don't get broken.

DES is vulnerable to linear cryptanalysis (2^43 plaintexts). RC2 is vulnerable to a related-key attack (2^34 chosen plaintexts). RC4 is broken in too many ways to list. Khufu and Khafre are subject to differential cryptanalysis.

tptacek · on April 30, 2009

DES, RC2, RC4, Khufu, Khafre. Now can we get an analysis for something that isn't in the "don't ever use" list in Practical Cryptography?

Because it is the case that hashes are less well studied (and thus possibly less "hardened") than our workhorse block ciphers.

You're the expert here, can we wheedle you into giving us the real answer?

cperciva · on April 30, 2009

DES, RC2, RC4, Khufu, Khafre. Now can we get an analysis for something that isn't in the "don't ever use" list in Practical Cryptography?

I wasn't aware there was such a list... but if those were on a "don't ever use" list, I'd expect MD5 and SHA1 to be on the same list.

Because it is the case that hashes are less well studied (and thus possibly less "hardened") than our workhorse block ciphers.

Historically speaking, yes. If you stop a random man on the street and ask him what cryptography it, he's far more likely to say "keeping secrets" than "unforgeable authentication". Cryptographers have known for a long time that authentication is important, but because of the public bias towards secrecy, encryption has tended to receive more attention than authentication.

I don't think this is the case any more, however -- with these recent attacks I'd say that the state of the art of attacks on hashes is roughly on par with the state of the art of attacks on block ciphers.

evgen · on April 30, 2009

It is also worth noting that historically hash functions have had the implicit requirement of being "fast" and "small". Because hashes are used on both encrypted and unencrypted data there is a pressure on hash designers and the people selecting from available hash options to factor speed and output hash size into the selection criteria. This inevitably leads to the selection of hash functions that compromise security for other (practical) considerations.

We could have moved from MD5 to Whirlpool a long time ago, but I and everyone else looked at doubling storage requirements and chopping throughput to a trickle and selected SHA1...

gms · on April 30, 2009

It's not that. People just know so little about designing good crypto hash functions. Basically right now nobody knows what they're doing.

cperciva · on April 30, 2009

Basically right now nobody knows what they're doing.

That's going a bit far. We know how to design hashes which are provably at least as strong as block ciphers (i.e., "if you can break this hash, you can break AES") -- but synthetic constructions like those always end up being much slower than constructions which are "native" hashes.

tptacek · on April 30, 2009

I don't know why someone would mod this comment down. It paraphrases pretty accurately points made by professional cryptographers.

Percival's response is also accurate. He's saying, there's a baseline for secure hash functions: those that adapt block cipher constructions. But that's just the baseline. Nobody does that, because hash functions need to be fast. And making that tradeoff of speed versus proven constructions seems to put us into the weeds.

ars · on April 30, 2009

So is that my answer to the question that started this thread? We know how to make good hashes, but they are slow. And all the fast hashes get broken eventually?

cperciva · on April 30, 2009

And all the fast hashes get broken eventually?

I'd phrase it as "because hashes have historically attracted less attention than ciphers, it took longer for us to figure out how much of a margin of error to design into hashes".

For both block ciphers and hashes, the vast majority of designs rely on iterated mixing functions, and long before the complete designs are broken cryptographers figure out how to attack reduced-round versions. There's nothing inherently broken about the designs behind either MD5 or SHA1 -- if you created an MD5++ hash which exactly followed the design of MD5 but used twice as many mixing rounds, I don't think anyone would be able to find collisions for it.

Barnabas · on April 30, 2009

2 ^ 52 = 4,503,599,627,370,500

If that equals a number of seconds, then it is equal to 142,808,207 years. If kilometers, it is equal to 476.03 light years.

So 2 ^ 52 is still a big number anyway.

cperciva · on April 30, 2009

If that equals a number of seconds, then it is equal to 142,808,207 years.

Yes, but it isn't a number of seconds. It's a number of cryptographic operations. My laptop can perform 2^23 SHA1s per second. SHA1 collision-finding is getting into the "idle resources from a university computing lab for a few months" range.

jmatt · on April 30, 2009

"idle resources from a university computing lab for a few months" range.

Out of curiosity, what sort of time frame does that translate to for a government agency such as the NSA?

cperciva · on April 30, 2009

Out of curiosity, what sort of time frame does that translate to for a government agency such as the NSA?

Less than zero seconds. I'm sure the NSA computed collisions in SHA1 at least two years ago -- they had more than enough resources and they'd have to be idiots to not be interested in investigating the problem.

durin42 · on April 30, 2009

ISTR that the "rule of thumb" is that the NSA is a couple of years ahead of the mainstream for both Mathematics and computer tech. I don't know how valid that is, but it probably is a decent baseline for paranoia.

cperciva · on April 30, 2009

ISTR that the "rule of thumb" is that the NSA is a couple of years ahead of the mainstream for both Mathematics and computer tech.

For cryptographic research, that's probably true. For "computer technology" in general, probably not -- they have a large budget, but not so large that it would make sense for them to independently replicate all of the work companies like Intel and IBM do on lithographic processes.

I'm sure the NSA has access to fabrication capacity, though -- almost certainly via deals with US companies to allow them to run a few wafers here and there.

eru · on April 30, 2009

Interesting. How do these estimates square with the perceived advantages of open source development over closed source in the realm of software?

The NSA has more money, but so do many commercial software companies.

Does the NSA have other advantages? Is cryptography so different from software? Or does the NSA just fool people about their level of sophistication but relies on Rubber-hose cryptanalysis and eaves-dropping on telephone calls to get their result?

defen · on April 30, 2009

They already know what you were hashing so it doesn't matter.

dfranke · on April 30, 2009

For the Storm botnet it'd be on the order of one hour.

Eliezer · on April 30, 2009

And if you do a billion operations per second, it's equal to .142808207 years.