This is just direct fallout from the way Merkle-Damgard hash functions work. MD hash functions look like this:
function MD(C, h, M):
M <- pad(M)
for m in blocks(M):
h <- C(h, m)
return h
The core of an MD hash is the compression function C, which takes an input state h and a message block m and produces a new state. When you find a collision in an MD hash function, you're really finding a collision in C. That is, some state h and some pair of message blocks m1 and m2 such that C(h, m1) = C(h, m2).
Now, HMAC. HMAC is defined over a hash function, and it looks like this:
function HMAC(H, k, M):
k1 <- ipad ^ k
k2 <- opad ^ k
return H(k2 || H(k1 || M))
Where "||" is the concatenation operator. A simplified view of HMAC is that the key is used to scramble the initial state of H. This prevents the attacker from finding collisions on the inner hash function, because they don't know any interior state h. (The outer hash prevents length-extension attacks.)
Which makes this article pretty silly: the whole point of HMAC is that the attacker doesn't get to know interior hash states. It pretty much goes without saying that if a) an attacker can find collisions on H for arbitrary states h and b) they know some interior state h of the HMAC, then it follows directly that they can find collisions on the HMAC.
Hey, want to see some more practical "attacks" on HMAC? Here we go!
In [11]: HMAC(b'\x00', b'chili dog', sha256).hexdigest()
Out[11]: '3128a12ed7dd6cd4ad9d2720126698d1c0c3e2d991ddbf73ebbecd947581acd9'
In [12]: HMAC(b'\x00\x00', b'chili dog', sha256).hexdigest()
Out[12]: '3128a12ed7dd6cd4ad9d2720126698d1c0c3e2d991ddbf73ebbecd947581acd9'
Whoaaa, what?! Again!
In [17]: HMAC(sha256(b'A'*500).digest(), b'chili dog', sha256).hexdigest()
Out[17]: '426b6ec9b0e6b5fae9d12924432c023e8bc3444aa3adc4c48fdc875357c54c2f'
In [18]: HMAC(b'A'*500, b'chili dog', sha256).hexdigest()
Out[18]: '426b6ec9b0e6b5fae9d12924432c023e8bc3444aa3adc4c48fdc875357c54c2f'
The two collision examples I used both rely on the key expansion step in HMAC, which I glossed over in my explanation. If you go look up the HMAC spec, it should be pretty clear why these work.
Yeah, a message less than blocksize bytes is padded with 0's, and a message greater than blocksize bytes is hashed to blocksize bytes. It "follows" just as much as Little MAC, is about as security relevant, and is no less clever (arguably a bit more).
HMAC keys are exactly one block long. But lets say an app wants to have a key that's two blocks long. No big deal, hash that app key and now two is one.
Ah, but now let's say the attacker has control of input keys. He can provide both the two block key which HMAC will hash for him, or he can do the work himself and prehash the two into one. Now two different input keys seem to provide the same output (assuming identical messages, of course).
You're right. HMAC keys, from the internal hash perspective, are always one block long. So if the key coming into HMAC is less than blocksize, it's zero padded to blocksize. And if the key coming into HMAC is length greater than blocksize, it's hashed down to blocksize. Obviously this does create collisions if the key ends with nulls and is less than 64 bytes, or if the attacker can specify keys.
It's novel in that I wasn't able to find anyone else who had posted a colliding HMAC pair, and the couple papers people have found have sniffed around but hadn't quite gotten to the point of "Oh, yeah, compensate for ipad and collide after the first block".
You're correct that it went without saying in that people did not actually say it.
What exactly should he have cited Solar Designer about? If you're going to criticize someone for not citing, and the citation isn't obvious, you should probably provide a link.
Are you referring to the key expansion property that Scott mentioned downthread? He cited the appropriate source: the HMAC specification. The properties he's talking about are obvious: the key is zero-padded to the block size and, if longer than the block size, hashed.
I stand corrected, there are in fact a decent number of people who mentioned HMAC(K,X)==HMAC(H(K),X) when len(K)>blocksize. So, you had no reason to cite Solar Designer and I shouldn't have implied you should. Sorry about that!
Still interested if you can find an earlier cite for HMAC collision. Most of what I find implies mad crypto cleverness, not this silly little construction.
I think the point was that if some dumb protocol were misusing HMAC (and I'm not good enough at thinking about protocols to imagine how it might do so), it could be vulnerable to collisions generated this way.
I could feasibly see a mathematician understanding all this but not being familiar with that symbol, which is mostly used in programming (math prefers a ≠ b instead of a != b).
No.
EDIT: I've been told to elaborate.
This is just direct fallout from the way Merkle-Damgard hash functions work. MD hash functions look like this:
The core of an MD hash is the compression function C, which takes an input state h and a message block m and produces a new state. When you find a collision in an MD hash function, you're really finding a collision in C. That is, some state h and some pair of message blocks m1 and m2 such that C(h, m1) = C(h, m2).Now, HMAC. HMAC is defined over a hash function, and it looks like this:
Where "||" is the concatenation operator. A simplified view of HMAC is that the key is used to scramble the initial state of H. This prevents the attacker from finding collisions on the inner hash function, because they don't know any interior state h. (The outer hash prevents length-extension attacks.)Which makes this article pretty silly: the whole point of HMAC is that the attacker doesn't get to know interior hash states. It pretty much goes without saying that if a) an attacker can find collisions on H for arbitrary states h and b) they know some interior state h of the HMAC, then it follows directly that they can find collisions on the HMAC.
Hey, want to see some more practical "attacks" on HMAC? Here we go!
Whoaaa, what?! Again! Oh nooo! HMAC!