AEADs: getting better at symmetric cryptography

tptacek · on May 17, 2015

It took a while for it to "click" with me why the "repeated ciphertext" condition 'agl talks about in the first example is so bad. I mean, you can see why it's undesirable, but it doesn't seem like a nightmare scenario.

It turns out to be a nightmare scenario in practice. You can use the behavior as a building block in many protocols to get complete plaintext recovery from controlled data ("chosen plaintext"):

http://cryptopals.com/sets/2/challenges/12/
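
A rough sketch of the idea behind that challenge, for readers who don't want to work through it right away. This is not real AES: `toy_block` is a hypothetical stand-in (truncated HMAC-SHA256) used only so the example runs with the Python standard library. The attack needs nothing from the cipher except that the same block always encrypts to the same output. The names `oracle` and `recover_secret`, the zero padding, and the assumption that the attacker already knows the secret's length are all illustrative choices.

```python
import hashlib
import hmac
import os

BLOCK = 16
KEY = os.urandom(16)                       # never revealed to the attacker
SECRET = b"attack at dawn, bring snacks"   # what the attacker wants to learn


def toy_block(block: bytes) -> bytes:
    """Deterministic keyed function of one block (stand-in for AES in ECB mode)."""
    return hmac.new(KEY, block, hashlib.sha256).digest()[:BLOCK]


def oracle(attacker_bytes: bytes) -> bytes:
    """Encrypts attacker_bytes || SECRET block by block, deterministically."""
    data = attacker_bytes + SECRET
    data += b"\x00" * (-len(data) % BLOCK)  # zero padding, for simplicity
    return b"".join(toy_block(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK))


def recover_secret(secret_len: int) -> bytes:
    """Recovers SECRET one byte at a time using only calls to oracle()."""
    known = b""
    for _ in range(secret_len):
        # Pad so the next unknown byte is the last byte of some block.
        pad = b"A" * (BLOCK - 1 - len(known) % BLOCK)
        idx = (len(pad) + len(known)) // BLOCK
        target = oracle(pad)[idx * BLOCK:(idx + 1) * BLOCK]
        # Try every value for that byte; a repeated ciphertext block confirms the guess.
        for guess in range(256):
            trial = pad + known + bytes([guess])
            if oracle(trial)[idx * BLOCK:(idx + 1) * BLOCK] == target:
                known += bytes([guess])
                break
    return known
```

Calling `recover_secret(len(SECRET))` should return the secret without the key ever being used by the attacker. Each byte costs at most 257 oracle queries, which is why "equal plaintexts give equal ciphertexts" is much worse than it first looks.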

This property is also the building block of Thai Duong and Juliano Rizzo's "BEAST" attack, which targets CBC under a predictable IV condition (note again the distinction between nonces and IVs!).

Also, can't resist emphasizing: Rogaway not only predicted about a decade and a half of blind-alley crypto in IETF protocols, but, when he did so, was essentially run out of town on a rail by people calling him a "supposed cryptographer". Rogaway is, of course, one of the best known academic cryptographers in the world.

Standards Groups + Cryptography = Sadness.

Estragon · on May 18, 2015

Thanks for the cryptopals challenges, I've been enjoying them.

yuhong · on May 18, 2015

I think TLS 1.0 was one of the better ones though. One problem was that it was mostly finished by late 1997, but for other reasons the final RFC got delayed to early 1999.

knweiss · on May 17, 2015

I wonder if there's a good explanation for non-experts that gives at least an idea of how an attacker would exploit (key, nonce) pair reuse and why everything fails so catastrophically?

agl · on May 17, 2015

With both AES-GCM and ChaCha20-Poly1305, confidentiality is provided by XORing the plaintext with a keystream generated by either AES or ChaCha20. If the nonce is the same, then the same keystream is used.

Consider two plaintexts, p₁ and p₂, encrypted with the same (key, nonce) pair. The ciphertexts will, in part, contain p₁⊕k and p₂⊕k, where k is the keystream and ⊕ is XOR.

An attacker can XOR those ciphertexts together and get p₁⊕k⊕p₂⊕k = p₁⊕p₂⊕k⊕k = p₁⊕p₂. If the attacker has any knowledge of p₁ or p₂ then the confidentiality of the other falls as well.
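
The cancellation can be checked in a few lines. This is a minimal sketch: `keystream` is a toy generator (SHA-256 in counter mode) standing in for AES-CTR or ChaCha20, and the key, nonce, and messages are made up for illustration.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (stand-in for AES-CTR or ChaCha20)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


key, nonce = b"k" * 32, b"n" * 12
p1 = b"wire $100 to account 12345"
p2 = b"meet me at the usual place"
c1 = encrypt(key, nonce, p1)
c2 = encrypt(key, nonce, p2)  # same (key, nonce): this is the mistake

# The keystream cancels: c1 XOR c2 equals p1 XOR p2.
leak = xor(c1, c2)

# An attacker who knows (or guesses) p1 gets p2 without touching the key.
recovered_p2 = xor(leak, p1)
```

Here `recovered_p2` equals `p2`, even though the attacker only used the two ciphertexts and one known plaintext.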

The failure of the authenticator is more complex. Both AES-GCM and ChaCha20-Poly1305 use polynomial authenticators and, in short, duplicating a (key, nonce) pair allows the attacker to solve an equation and that's very bad.

tptacek · on May 17, 2015

I can do you one better:

http://cryptopals.com/sets/1/challenges/6/

http://cryptopals.com/sets/3/challenges/20/

In the literature, nonce reuse on things like GCM can fail even more spectacularly, with respect to the authenticator's key.

tveita · on May 18, 2015

AES-GCM and ChaCha20-Poly1305 work in roughly the same way. They both use the key and nonce to produce a pseudo-random sequence of bits. A small part of that sequence is used as an authentication key to make the message authentication code using a polynomial function over the ciphertext; the rest is XORed with the message to make the ciphertext.

When a (key, nonce) pair is reused, these constructs fail in two ways:

The first way is the same for all stream ciphers: since the same stream is xored with both messages, the attacker gets to see the xor of the messages together. From this you get a fun puzzle, e.g. assuming English text you might deduce something like 'alphabet neighbours, followed by the same letter, followed by the same letters but different case...', from which you can often deduce parts or whole of the plaintexts by statistical methods.

The second is due to the polynomial structure of the message authentication code. In a secure MAC, an attacker must not be able to deduce the key from seeing the authentication code. But for GCM and Poly1305, this is only true if the attacker doesn't see multiple messages authenticated with the same authentication key. With multiple tags from the same key, an attacker can solve for the authentication key.

Together this lets the attacker know both the keystream and the authentication key, from which they can forge basically any message they want for that key and nonce.
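
To make "solve for the authentication key" concrete, here is a deliberately simplified sketch. It is not Poly1305 or GHASH: `toy_tag` is a hypothetical one-block MAC, tag = (m * r + s) mod P, that keeps only the polynomial shape. Real Poly1305 also clamps r, handles many blocks, and reduces the final sum mod 2^128, so the real attack takes more work, but the principle is the same. The values below play the role of whatever the MAC is computed over (the ciphertext, in the AEADs above).

```python
import secrets

P = 2**130 - 5  # the prime Poly1305 works in; everything else here is simplified


def toy_tag(r: int, s: int, block: bytes) -> int:
    """Toy one-block polynomial MAC: tag = (m * r + s) mod P."""
    m = int.from_bytes(block, "little")
    return (m * r + s) % P


# The one-time MAC key (r, s). If it is derived from (key, nonce),
# reusing that pair reuses r and s as well.
r = secrets.randbelow(P - 1) + 1
s = secrets.randbelow(P)

m1, m2 = b"pay alice 10", b"pay alice 99"
t1, t2 = toy_tag(r, s, m1), toy_tag(r, s, m2)

# The attacker sees (m1, t1) and (m2, t2). Subtracting the tags cancels s:
#   t1 - t2 = (m1 - m2) * r  (mod P)
a1 = int.from_bytes(m1, "little")
a2 = int.from_bytes(m2, "little")
r_found = (t1 - t2) * pow((a1 - a2) % P, -1, P) % P
s_found = (t1 - a1 * r_found) % P

# With (r, s) recovered, any message can be given a valid tag.
forged = b"pay mallory 1M"
forged_tag = toy_tag(r_found, s_found, forged)
```

One tag gives one equation in two unknowns, which is why a single use is fine. Two tags under the same (r, s) give two equations, and the key falls out. The modular inverse via `pow(x, -1, P)` needs Python 3.8 or later.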

Compare this to e.g. ChaCha20-then-HMAC: a reused nonce would still let the attacker deduce the plaintext, but having two messages authenticated with the same HMAC key tells you nothing about the key. ChaCha20-Poly1305 is secure when used correctly, but ChaCha20-then-HMAC is more misuse-resistant, since a reused nonce breaks confidentiality but not authenticity.

praseodym · on May 17, 2015

According to RFC 5116 [1], there is 'associated data' (AD) as well as an authentication tag (similar to a MAC). It seems like both can be used to verify the authenticity of the ciphertext, but the article only mentions the AD part. When do you use which?

The JDK8 documentation for the Cipher class [2] has a paragraph on AEAD. It mentions AAD, unencrypted "Additional Associated Data", but also calls the same thing "Additional Authentication Data". Pretty confusing.

[1] http://www.ietf.org/rfc/rfc5116.txt

[2] http://docs.oracle.com/javase/8/docs/api/javax/crypto/Cipher...

agl · on May 17, 2015

Any authentication tag is a detail of the AEAD. As a practical matter, in order to provide authentication, the AEAD must expand the plaintext and, in some AEADs, that expansion comes in the form of a tag. But some AEADs just pad the plaintext with zeros and use a variable-block cipher (e.g. AEZ), in which case there's no tag as such.

Either way, it's an internal detail of the AEAD that someone using it doesn't need to know about. The AEAD just needs to signal an error at decryption time if the ciphertext has been manipulated.

The associated data(⁺) is just an input that needs to be equal at encryption and decryption time. It can be empty, or it could be a counter, but it could also be some other form of context, e.g. a string “payload for attachment #3 of message #234982374”. It's there to make sure that ciphertexts are understood in the correct context, but it's not included in the ciphertext itself.
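
A small sketch of how the associated data behaves from the caller's side. This is a toy encrypt-then-MAC construction built from the Python standard library purely for illustration, not a real AEAD and not something to deploy: `seal` and `open_` are hypothetical names, the keystream is SHA-256 in counter mode, and the nonce is assumed to be a fixed 12 bytes.

```python
import hashlib
import hmac


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(b"enc" + key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def _tag(key: bytes, nonce: bytes, ad: bytes, ct: bytes) -> bytes:
    mac_key = hashlib.sha256(b"mac" + key).digest()
    # The length prefix keeps the boundary between ad and ct unambiguous.
    msg = nonce + len(ad).to_bytes(8, "big") + ad + ct
    return hmac.new(mac_key, msg, hashlib.sha256).digest()


def seal(key: bytes, nonce: bytes, plaintext: bytes, ad: bytes) -> bytes:
    ks = _keystream(key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, ks))
    return ct + _tag(key, nonce, ad, ct)  # the ad itself is NOT part of the output


def open_(key: bytes, nonce: bytes, sealed: bytes, ad: bytes) -> bytes:
    ct, tag = sealed[:-32], sealed[-32:]
    if not hmac.compare_digest(tag, _tag(key, nonce, ad, ct)):
        raise ValueError("authentication failed")
    ks = _keystream(key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, ks))


key, nonce = b"\x01" * 32, b"\x02" * 12
ad = b"payload for attachment #3 of message #234982374"
sealed = seal(key, nonce, b"hello", ad)

plaintext = open_(key, nonce, sealed, ad)  # same context: succeeds

try:
    open_(key, nonce, sealed, b"payload for attachment #4 of message #234982374")
    wrong_context_rejected = False
except ValueError:
    wrong_context_rejected = True          # different context: rejected
```

The caller only sees two outcomes: the plaintext comes back, or decryption signals an error. Whether the expansion is a tag (as in this toy) or something else stays inside `seal` and `open_`.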

The Java docs (and Java's crypto APIs are terrible in general I'm afraid) call the AD both “associated data” and “additional authentication data”. That's just a mistake. At the very least they should be consistent within themselves and I think they should pick “associated data” as the term to use.

(⁺) I called it “additional” data in my post, but since the RFC calls it “associated” I changed it to that. It's the same thing.

finnn · on May 17, 2015

The "BoringSSL" link in the second paragraph is broken, links back to the same post.

agl · on May 17, 2015

Oops, fixed. Thanks. It should have linked to https://www.imperialviolet.org/2014/06/20/boringssl.html