CREAM: the SSL attack you’ve probably never heard of (tonyarcieri.com)
90 points by bascule on Nov 12, 2014 | 48 comments



This attack is conceptually simple, and the linked paper (http://cr.yp.to/antiforgery/cachetiming-20050414.pdf) is very approachable for the layman. I recommend giving it a read.

Here's the basic idea:

  1. The first thing the AES encrypt function does is XOR user input against the secret key.
  2. The result of this operation is used to index into a lookup table (the S-box).
  3. The timing characteristics of step 2 will vary based on the contents of the cache.
By measuring the timings for a large number of random inputs and correlating them by index (e.g., sorting all messages into buckets based on the value of the first byte), we can recover the corresponding byte of the secret key.

Though it's not clear a priori what the timing characteristics should look like for a particular key byte, we can easily measure this on a machine we control that matches the description of the target platform.
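
To make the correlation step concrete, here's a minimal sketch in C of what recovering one key byte might look like (random_fill, measure_encryption, and the reference profile array are hypothetical helpers; Bernstein's actual tooling differs in the details):

  #include <stddef.h>
  #include <stdint.h>

  #define NUM_SAMPLES 1000000L

  extern void random_fill(uint8_t *buf, size_t len);      /* hypothetical */
  extern double measure_encryption(const uint8_t *pkt);   /* hypothetical */

  /* Guess key[0], Bernstein-style: average the timing of many random
     packets, bucketed by the first input byte, then slide a reference
     profile (collected on a machine we control, with a known key) over
     the averages and pick the shift with the strongest correlation. */
  int guess_key_byte(const double profile[256])
  {
      double sum[256] = {0}, cnt[256] = {0};
      for (long i = 0; i < NUM_SAMPLES; i++) {
          uint8_t pkt[16];
          random_fill(pkt, sizeof pkt);
          double t = measure_encryption(pkt);
          sum[pkt[0]] += t;          /* bucket by first input byte */
          cnt[pkt[0]] += 1;
      }
      /* Timing depends on pkt[0] XOR key[0], so profile[b ^ g] should
         line up with the measured averages exactly when g == key[0]. */
      double best = -1e300;
      int best_guess = 0;
      for (int g = 0; g < 256; g++) {
          double corr = 0;
          for (int b = 0; b < 256; b++)
              corr += (sum[b] / cnt[b]) * profile[b ^ g];
          if (corr > best) { best = corr; best_guess = g; }
      }
      return best_guess;
  }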


Unsurprisingly excellent description. How much control do we have over the cache lines populated by AES in a TLS scenario? We can't fully randomize the AES inputs, because they have to correspond to the TLS protocol.


I think it would be tough to attack TLS with this.

A couple important stumbling blocks off the top of my head:

We need to pull off the attack in the context of a single TLS session, because new AES keys are generated for each one. This means we can't choose any inputs to the AES function. If we try to manipulate or forge a message, the MAC will fail and then the jig is up. This might not be a big deal, since the AES ciphertexts will seem random anyway.

There will be a large amount of overhead in each request compared to Bernstein's original experiment. This will likely introduce a large amount of per-request jitter. With enough requests we can iron this out, but again: one session. So there may be an upper limit on the number of samples we can realistically take.

We may have a tough time comparing like to like. Certainly we can request the same static resource each time if we control malicious JavaScript, but there may be confounding factors. This might not be a big deal, since I think this just amounts to more jitter.


I know this is naive as hell, but would adding some random sleeps before returning encrypted bits make it better or worse for this particular attack?


That kind of "blinding" is entirely ineffective, I'm afraid. It just averages right out.


If "sleep" means "sleep until the next timer tick", how can it average out? Especially if the timer is started at the start of the encryption, and all encryptions (at least, for one block) take less time than the timer is set for. That means all times get set to exactly the timer time, and no information reaches the attacker.

What am I missing?


"Random sleep" does not in general mean "sleep until the next timer tick". The best fix is making the function constant time, if you can achieve this with a sleep that makes the operation always take exactly one quantum then the sleep is really an implementation detail and quite far from "random sleep".


I've realized this (and am certainly not alone in that; it's rather obvious once one thinks about the nature of timing) -- but does anyone have some links on implementing it? Is there some sequence of x86_64 instructions that can be used to bound a procedure (in the Pascal sense of the word) to a quantum, regardless of things like when the procedure is called? (Assuming the procedure is short enough -- and depending on branching behaviour, I suppose instruction fetch/decode, data fetch/write, and the accompanying cache hits/misses can make it hard to a) select a worst-case run time in terms of clock cycles to target (and if so, for which concrete CPUs), and b) make sure the CPU is actually busy for exactly that many cycles.) Is this even possible to approach in portable C99?

I suppose if one ignores the information leak due to possible changes in CPU load, one might devise a kind of evented "call back" model, where one waits to return the result of a procedure until an interrupt is triggered?

I don't expect a full answer, but if anyone has a link to some source code that isn't too complicated, I'd be very happy (either "real-world" or some good "example" code).


The most common way to avoid timing side channel attacks is to write the procedure in C or ASM in such a way that there are no data-dependent differences in execution path. You've probably seen e.g. the memcmp that doesn't exit early. This attack is a little different in that it's not that different instructions are executed, it's that different memory access patterns take different amounts of time. For that, you can maybe change the implementation to not have any data-dependent array accesses, or maybe you can do things with prefetching to make the memory accesses constant time.
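
For reference, the classic constant-time comparison looks something like this (a minimal sketch; vetted versions exist, e.g. NaCl's crypto_verify functions):

  #include <stddef.h>
  #include <stdint.h>

  /* No early exit: the loop always runs over every byte, so the running
     time doesn't reveal the position of the first differing byte. */
  int memcmp_ct(const void *a, const void *b, size_t len)
  {
      const uint8_t *pa = a, *pb = b;
      uint8_t diff = 0;
      for (size_t i = 0; i < len; i++)
          diff |= pa[i] ^ pb[i];   /* accumulate differences, no branches */
      return diff != 0;            /* 0 if equal, 1 otherwise */
  }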

An approach where you watch the clock will be inherently less portable and actually much harder. Not only will the timing calls be hardware or OS specific, but so will the worst-case time. Imagine having to deal with a chip going into low power mode during your computation. Also you probably don't want to count time that your thread wasn't scheduled to run, so now you're talking about integrating with the scheduler.


Ah, I missed that xiata said "random". My error.


You always want to remove secret data, not obfuscate it. Think of the difference between those reversible pixel mosaics and a flat black censor bar, for example.

Incidentally, many 'normal' blur functions implemented in software are somewhat reversible.


It would mitigate it slightly, but ultimately it will be ineffectual - they can simply take more measurements.


So what I'm hearing is:

- AES is vulnerable to timing attacks by design unless the implementation is very careful. But even then no promise it is safe because of CPU hardware nuances between models and architectures.

- Use AES-GCM if you must use AES.

- AES-GCM(?) is accelerated by the Intel AES extension and provides a constant-time implementation, but you have to trust Intel's hardware isn't backdoored.

- Don't use AES for network communications if you can avoid it.

Is this correct?


I think worrying about a backdoor is misaimed paranoia. If the hardware were backdoored, it could do malicious things to you whether or not you call PCLMULQDQ.

What you should be worried about is something much more common: you have to trust that Intel's hardware is correctly implemented. Intel does a fantastic job at an extremely hard problem, but they still make the occasional mistake: http://www.anandtech.com/show/8376/intel-disables-tsx-instru...

The AES-NI and CLMUL instructions promise two things: first, that they'll do the math correctly, and second, that they'll do it in constant time with respect to secret data. I don't think anything else nontrivial in the architecture makes that second promise.

(On a minor technical point: AES itself is constant-time if you have AES-NI, which is moderately more common than CLMUL. It's just that, cryptographically, almost all new users of AES today should be using AES-GCM, and that involves a not-easily-constant-time operation separate from the AES block cipher itself.)


AES-NI can be used to accelerate AES on Intel CPUs. CLMUL instructions (namely PCLMULQDQ) can be used to accelerate GCM. You need to use both in conjunction for a fast, constant-time AES implementation.
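
For illustration, the block-cipher half with AES-NI intrinsics might look roughly like this (a sketch, assuming the 11 round keys were already expanded, e.g. with _mm_aeskeygenassist_si128, and compiled with -maes):

  #include <wmmintrin.h>   /* AES-NI intrinsics */

  /* Encrypt one 16-byte block with pre-expanded AES-128 round keys.
     Every step runs in hardware, in constant time, with no table lookups. */
  void aes128_encrypt_block(const __m128i rk[11],
                            const unsigned char in[16],
                            unsigned char out[16])
  {
      __m128i state = _mm_loadu_si128((const __m128i *)in);
      state = _mm_xor_si128(state, rk[0]);          /* initial AddRoundKey */
      for (int i = 1; i < 10; i++)
          state = _mm_aesenc_si128(state, rk[i]);   /* rounds 1-9 */
      state = _mm_aesenclast_si128(state, rk[10]);  /* final round, no MixColumns */
      _mm_storeu_si128((__m128i *)out, state);
  }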

However, the conclusion to draw is more general: any data dependent timings in any cipher implementation can be catastrophic.


https://www.iacr.org/search/?q=AES+Cipher+Keys

prediction: side-channel attack + boomerang attack on 10 rounds >> AES-128 broken

gap = goal (10 rounds) - present (7 rounds) = future (3 rounds); gap/time = estimated at 1 year; software goal = http://hashcat.net/oclhashcat/

prediction method: curl, elinks, word cluster, topic summaries:

"AES Cipher Keys Suitable for Efficient Side-Channel Vulnerability Evaluation" -- Takaaki Mizuki, Yu-ichi Hayashi, Tohoku University

Abstract: This paper investigates pairs of AES-128 cipher keys and plaintexts which result in being "quiet" in the final round, i.e., whose 128-bit State holds the same bit pattern before and after Round 10. HOLDS THE SAME BIT PATTERN BEFORE AND AFTER ROUND 10.

...Because such quiet and noisy plaintexts make extreme actions in the final round of the AES encryption, these AES-128 cipher keys are quite useful for AES hardware designers to efficiently evaluate the vulnerabilities of their products, for instance, the performance of their side-channel attack countermeasures. Table 8: The Estimated SNR at various distances from the FPGA.

Conclusion: recent research has been adopting the idea that suitable plaintexts, say those with rather small (but non-zero) Hamming distances in the final round, are chosen for efficient side-channel attack evaluation. There has not been any study that uses quiet plaintexts (i.e., those with exactly-zero Hamming distance) yet.

"New Related-Key Boomerang Attacks on AES" -- Michael Gorski and Stefan Lucks, Bauhaus-University Weimar, Germany

Abstract: In this paper we present two new attacks on round-reduced versions of the AES. We present the first application of the related-key boomerang attack on 7 and 9 rounds of AES-192. The 7-round attack requires only 2^18 chosen plaintexts and ciphertexts and needs 2^67.5 encryptions. We extend our attack to nine rounds of AES-192. This leads to a data complexity of 2^67 chosen plaintexts and ciphertexts using about 2^143.33 encryptions to break 9 rounds of AES-192.

Conclusion: The AES remains still unbroken, but we have shown that up to 7 rounds practical attacks are available. 7 ROUNDS PRACTICAL ATTACKS ARE AVAILABLE


>> - Use AES-GCM if you must use AES

Without CLMUL instructions, GCM is either very slow or potentially sensitive to cache-based side-channel attacks (regardless of AES being the underlying cipher).

The EAX mode of operation is more secure - though for various reasons it is not very popular, and it is not part of any TLS cipher suite.


Given Rogaway's recent patent grants, OCB would be a more straightforward improvement over GCM.


Does anything use OCB? It seems GCM "won".

The patents may have killed it. Even "free for open source" can be troublesome if you're worried about putting free software into an appliance and getting trapped. Easier to avoid patented algos entirely.


Rogaway's patent grants are now very liberal: they cover all open source and everything non-military. It passed through CFRG, too, and is documented in an RFC: https://tools.ietf.org/html/rfc7253

But yes, definitely the fact that there were patent grants hurt it a lot in adoption before; even when (as in WiFi) it was one of the contenders, CCM is more common.

A few things do use it: off the top of my head, I think Mumble does, although I think that's an earlier variant (OCB2, perhaps, rather than OCB3 as documented in the RFC?).

I'm also looking forward to the results of the CAESAR authenticated-encryption competition - http://competitions.cr.yp.to/caesar.html - there's a lot of competition, and quite a few entries have already fallen and been withdrawn. The current version of OCB is among the remaining contenders, along with several other interesting candidates.


Specifically regarding the patent grants (of which there are three on http://web.cs.ucdavis.edu/~rogaway/ocb/license.htm: open source, non-military, and OpenSSL) they would appear at first glance to cover OpenBSD. All three in fact. The problem is that this then creates a trap for anyone taking OpenBSD and using it to build something to sell to a military. Suddenly they are no longer protected; we prefer not to incorporate anything that can create such traps.

For example, this came up in the thread where it was proposed that OCB be added to OpenSSL. You think you're free and clear, and then you're not. http://marc.info/?l=openssl-dev&m=136016226304441&w=2

Then came the OpenSSL specific license. That license probably applies to LibreSSL today, but now there's a Ship of Theseus problem. How much OpenSSL does one need to keep to qualify? And of course, the OpenBSD IPsec stack is completely unrelated to OpenSSL.


> Specifically regarding the patent grants (of which there are three on http://web.cs.ucdavis.edu/~rogaway/ocb/license.htm: open source, non-military, and OpenSSL) they would appear at first glance to cover OpenBSD. All three in fact. The problem is that this then creates a trap for anyone taking OpenBSD and using it to build something to sell to a military. Suddenly they are no longer protected; we prefer not to incorporate anything that can create such traps.

How so? As far as I can tell, the "open source" grant covers everything under a BSD license (among other licenses) -- and holds no provision for "military use". I don't see how anyone using the [ed: algorithm, not code] under license 1 could become subject to license 3?


People take OpenBSD and turn it into non-open-source products all the time. For a more famous example, FreeBSD is at the core of the PlayStation OS, but it's no longer open source.


Mosh (https://mosh.mit.edu) uses OCB. At the time it was written, GCM implementation availability was poor: OpenSSL was just adding it upstream, so almost no end-user machines would have it for some time. Meanwhile, the OCB reference implementation looked fine, and permitted the Mosh developers not to write their own crypto.

The OCB patent grant at the time specifically required GPL, which I believe played into the decision to release Mosh under the GPL. (It now has a very wide patent grant, permitting a few options including all non-military software use, both closed and open, but GCM was already winning by then.)


EAX is protected by patents.


Could you provide a reference?


For this attack to get a stupid name, it needs to actually attack SSL/TLS. But Bernstein got it to work only in a lab setting, with a target pessimized to expose the vulnerability. That target, of course, did not use SSL/TLS.

Pedantry Points += 10


Normally I'd agree that the way an attack earns the leap from an academic paper to a cute name should be a real-world PoC||GTFO, but c'mon, is this not the perfect name?


I think security researchers are getting a bit caught up with the glam and the glory of huge blockbuster bugs.

  Cash rules everything around me
  CREAM, get the money
  Dollar, dollar bill y'all


I'm really wary of the "don't invent your own crypto" mantra, so I want to venture an idea I had here, rather than writing any code for it:

I understand the problems with adding a random delay to try to add "noise" to the measurements, but what if the delay was non-random? Specifically, what if the delay was calculated to make the whole operation always take constant time?

Example:

User input comes in on thread A. Thread A sends the request to thread B and delays for time t. Thread B does the encryption and sets the result. Thread A resumes and returns the result.

If we pick 't' such that it is always larger than the amount of time it takes to do the operation, any timing differences observed by the attacker won't be correlated with the timing of the cryptographic operations. I think.

(NB: it could be something other than threads, such as "microservice B" instead of "thread B", that particular detail isn't important)
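
A minimal POSIX sketch of that "pad to a fixed budget" idea (do_encrypt and T_BUDGET_NS are hypothetical; the budget must exceed the worst-case runtime, which, as discussed elsewhere in the thread, is the hard part):

  #include <time.h>

  #define T_BUDGET_NS 500000L    /* hypothetical; must exceed worst case */

  extern void do_encrypt(void);  /* hypothetical: the variable-time work */

  void encrypt_padded(void)
  {
      /* Record an absolute deadline before doing any secret-dependent work. */
      struct timespec deadline;
      clock_gettime(CLOCK_MONOTONIC, &deadline);
      deadline.tv_nsec += T_BUDGET_NS;
      if (deadline.tv_nsec >= 1000000000L) {
          deadline.tv_sec += 1;
          deadline.tv_nsec -= 1000000000L;
      }
      do_encrypt();
      /* Sleep until the absolute deadline, so the total elapsed time is
         (roughly) constant no matter how long do_encrypt() took. */
      clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
  }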


A better strategy (and an area of research for Dan Bernstein, author of the referenced paper) is to design crypto that doesn't leak this kind of side-channel information. See, for example, his Salsa20 family of stream ciphers.


I know that is a better strategy, but it's also a much much harder strategy :) Even a well designed cipher could end up leaking info in practice due to compiler quirks or implementation mistakes.


People like their crypto to run faster than "maximally pessimized".


But they also like their crypto secure :P


The best way to fix this is to eliminate the S-box lookup, thereby making AES constant-time.

For example: http://rd.springer.com/chapter/10.1007%2F11935070_14
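
Short of a full bitsliced implementation, the simplest way to remove the data-dependent lookup is to touch every table entry and select the wanted one with a mask; a minimal sketch:

  #include <stdint.h>

  /* Constant-time S-box lookup: read all 256 entries so the memory
     access pattern is independent of the secret index. Much slower
     than a direct lookup, of course. */
  uint8_t sbox_lookup_ct(const uint8_t sbox[256], uint8_t index)
  {
      uint8_t result = 0;
      for (uint32_t i = 0; i < 256; i++) {
          uint32_t diff = i ^ index;                  /* 0 iff i == index */
          uint8_t mask = (uint8_t)((diff - 1) >> 8);  /* 0xFF iff i == index */
          result |= sbox[i] & mask;                   /* select without branching */
      }
      return result;
  }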


> I understand the problems with adding a random delay to try to add "noise" to the measurements

Actually, this isn't problematic at all... it's called random blinding: https://en.wikipedia.org/wiki/Blinding_(cryptography)

It's a commonly used method used to implement things like RSA in constant time.


I think he's talking about something like:

  c = encrypt(k, m)
  sleep(rand())
  return c
Which is not what blinding is. Blinding is not really about adding a random delay as much as it is about preventing an attacker from controlling inputs to variable-time functions.

Also, it's not clear how blinding could be applied to AES.


Yes, okay -- but my point was that we don't actually need constant-time operation if the timing is uniformly random. Clearly that doesn't mean we introduce a random sleep; it means we carry out computations in a way that timings aren't data-dependent and an attacker can't separate signal from noise.


I guess I'm still unclear how that would work in the context of AES. Or really any symmetric construction, come to think of it.


It isn't used for symmetric encryption algorithms. They're more easily made verifiably constant time as described in the blog post.

It's a common way to protect something like RSA from timing sidechannels, however.
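
For the RSA case, base blinding might look roughly like this with OpenSSL's BIGNUM API (a sketch: error handling and the r != 0 check are omitted, and n, e, d, c, ctx are assumed to be initialized):

  #include <openssl/bn.h>

  /* Decrypt c with random base blinding: the timing of the modular
     exponentiation no longer correlates with the attacker-chosen c. */
  void rsa_decrypt_blinded(BIGNUM *m, const BIGNUM *c,
                           const BIGNUM *d, const BIGNUM *e,
                           const BIGNUM *n, BN_CTX *ctx)
  {
      BIGNUM *r = BN_new(), *r_inv = BN_new(), *cb = BN_new();
      BN_rand_range(r, n);                 /* random blinding factor r < n */
      BN_mod_inverse(r_inv, r, n, ctx);    /* r^-1 mod n */
      BN_mod_exp(cb, r, e, n, ctx);        /* r^e mod n */
      BN_mod_mul(cb, cb, c, n, ctx);       /* c' = c * r^e mod n */
      BN_mod_exp(m, cb, d, n, ctx);        /* m' = c'^d mod n */
      BN_mod_mul(m, m, r_inv, n, ctx);     /* m = m' * r^-1 mod n */
      BN_free(r); BN_free(r_inv); BN_free(cb);
  }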


Constant time is the defense recommended in the article :)


Many SSL attacks seem to require thousands or millions of interactive sessions or inputs. Is there a reason we aren't modifying our Internet-facing servers to drop connections and discard ephemeral keys when a particular IP or set of IPs performs actions that are outside the norm?


Well I think the crypto nerds would like to design crypto systems that are inherently (i.e. mathematically) resistant to such attacks.

But in general, I think you are right. I am astounded at how few networked applications perform rate limiting. WordPress, for example, does not ship with any rate limiting on the login form. Brute force? Go ahead, give it a shot.

By comparison, Drupal 7 out of the box limits any IP to a small number of quick login attempts before blocking that IP temporarily.

If your application is intended for human interaction, it just makes sense to limit things to human speed. Maybe it's harder than I think it is, or maybe people just don't think of it.
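
A per-IP limiter doesn't have to be complicated, either; here's a minimal token-bucket sketch (all names and constants are hypothetical):

  #include <time.h>

  /* Each login attempt costs one token; tokens refill over time.
     Keep one bucket per client IP. */
  typedef struct {
      double tokens;
      time_t last;
  } bucket_t;

  #define BUCKET_CAP      5.0   /* burst: 5 quick attempts */
  #define REFILL_PER_SEC  0.1   /* then one attempt every 10 seconds */

  int allow_attempt(bucket_t *b)
  {
      time_t now = time(NULL);
      b->tokens += (double)(now - b->last) * REFILL_PER_SEC;
      if (b->tokens > BUCKET_CAP) b->tokens = BUCKET_CAP;
      b->last = now;
      if (b->tokens < 1.0) return 0;   /* block: bucket exhausted */
      b->tokens -= 1.0;
      return 1;                        /* allow this attempt */
  }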


It's a tradeoff (like so much in security): limit an IP to N quick login attempts, and it's easy for your students to DoS the Drupal-powered school portal (assuming the school is behind a NAT, at least). Often you want more security and less convenience... but it's not as easy as "most secure all the time!".


Because then you end up blocking entire countries by accident.


from the blog post:

> Fortunately, Intel solved this problem… for the hyperspecific case of AES. Newer Intel CPUs (and also other vendors including ARM) now provide a fast, constant-time implementation of AES in hardware.

Others have pointed to particular aspects of Intel hardware that they don't believe are 'backdoored', for some definition of backdoored. One point some of these comments appear to be missing is that using things like AES-NI typically also means using things like RDRAND.

Whether you think there's any relationship between Intel's Bull Mountain and the NSA's BULLRUN is up to you, but at the end of the day, the only way any of us can know for sure whether RDRAND uses DUAL_EC_DRBG is to:

1) decide to take Intel at their word about RDRAND using only CTR_DRBG, or

2) undertake some difficult and probably-expensive hardware forensics to find out


DUAL_EC_DRBG would have been the dumbest possible way to backdoor RDRAND. It would also imply that Intel is somehow able to slip in circuitry able to do two 256-bit elliptic curve scalar multiplications in under 300 cycles, and pretend it is AES circuitry, which would normally be orders of magnitude smaller and faster.


CREAM? We Wu-Tang now?



