Flaw found in online encryption method (nytimes.com)
104 points by sew on Feb 14, 2012 | 34 comments



This is not about some new vulnerability. It is a survey of collected public keys. The main security-relevant conclusion is that some key generators are not using enough entropy.

Dr. Lenstra is a co-author of the paper (so it shouldn't be dismissed on the weakness of the reporting). The NY Times and Markoff should both be ashamed for publishing such a misleading article, particularly the title.


This is not about some new vulnerability.

A more accurate statement is: the thesis that "when hundreds of thousands of RSA keys are generated, the entropy of each key (given the remaining keys) is still large" is shown (in this paper) to be empirically false, because people in practice use weak pseudorandom generators.

This is important because when analyzing the security of RSA, we assume that N=pq for random-looking primes p and q. In isolation this is true, but when there are tons of other keys out there, p and q no longer behave as independent random values.

I also fail to understand why you mention Lenstra being a co-author of the paper.


I take him to mean, "it is understandable that the NY Times would write up a story on RSA vulnerabilities when one of the authors is Arjen Lenstra, because Arjen Lenstra is a giant in the field".

I agree with you (and not the parent) about the importance of the study.


I really don't think it is about not using enough entropy. I think that someone took a shortcut and reused private keys without realizing that they were compromising their public keys by doing so.


No, the owners of the vulnerable keys are by all appearances unrelated.


Huh, I did my back of the envelope wrong when I first decided that wasn't plausible.

Here is a better back of the envelope.

Suppose that we're using a random number generator that has 4 billion possible internal states. (We're using a C integer under the hood.) Suppose that we're having it generate random 256 bit integers, which we then test for primality. The probability of these integers being prime is about 1/ln(2^256), or about 1/177. So we have around 24 million possible prime numbers we can generate from this generator.

A key has 2 of these primes, therefore 2 random keys have 4 possible combinations of their primes that could match, for about 1 chance in 6 million of matching. Given n keys, there are roughly n*n/2 possible pairs of keys which might match. It turns out that if you have about 570,000 keys in your overall pool, all made by the same bad algorithm, you'd expect roughly 27,000 possible matches.
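For anyone who wants to redo the envelope, here is the same estimate in a few lines of Python (the 32-bit state, 256-bit candidates, and 570k pool size are just the assumptions above, not anything from the paper):

    import math

    states = 2**32                            # assumed: PRNG with a 32-bit internal state
    prime_density = 1 / (256 * math.log(2))   # ~1/177 chance a random 256-bit number is prime
    primes = states * prime_density           # ~24 million primes the generator can emit

    pool = 570_000                            # hypothetical keys all made by the same bad PRNG
    pairs = pool * pool / 2                   # ~1.6e11 pairs of keys
    p_match = 4 / primes                      # 2 primes per key -> 4 cross-combinations per pair
    print(round(primes), round(pairs * p_match))  # ~24 million primes, ~27,000 expected collisions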

This is scary. What is even more scary is that if you can identify the bad algorithm in common in this pool of poor choices, it is quite doable to enumerate all possible primes that could be a key. And then rather than just being able to compromise 0.2% of the published keys, you can relatively easily compromise about 4% of them instead.

If they haven't already done so, I would expect interested parties to identify the bad algorithms in widespread use and enumerate the primes those algorithms are likely to produce. The NSA is the obvious candidate, but criminal parties with access to a botnet could also perform this operation.
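To make the "enumerate the primes" step concrete, here's a sketch. The LCG is a stand-in I made up, since nobody (publicly) knows which flawed generators produced the weak keys, and sweeping all 2^32 seeds would want real hardware rather than this Python loop:

    from sympy import isprime       # any reasonable primality test works here

    def weak_256bit(seed):
        # hypothetical stand-in for the flawed generator: eight chained 32-bit LCG outputs
        x, out = seed, 0
        for _ in range(8):
            x = (1103515245 * x + 12345) % 2**32
            out = (out << 32) | x
        return out | 1              # force odd

    def candidate_primes():
        for seed in range(2**32):   # every reachable state of the assumed generator
            n = weak_256bit(seed)
            if isprime(n):
                yield n             # a harvested modulus m is broken as soon as m % n == 0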


This would be an interesting time to work at Google, because one assumes that an interested researcher there could get his hands on a Very Big Pile of Keys fairly quickly, and this seems to be made for map/reduce.


"The New York Times and Markhoff should both be ashamed for publishing such a misleading article, particularly the title"

Yeah, it's definitely misleading but that's the news these days. They have an audience and it's pretty broad. So yeah, to us this is misleading but when you take the audience of the Technology section of the NYT as a whole you really can't expect them to understand anything more complex than the explanation they gave. They're obviously writing for the lowest common denominator here. It sucks but this is nothing new for any publication when it comes to reporting on subjects few people really have a deep understanding of. I don't think this is such a terribly misleading article on the whole. Had this been published in a cryptography journal or some really specialized publication then I'd see where shame would come into play.


This seems similar to the length-extension property of Merkle–Damgård hashes like MD5 and SHA-2, in that it is something bad that happens in the real world with RSA that isn't a flaw in RSA per se, but is an attack that would ideally be foreclosed by the design of the algorithm.

In other words, all things being equal, you'd want to select the algorithm where the fewest implementation flaws could prove devastating in the future. The ability to conduct large-sample GCD collision surveys, and the fact that flaws in common CSPRNGs mean such a survey actually turns up results, is an argument against using RSA.

I don't see anything specific to SSL or X.509 here; this seems like an issue with every PKI. If you publish public keys to the whole world, people can survey them.


So out of 6.4M RSA keys they found 12K where, for the modulus n = pq, p or q (or both) was also used to create the modulus of another key.

So when I give them my public key, they can just scan their database and, with a 0.2% chance, find a key with a modulus that is divisible by my p or q? Is that how it works?

edit: The K9 business looks very dodgy. When analysing the keys they found "9 primes each of whose 36 pairwise products appears as n". So next time you're factoring an RSA key your best bet is to start with primes from this K9...


As far as I can tell, the researchers are describing features of particular key pairs that have been generated by users of the RSA algorithm. The 'garbage-in, garbage-out' rule applies. If your RNG sucks, you're going to get weak keys out of it.

Is it harder to generate weak Diffie Hellman parameters using a bad RNG?

Update: Yes, apparently. From the paper: "Cryptosystems such as RSA that require, during key-setup, multiple secrets are more affected by the apparent difficulty to generate proper random values than systems such as Diffie-Hellman (cf. [8]), ElGamal, and (EC)DSA that require a single secret."

That's (mercifully) pretty intuitive.


As I posted to the other thread:

Somewhat disappointing, but not surprising, that they did not describe their actual calculation technique for finding the connected primes. It strikes me that there isn't a good excuse for having bad random numbers with the computational power we now have. Simply XORing together various sources of supposedly random numbers gives a result at least as strong as the strongest of those sources, provided they are independent. So we can combine lots of sources of questionable strength and get a very strong source.
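A sketch of that combining idea (the particular sources here are just illustrative, and the usual caveat applies: this only helps if at least one source really is unpredictable and independent of the rest):

    import os, time, hashlib

    def combined_random(n=32):
        a = os.urandom(n)                                                        # OS entropy pool
        b = hashlib.sha256(str(time.perf_counter_ns()).encode()).digest()[:n]    # timing jitter
        c = hashlib.sha256(str(os.getpid()).encode()).digest()[:n]               # process id
        # XOR of the streams: at least as unpredictable as the best single source,
        # as long as that source is independent of the others.
        return bytes(x ^ y ^ z for x, y, z in zip(a, b, c))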

Generating both primes of a single key with insufficient independence would be somewhat understandable; the same primes turning up in keys generated by unrelated parties hints at a more severe problem with random number generation. Regardless, assuming this is confirmed, it is a significant problem that needs to be hunted down.


They didn't describe the technique, but the fact that the paper has the keyword "Euclidean algorithm" makes it pretty obvious--they just gathered a list of public keys, took the gcd of all the pairs, and whenever they found a gcd not equal to one, they'd cracked both keys out of that particular pair. See my earlier comment.


I took the 'the straightforward approach would require about ten core-years and would not scale well' to mean they had a more efficient method.
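Presumably something along the lines of a product/remainder tree ("batch GCD"), which is the standard way to avoid doing all the pairwise gcds. A rough sketch of that idea, with no claim that this is exactly what the authors ran:

    from math import gcd

    def batch_gcd(moduli):
        # product tree: levels[0] is the input, levels[-1] is [product of everything]
        levels = [list(moduli)]
        while len(levels[-1]) > 1:
            prev = levels[-1]
            levels.append([prev[i] * prev[i + 1] if i + 1 < len(prev) else prev[i]
                           for i in range(0, len(prev), 2)])
        # remainder tree: push the big product back down, reducing mod node^2 at each level
        rems = levels.pop()
        while levels:
            level = levels.pop()
            rems = [rems[i // 2] % (n * n) for i, n in enumerate(level)]
        # gcd(n, (P/n) mod n) is nontrivial whenever n shares a prime with another modulus
        return [gcd(n, r // n) for n, r in zip(moduli, rems)]

    # toy check: the first two "moduli" share the prime 11
    # batch_gcd([11 * 13, 11 * 17, 19 * 23])  ->  [11, 11, 1]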


They do, but does it matter? Ten core-years is not a forbidding threshold.


Not really - it does make a difference between amateur attackers and professional, but the potential value is high enough that there will probably be plenty of well funded attackers.


Oh these researchers don't understand PR (public relations). Here's an amusing way they could get one hundred fold the publicity that they're getting now without (technically) lying.

They cracked 12720 RSA keys. They could reveal the secret prime factors in half the cases (6360 RSA keys) without giving away any indication of how they're doing it.

What they could have done is set up a website titled "RSA Cracker" which reveals one key every 3 minutes over a 2 week period.

At the end of 2 weeks, they could shut down the website without explanation.

Can you imagine the media frenzy that would ensue?


Amateur question, but is it possible that the weak numbers are ones that pass some primality test which can give false positives? Miller–Rabin, I believe, is only a probabilistic test; if it were run with too few iterations, there would be some specific numbers which need more than N iterations (where N is an arbitrary threshold in some software). Would those numbers be in the "weak" subset found in the paper?
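For reference, a bare-bones version of the test being described; the "iterations" knob is the rounds parameter, and each extra round with a random base cuts the chance of a composite slipping through by at least a factor of 4:

    import random

    def miller_rabin(n, rounds=40):
        if n < 4:
            return n in (2, 3)
        if n % 2 == 0:
            return False
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False     # witness found: definitely composite
        return True              # probably prime; error probability <= 4**-rounds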


Interesting that they were able to identify that there was a bug without being able to say what it was or how it happened.


Well, that's the thing. They don't know who generated these keys or how. They don't know what the bug is; all they can see is the end result, not how it happened, without talking to the people who generated the keys and investigating the algorithms used, the entropy sources used, and trying to recreate the state of the system at the time the keys were generated.

But even without knowing the bug, there is still an enormous vulnerability; pretty much anyone else could do what they have done, crack these keys, and MITM any of the sites that use them.

So, they could have kept quiet, tried to track down who had generated the keys and how, and seen whether there is a fixable problem. But the longer they did that, the more people they'd have to contact, the more likely it is that word would leak out, and the more likely it is that someone nefarious could use this information to their advantage.

By making a big announcement about it, they let everyone know now. Everyone knows to start looking at these keys. They will likely notify any CAs that they can identify of the bad keys that they have found so that they can be revoked (though they have mentioned that finding contact information has sometimes been difficult). Yes, it means that we don't really know how this happened; but it does mean that a lot more people will probably start auditing their entropy sources and pseudo-random number generators to make sure they are working properly.


Yeah, and I think they did not do a very good job explaining it in the paper. Perhaps they assume more familiarity than I have.

But as far as I can tell, the result is that public keys collected "in the wild" show more duplication (of prime factors) than should be expected, so the random number generation schemes used are not "random" enough. Is that it, or is there more to the story?

(Edit) No, there's more. I don't quite understand (and don't think they really explain), but somehow they use the fact that there's lots of keys to find ones with common factors, which allows you to factor both of them. This sounds basically like the Euclidean algorithm but I couldn't tell what method they're using to find them.


As far as I can tell, this is what they're doing: if two different keys have a factor in common (i.e. A=PQ, B=PR), then you can use Euclid's algorithm (which just requires repeated subtraction, and is thus really easy) to find P (=gcd(A,B)), and then just use division to find Q (=A/P) and R (=B/P) easily.

So what the researchers did, apparently, was to gather all the RSA public keys they could find (6 million or so) and then calculate the gcds of all pairs of keys. (Presumably they were slightly more clever than this, but it doesn't really matter--doing this is still vastly easier than factoring even a single public key). Whenever they found a gcd that wasn't equal to 1, they'd cracked (at least) 2 keys. And it turns out that roughly 0.2% of the keys could be cracked in this way (for a total of 12720 1024-bit moduli).
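A toy version of that check; the real survey runs over millions of 1024-bit moduli, but the arithmetic is exactly this:

    from math import gcd

    def try_crack(n1, n2):
        p = gcd(n1, n2)
        if p in (1, n1, n2):
            return None                          # no usable shared factor
        return (p, n1 // p), (p, n2 // p)        # both keys factored

    # two toy "moduli" that happen to share the prime 101:
    # try_crack(101 * 103, 101 * 107)  ->  ((101, 103), (101, 107))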

As far as I can tell, the problem isn't with RSA itself, but with whatever algorithms are used to pick the two random primes that get multiplied to produce the key (or modulus, whatever)--they don't have enough entropy, and they repeat primes often enough that ~0.2% of the moduli end up sharing a factor.


Is it possible/likely that many of these bad pairs are from this debian/openssl bug?

http://taint.org/2008/05/13/153959a.html


None of the newly found affected moduli are blacklisted (cf. [24]). Based on this brief analysis, we conclude that the problem reported in the main body of this paper persists and that the 99.8% security level of 1024-bit RSA moduli is further reduced.

p. 15, http://eprint.iacr.org/2012/064.pdf


They explicitly excluded the Debian/OpenSSL blacklisted keys from their sample before checking for shared factors.


[deleted]


It's mentioned all over the place in the paper: http://eprint.iacr.org/2012/064.pdf


I take it you didn't read to the end :)


They do, but they abbreviate it (R.S.A.)

To clarify, is this a flaw only in RSA, or RSA and DSA?


the source paper linked in the article says DSA is not affected.


No, it certainly does not. The linked paper in fact says the opposite. Taking the time to read the paper reveals a fun experiment, not a new cryptographic break: low-entropy randomness means bad keys, for all asymmetric and symmetric schemes.


Would it be valuable to set up a service that could check your public keys against a database of others checking for common factors? Or just better to fix the entropy problems and assume the problem doesn't exist anymore?


I think the original article alludes to this but doesn't say it explicitly:

If they built such a service to let people test their own public keys, the service would actually provide a much bigger service to attackers than it would to users.

Public keys are public, right? There are huge LDAP databases out there just brimming with certificates (i.e. signed public keys) just waiting to be harvested. And most of the people whose certificates are in these databases would not be paying attention to this news, but an attacker certainly would.

You probably see where this is going.


Couldn't the service let you check your own key, rather than an arbitrary public key, without transmitting the actual private key?

You know (pub,priv). They know either (pub,priv) or (pub).

Essentially, make use of your unique (probably!) ability to sign something with your private key.

There's the issue of traffic analysis, which needs to be solved: they have to reveal to you whether the key is compromised, and there are only two possible answers, so they have to be careful not to leak the answer in the traffic metadata.


Better yet, they can just publish something encrypted with every compromised public key. Only people with the corresponding private keys can ascertain if they're compromised.
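Something like this, say with the Python `cryptography` package (the message and key-loading details are illustrative):

    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def encrypted_notice(pem_public_key: bytes) -> bytes:
        pub = serialization.load_pem_public_key(pem_public_key)
        # only the holder of the matching private key can decrypt this
        return pub.encrypt(
            b"your RSA modulus shares a factor; revoke and regenerate",
            padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                         algorithm=hashes.SHA256(),
                         label=None),
        )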



