Exploit Information Leaks in Random Numbers from Python, Ruby and PHP

Xk · on Dec 5, 2012

Those interested in this should look at a paper from Vern Paxon and Nicholas Weaver:

http://www.icir.org/vern/papers/witty-imc05.pdf

A summary of it: A worm used a linear congenital generator to generate its randomness. It used this generator to pick which IPs to try to infect, which hard drives to write data to, and what to write. These researchers used a /8, and were able to use that to count, exactly, the bandwith of all infected machines, how many hard drives machines each had, the time they started up, and locate the exact machine which initially spread the worm. It's really quite amazing that you can get all of this from just packet captures, before you think about it.

sparky · on Dec 5, 2012

congruential? :)

fx5 · on Dec 5, 2012

Very interesting, thank you for sharing.

tptacek · on Dec 5, 2012

This is a pretty great post. We need lots more posts on practical exploit development for RNG flaws, because there are a lot of bad random number generators out there.

I want to respond to this headline, though. Use of MT as a CSPRNG is very, very common in PHP applications. And it's also true that MT is the algorithm used by Ruby for it's "rand". But this is not a very common Ruby flaw, at least not like it is in PHP, because virtually every Ruby build provides an explicit secure random number generator, either through OpenSSL or (more commonly) through Rails' SecureRandom.

You should know that unless your RNG calls itself "secure" or "cryptographic" --- like, in the function name --- you are using rand(), not a CSPRNG, and you can't count on it for security ever. You will have the exact same problem in most mainstream languages. Secure random number generators say they're secure. Nobody says Ruby's rand() is secure.

(I'm pretty sure the same is true of Python, but I'm less confident of the specifics. I think this is a very fair issue to raise with PHP in general, though.)

wulczer · on Dec 5, 2012

> (I'm pretty sure the same is true of Python, but I'm less confident of the specifics. I think this is a very fair issue to raise with PHP in general, though.)

Yeah, Python's random module uses MT, but you can use PyCrypto, which provides an API-compatible cryptographically secure module (PyCrypto.Random.random).

inglesp · on Dec 5, 2012

The Python docs[0] for the random module also state, quite clearly, that "the basic function random()... is completely unsuitable for cryptographic purposes".

[0] http://docs.python.org/3.3/library/random.html

Moto7451 · on Dec 5, 2012

Very very minor nitpick/addendum to your point. Some languages let you swap out the rand function for a secure implementation. In those cases its important to make sure that that mechanism is actually in place.

In Perl for example:

http://search.cpan.org/~mkanat/Math-Random-Secure-0.06/lib/M...

tptacek · on Dec 5, 2012

Which is evil, because you want assessors and code reviewers to be able to quickly spot which RNG you're using.

seats · on Dec 6, 2012

Not to mention wanting something to break if the secure implementation somehow were separated from the code using it (versus silently reverting to PRNG).

Jabbles · on Dec 5, 2012

Is there any advice you're able to give on how to distinguish situations where you need secure RNG from those that don't matter?

For instance, you definitely need it in poker hand generation - would you need it for random loot generation in an MMO?

Also by "secure" you mean /dev/random, and /dev/urandom is used as if it was rand()?

wladimir · on Dec 6, 2012

Using a secure RNG is always a good idea, unless you know you need lots of (predictable given the seed) random numbers for example in Monte Carlo sampling.

Even in randomized algorithms you may have to be careful what RNG you use, because of DDOS risk. For example see the problems with Python and Ruby hash tables that could be exploited to have worst-case behavior because their behavior was entirely predictable.

In a MMO you most certainly want to use a secure RNG, as cheating is rife in them. If a player can get an advantage by predicting the RNG, someone will.

tptacek · on Dec 5, 2012

You should pretty much just default to secure random.

/dev/urandom is fine; in Ruby apps, I'd use OpenSSL::Random.random_bytes or ActiveSupport::SecureRandom.random_number.

emboss · on Dec 8, 2012

If OpenSSL::Random isn't working (because OpenSSL is not installed for example) there is also SecureRandom in the stdlib. It tries to do the right thing in any situation: Use OpenSSL:Random if available, otherwise it will fall back to what's available in the OS you're on.

Jabbles · on Dec 5, 2012

What cases do you reserve /dev/random for? SSH keygen?

Do those functions just read from /dev/urandom?

tptacek · on Dec 6, 2012

I don't. Just use urandom.

z-factor · on Dec 5, 2012

In Python os.urandom provides random string suitable for cryptographic use.

praptak · on Dec 5, 2012

I remember someone inferring the Nethack RNG state to show off on online servers by dying three times in a row because of kicking a wand of wishing. Here:

http://taeb-nethack.blogspot.com/2009/03/predicting-and-cont...

doe88 · on Dec 5, 2012

Indeed as the author rightfully mentioned in his article this method is not designed for crypto purpose. One can use the following Python method instead random.SystemRandom().randint(...)

gojomo · on Dec 5, 2012

SystemRandom uses the system's urandom, which may not be ideal, either. (The man page for urandom mentions theoretical problems when system entropy pools are depleted.)

The PyCrypto.Random.random option mentioned in another thread by wulczer might be better... but would love an authoritative recommendation from an expert.

tptacek · on Dec 5, 2012

You should probably use your system urandom/random in preference to any application-layer CSPRNG. Your OS developers are charged with maintaining a high-profile high-value CSPRNG used for most applications on the system, and vulnerabilities in it are a hair-on-fire problem. The same is not true of application-layer replacements. The kernel RNG is also in a privileged position to collect entropy.

wglb · on Dec 6, 2012

And what about them new-fangled Intel random number generating instructions?

tptacek · on Dec 6, 2012

You should trust that your OS will use them when it makes sense to use them. :)

(I'm being glib.)

wglb · on Dec 6, 2012

And I was being a bit silly.

KMag · on Dec 5, 2012

PyCrypto.Random doesn't magically gather sources of entropy that aren't available to the O.S. Ideally, the O.S.'s non-blocking randomness source would be an implementation of Fortuna or some other CSPRNG that will eventually recover from compromised state.

Anyway, use OpenSSL's CSPRNG or the system's cryptographic random number source. The weak link in your system is almost certainly not going to be either of these, and they've received a lot more auditing and review than PyCrypto.

The theoretical weakness in /dev/urandom is that it generally hands out more entropy than it gathers, so if there are other exploitable flaws, eventually all of its state will leak. It's important to note that most implementations of /dev/random suffer from relying on estimates of the entropy present in several inputs. The nice thing about Fortuna is that it has the very nice theoretical property that it will eventually recover from leaked state, without relying on entropy estimates. Entropy estimates are a fiction to help some people sleep at night.

cynwoody · on Dec 5, 2012

Here's a hardware solution to the "pseudo" problem: http://gamesbyemail.com/News/DiceOMatic

It's a dice-rollingmachine "that can belch a continuous river of dice down a spiraling ramp, then elevate, photograph, process and upload almost a million and a half rolls to the server a day. I may not get nominated for a Nobel prize, but the deep rumbling vibration you feel more than hear when two rooms away is quite impressive."

rarrrrrr · on Dec 5, 2012

Worth it just for that video of the perfect game of Asteroids.

sp332 · on Dec 5, 2012

This one is even cooler, to me, since the ship flies around a lot more http://www.heise.de/video/artikel/Asteroids-Vladimir-Bleifus...

jgrahamc · on Dec 5, 2012

Funny, that someone did this. I wanted to implement something similar to check that GCHQ weren't lying to me about this: http://blog.jgc.org/2011/12/back-channel-confirms-that-im-ri... Essentially, I thought that it was possible that this block of code was not what they actually used (partially because 0x7f did not appear in the output).

I didn't do it because I had only 326 bytes of random material with 7 bits per byte. Too little to recover the state.

marshray · on Dec 5, 2012

Depends on how much state they seeded it with. It's pretty common for MT to be seeded with 32 or less bits of entropy.

kdsudac · on Dec 5, 2012

Interesting read. Realistically, to "know all the cards in online poker games" wouldn't you also have to reverse engineer how they map the random number to a card?

You would get a jack of hearts, not an integer between 1 and 52, right? or does the exploit somehow work for arbitrary patterns as well?

jsaxton86 · on Dec 5, 2012

I think the author was alluding to the following: http://www.cigital.com/papers/download/developer_gambling.ph...

sophacles · on Dec 6, 2012

There are a few things here.

First, the look at the protocol and see if you can simply determine what number maps to what card.

Next, presuming they are doing the simple thing (given their choice of random... this isn't that unfathomable) and numbering them sequentially you only have 4 possibilities. Thats not that hard to run your data through. There are a few more orderings that make a certain kind of "straight-forward" sense that would be good tries too.

Of course they may not be doing that, and have some sort of "random base deck". That would be a bit harder, im pretty sure you can come up with a system of equations to figure out the card number along with the system described in the article, and as such (and perhaps with a bit more data) still solve it.

Finally, there may be statistical methods to combine with the equations in the article to figure out whats happening. (which you may need anyway depending on how exactly get_next_card() is called and how random is called (same prng for the whole system, or one per game? etc)

antirez · on Dec 5, 2012

A PRNG based on RC4 should be very fast but much more secure than a MT or a LCG.

EDIT: also it is very important to seed with care if you are interested in security. Even a strong PRNG seeded with seed_prng(time(NULL)) will be an easy target for brute force attacks.

At least /dev/urandom should be used for seeding purposes in applications where you need an unguessable PRNG.

tptacek · on Dec 5, 2012

RC4-based CSPRNGs were an OpenBSD idiom. But you should use your OS's or your framework's secure random number generator in preference to arc4random(), because there's more to the security of a good CSPRNG than just the algorithm it to jumble up its internal state.

carbocation · on Dec 5, 2012

> At least /dev/urandom should be used for seeding purposes in applications where you need an unguessable PRNG.

I usually consume /dev/urandom and convert it to the base I require, using that directly as my random number. When you say you should use it for seeding purposes, are you referring to using /dev/urandom as a seed for something like PHP's rand(), or do you mean something else?

tptacek · on Dec 5, 2012

Seeding is largely a problem you have if you're building (or retrofitting in) your own CSPRNG, which you shouldn't do. The random/urandom interface Unixes provide will allow you to shovel in high-entropy data, but I think you're more likely to do harm than good (it's a marginal impact in either direction, though).

You're doing the right thing already.

carbocation · on Dec 6, 2012

Aha, that makes sense. Thanks!

lelf · on Dec 5, 2012

http://news.ycombinator.com/item?id=639976

tptacek · on Dec 5, 2012

That post, which is great, is the normal way bad RNGs are broken by attackers: you trace down how they're seeded and then brute force the seed values.

What's great about this blog post is that it attacks the underlying algorithm; the post you linked to is more fun, but this post is a little more useful.

In either case: just use random/urandom and this stuff is taken care of for you.