Okay. I read the whole thread (ugh). Arguments for using the kernelspace CSPRNG ...

ryuuchin · on May 6, 2016

> Microsoft changed their CSPRNG to FIPS 186-2 or NIST SP 800-90A

It's changed once, in Vista SP1. Since then it's only used AES256 in CTR mode as a DRNG as specified in NIST 800-90. So I'm not sure it's fair to say it changed that much. Linux's CSPRNG has also not seen much change other than to make it more resilient in certain conditions (there was some paper on it IIRC) and to add hardware RNG support (e.g. rdrand).

> 3. OpenSSL is a questionable security product with it's history of vulnerabilities.

I don't think this is the (main) argument against its CSPRNG although it may be one of them. My understanding is the main argument against it is that it's overly complicated by design (e.g. entropy estimation, how it's initialized (especially on Windows)). You could also probably argue that it may be showing its age with its use of SHA1 but you could say the same for the Linux kernel as well.

If you want to look at a userspace CSPRNG done right (or what I believe to be one done right) just take a look at BoringSSL's[1]. In the case where there is a hardware RNG it will create a ChaCha20 instance, keyed with the OS's CSPRNG, and use that ChaCha20 instance to filter the rdrand output (as to not use it directly or xor it). If there isn't a HW RNG then it will just use the OS CSPRNG directly.

There's no entropy estimation, no way to seed it, and by design it's simple and fast. You're correct that the system's CSPRNG may not be fast enough, in fact the BoringSSL dev's mentioned this[2] citing TLS CBC mode. This is probably more a problem on Linux than Windows due to the design of the CSPRNG (Linux's is pretty slow).

So with everything being said I would argue that it's always the correct choice to use the system CSPRNG unless it otherwise can't satisfy your needs. In which case just use BoringSSL then.

As a side note if you really need to generate A LOT of random numbers just use rdrand directly. You should be able to saturate all logical threads generating random numbers with rdrand and the DRNG (digital RNG) should still not run out of entropy.

[1] https://boringssl.googlesource.com/boringssl/+/master/crypto...

[2] https://www.imperialviolet.org/2015/10/17/boringssl.html (under the "Random number generation" section)

atoponce · on May 9, 2016

> If you want to look at a userspace CSPRNG done right (or what I believe to be one done right) just take a look at BoringSSL's[1].

BoringSSL just uses /dev/urandom directly. It's not a userspace CSPRNG. And as you pointed out, for GNU/Linux systems, it's slow. This is why userspace designs, such as CTR_DRBG, HMAC_DRBG, and Hash_DRBG exist- so you can have a fast userspace CSPRNG with backtracking resistance.

Case in point. On my laptop:

$ pv < /dev/urandom > /dev/null

1.02GB 0:01:20 [13.3MB/s] [ < => ]

$ openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt < /dev/zero | pv > /dev/null

2.13GiB 0:00:11 [ 198MiB/s] [ <=> ]

And on a server with AES-NI:

$ pv < /dev/urandom > /dev/null

2.19GiB 0:01:06 [ 20MiB/s] [ <=> ]

$ openssl enc -aes-128-ctr -pass pass:"sHgEOKTB8bo/52eDszkHow==" -nosalt < /dev/zero | pv > /dev/null

31.9GB 0:00:34 [ 953MB/s] [ < => ]

I've seen other hardware with AES-NI that can go north of 2 GiBps, as I already mentioned. Although not backtracking resistant, those are fast userspace CSPRNGs, that are clean in design.

I've designed userspace CSPRNGs that adhere to the NIST SP 800-90A standards. They're seeded from /dev/urandom on every call, and perform much better than relying on /dev/urandom directly. I won't say they're bug free, but if you read and follow the standard (http://csrc.nist.gov/publications/nistpubs/800-90A/SP800-90A...), it's not too terribly difficult to get correct, and PHP, Perl, Python, Ruby, and other interpreted languages can outperform the kernelspace CSPRNG.

ryuuchin · on May 9, 2016

> BoringSSL just uses /dev/urandom directly.

Only if there's no hardware RNG support which I admit can happen (it's not perfect, I freely admit that). I suspect that for Google's use on their servers it's a non-issue (assuming where they use it and need the high(er) speed stuff they'll always have rdrand support). If there is rdrand support then it will only reseed from /dev/urandom after every 1MB (or 1024 calls) of generated random data (per thread).