Random number generator enhancements for Linux 5.17 and 5.18 (zx2c4.com)
242 points by _dh54 on March 18, 2022 | 94 comments



In case anyone's interested in how Windows behaves for the sake of comparison, see: https://aka.ms/win10rng


Note that while CTR_DRBG is totally different from the similar-sounding Dual_EC_DRBG (which is widely suspected of being backdoored), CTR_DRBG has some flaws (being a counter-based design) that allow insecure, or secure-but-lower-than-expected, randomness depending on the symmetric cipher used. It seems that it uses AES-256, but whether it takes 224 or 256 bytes of output per round is not discussed in the document (counterintuitively, taking only 224 bytes is slightly more secure than taking the whole 256 bytes).

Edit: re-reading the document, it seems to imply (considering the buffer size) that the output they are using is 128 bytes.


I'm not sure if any of these concerns are really practical. NIST does a great job of making CTR_DRBG sound complicated, but it's pretty close to the simplest possible CSPRNG: it's just a block cipher in CTR mode, which is about as simple as a secure CSPRNG design gets.
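To make that concrete, here's a minimal sketch (Python, using the pyca/cryptography package) of the "block cipher in CTR mode" core. This is only an illustration, not NIST CTR_DRBG itself: there's no update/reseed step, no derivation function, and no personalization string.

    # Toy sketch of "a block cipher in CTR mode as a CSPRNG" -- not NIST CTR_DRBG
    # and not production code: no reseed/update step, no derivation function.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    class TinyCtrPrng:
        BLOCK = 16  # AES block size in bytes, regardless of key size

        def __init__(self, seed: bytes):
            assert len(seed) == 32      # 256-bit seed used directly as the AES-256 key
            self._key = seed
            self._counter = 0

        def _next_block(self) -> bytes:
            # Encrypting successive counter values is exactly CTR-mode keystream
            # generation; the keystream is the PRNG output.
            aes = Cipher(algorithms.AES(self._key), modes.ECB()).encryptor()
            block = aes.update(self._counter.to_bytes(self.BLOCK, "big"))
            self._counter += 1
            return block

        def generate(self, n: int) -> bytes:
            out = b""
            while len(out) < n:
                out += self._next_block()
            return out[:n]

    prng = TinyCtrPrng(os.urandom(32))
    print(prng.generate(48).hex())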

It's true that if you use DES-EDE or something with CTR_DRBG, you have all the problems that come from using a short block with CTR mode --- but if you can reason about how to use CTR mode, I think you can reason about the limitations you'll run into with CTR_DRBG.

You're not getting insecure randomness from AES CTR_DRBG.


And the author of this MS paper is the one who first publicised the potential Dual_EC_DRBG backdoor, years before it was outed by Snowden.


Outed in what way?


Outed by the memos he leaked:

> Classified N.S.A. memos appear to confirm that the fatal weakness, discovered by two Microsoft cryptographers in 2007, was engineered by the agency. The N.S.A. wrote the standard and aggressively pushed it on the international group, privately calling the effort “a challenge in finesse.” From https://www.propublica.org/article/the-nsas-secret-campaign-...

Or for some other discussion https://blog.cryptographyengineering.com/2015/01/14/hopefull...


On the Possibility of a Back Door in the NIST SP800-90 Dual EC-PRNG (2007, Shumow and Ferguson, https://rump2007.cr.yp.to/15-shumow.pdf)


AES always has a 128 bit (16 byte) block size; AES-256, AES-192 and AES-128 only change the key size.

(Also, the 128-byte buffer therefore contains 8 blocks; I believe the size is chosen more for performance than security reasons.)


They use a buffer of 128 bytes for small requests. Larger requests come directly from AES_CTR_DRBG:

> There is a small buffer (currently 128 bytes). If a request for random bytes is 128 bytes or larger, it is generated directly from AES_CTR_DRBG. If it is smaller than 128 bytes it is taken from the buffer.

At least, that's my read of it.


How is secure-but-lower-than-expected-randomness actually possible? I thought being indistinguishable from true random is one of the requirements of a CSPRNG.


> I thought being indistinguishable from true random is one of the requirements of a CSPRNG

It's still secure, just with a smaller margin. For example, a construction that in theory gives you a security margin of 160 bits (meaning that, assuming no further reseeding, an attacker needs around 2^160 bits of output before the security guarantees break down) is reduced to around 128 bits (so, assuming no further reseeding, only around 2^128 bits of output are needed before the guarantees break down). It shouldn't happen in practice (constant reseeding), but some symmetric ciphers paired with CTR_DRBG (Triple DES) make recovering its state a lot easier, especially on systems that don't reseed frequently (note that this algorithm dates from the late '90s, so practices now seen as obvious security blunders weren't being avoided).
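Tangentially, here's the standard back-of-the-envelope for why a short block size hurts a CTR-style generator (this is the generic distinguisher, not the specific 160-vs-128-bit figures above): a block cipher under a fixed key never repeats an output block, so after roughly 2^(b/2) blocks the absence of collisions becomes detectable.

    # Rough birthday-bound math: how much output a b-bit-block CTR generator can
    # produce before the "no repeated blocks" property is observable.
    def distinguishing_bound_bytes(block_bits: int) -> int:
        blocks = 2 ** (block_bits // 2)      # roughly the square root of the block space
        return blocks * (block_bits // 8)

    print(distinguishing_bound_bytes(64) / 2**30)    # 3DES, 64-bit blocks: ~32 GiB
    print(distinguishing_bound_bytes(128) / 2**60)   # AES, 128-bit blocks: ~256 EiB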


I know Microsoft claims to have corrected many of the issues raised here but do they have a formal response to it? Do we even care?

Cryptanalysis of the Random Number Generator of the Windows Operating System https://eprint.iacr.org/2007/419.pdf

Cryptanalysis of the Windows Random Number Generator https://www.cs.huji.ac.il/~dolev/pubs/thesis/msc-thesis-leo....


They acknowledged that the Windows 2000 design was also used in XP (https://www.computerworld.com/article/2539986/microsoft-conf...), but Vista and above used NIST algorithms defaulting to CTR_DRBG, although Dual_EC_DRBG (which their own cryptographers had flagged as problematic: https://rump2007.cr.yp.to/15-shumow.pdf) remained available until the Snowden revelations.

Windows 10's (and 11's, and probably later versions') implementation of CTR_DRBG is different from Vista through 8, though, mainly in the entropy generation and the switch from AES-128 to AES-256.


It's a good paper, but note that this is an attack on the (old) per-process RC4-based WRNG, not on CTR_DRBG. The attack essentially observes that if you get RCE on a Windows process and capture the WRNG state, you can get all previous and future states bounded by when the per-process WRNG reseeds (the seeding process reads random bytes from KSecDD, a sort of Windows equivalent of /dev/random).

The Windows design is interesting in that it runs a userland CSPRNG for each process, rather than a single kernel RNG like Linux/Unix provides, but still binds those RNGs to the kernel RNG. This seems like a good idea at first, but turns out not to be: the attack wouldn't be possible if KSecDD was the entire CSPRNG interface for Windows.


Unix also provides many userspace PRNGs, for performance or other reasons, but programs have the option of using the kernel directly whenever security matters. Is it even possible on Windows to query kernel randomness directly with documented APIs?


The 2007 papers predate the Win10 design linked by the grandparent comment. I haven't read the 2007 criticisms recently, but have read the Win10 design many times. I don't think there's anything objectionable in the current design.

Edit: I guess the criticism about the generator running in user mode is still true, although I don't think it's the flaw the authors think it is. Also, I think the current design adds forward secrecy / key erasure compared with what the 2007 paper criticizes. I believe the O(2^23) attack described is gone now.


I know it's an HN cliche to talk about the website instead of the content, but I really like the LaTeX look and I want to know how it was done.


The author's website includes a repository of everything, including documentation, at https://git.zx2c4.com/linux-rng/tree/ -- start with the README at https://git.zx2c4.com/linux-rng/tree/README


Tangentially, I do use a LaTeX-inspired Hugo theme[1].

It’s not perfect, but I like it. I worked on math textbooks and LaTeX as a side gig in college and it’s very comfortable for me.

Probably off-putting to a non-nerd audience, but it's a self-indulgent site, so whatever.

A LaTeX static site generator could be cool itself. I markdown so much these days it might be refreshing to revisit my past. One day I’ll Google that, I guess. :)

[1] https://github.com/queensferryme/hugo-theme-texify


I first assumed it was a PDF and when I got to the video thought "Wait, browsers do video-in-pdf now?"


If they support a somewhat recent version of PDF, they’ll have to; PDF itself can contain video. https://helpx.adobe.com/acrobat/using/rich-media.html:

“Adding video, sound, and interactive content transforms PDFs into multidimensional communication tools that increase interest and engagement in your documents.

All multimedia that are H.264 compliant can be played back in Adobe Reader 9 and later. (H.264, also known as MPEG-4 part 10, is a video compression standard that provides high-quality video without substantially increasing file size.) Video files of varying formats and filename extensions can be H.264 compliant.

Media files in other formats can be played back in earlier versions of Adobe Reader. However, users must install the appropriate application (such as QuickTime or Windows Media Player) to play the multimedia.

Another way to add multimedia is by entering a URL that refers to a video file or streaming media. Three types of URLs can be used: RTMP, HTTP, and HTTPS. On HTTP and HTTPS servers, H.264-compliant MOV and MP4 files are supported.”


Same, I thought it was a PDF.

This is actually an interesting approach: make the website look like a text document but enrich it with web functionality (videos, interactions). It looks very clean/minimal and distraction-free, excellent for reading/learning content like this.


It's just very compact CSS and embedded web fonts, e.g. "LM Roman"; check the page source, the html>head>style tag. Pure minimalist art - I love it too.


> […] but I really like the LaTeX look and I want to know how it was done.

Except that it's fully justified and the spacing between words is horrendous. Seems that no one can get Knuth & Plass right (at least online); maybe because there's no hyphenation / word breaking.

Probably best to just stick with left justified / ragged right.


There is word breaking though. At my screen width, 'overhauled' is broken in the second paragraph.


Lol. I was going to comment when there were only two or so others that I really dislike the font used on the website.

I loathe the old-school LaTeX look. I appreciate LaTeX as a markup language, but the Computer Modern fonts always drove me nuts.

To each their own though. It's interesting to me that someone else had the opposite reaction.


If you haven't, check it out from a high DPI printer. The font is much more tolerable there.

I used uBlock customization to block the font from loading to read the article.


> I used uBlock customization to block the font from loading to read the article.

For those on Mozilla Firefox[note], please also give the reader view a try.

Reader view is also available on other browsers such as Safari on the iPhone.

[note] - (if you are not on it, please consider downloading and using it but that is a different conversation)


Computer Modern fonts are very much designed for printing. On a screen, the really narrow parts of each letter are usually close to unbearable.


It depends on the screen and the software. On Linux in Firefox this page's legibility is pretty poor. However, when I was reading this page on iPhone, it was perfectly legible.

Computer Modern on the same screen, but rendered by Okular or Evince when viewing a LaTeX document, is also perfectly legible.

When it is rendered properly, I find it very pretty and legible.


I'm a mathematician, so it might be that it's calmingly familiar rather than any typographical excellence.


I do a fair amount of statistics research. I'm not sure I've quite found my holy grail of a math typeface (although I can think of a couple that come close in different ways), but Computer Modern etc. was never it for me.

I can totally see how it would feel sort of comforting though. I hadn't thought about it before but it does remind me of old Springer texts, and I could see how that would feel "homey" or something. This conversation has me kind of "resetting" my perspective on it a bit.


I've been using something similar: https://github.com/vincentdoerig/latex-css Seems like it could easily be adapted to match the style.


/dev/urandom and /dev/random now present the exact same interface -- the "jiggle your mouse to generate an encryption key" ridiculousness is now a thing of the past!


> The "Jiggle your mouse to generate encryption key" ridiculousness is now a thing of the past

is it really ridiculous?


Yes. The /dev/random device would formerly refuse to continue giving data even when it had plenty of entropy backing it. The reality is, unless you are in early boot (now fixed by Torvalds' RNG seeder), you have enough entropy to generate nearly unlimited quantities of pseudorandom numbers without revealing much of the PRNG's internal state. You only need 256 bits of initial seed entropy to be virtually unpredictable, as long as those 256 bits are strong.

See: Myths about /dev/urandom https://www.2uo.de/myths-about-urandom/

Jiggling the mouse may have made people feel better about the security of the software they're using, but it really should have been considered a bug (and the UI a horrible workaround) this whole time.

Now that the old blocking /dev/random no longer exists, and the kernel has Torvalds' cache-timing-based seeder, any software that mistakenly uses it will generate a key instantly regardless of how much mouse jiggling you do.
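For anyone wondering what "just ask the kernel" looks like in practice, here's a trivial sketch (Python on Linux; os.getrandom wraps the getrandom(2) syscall, which blocks only until the pool has been initialized once at boot and never again):

    # getrandom(2) via Python: waits only for the kernel RNG to be seeded once
    # at boot, then returns instantly forever after -- no mouse jiggling needed.
    import os

    key = os.getrandom(32)   # 256 bits is plenty for any symmetric key
    print(key.hex())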


> https://www.2uo.de/myths-about-urandom/

Needs an update now; it contains numerous statements that are no longer true.


The mouse-jiggling might remain valid for initial seeding / unblocking in an environment without other random sources (including Torvalds' version of jitterentropy), but it's hard to imagine such environments are anything but theoretical.


I imagine the people most burned by that design run / remote into servers.


Yes, because it has already collected plenty of data from the mouse since boot.


Great to see as much thought put into the infrastructure as into the improved cryptographic logic. They fixed the /dev/urandom behavior and the cloned-VM entropy duplication that caused unsafe randomness.

Alongside BLAKE2 algorithmic improvements, we also get safer infrastructure. Very cool!


The jitter dance mentioned here is pretty interesting. It doesn't seem obvious to me that a deterministic CPU running a deterministic scheduler is going to yield randomness.

But hopefully any modern system has some kind of hardware RNG and the "jitter dance" is just a last-resort type thing for strange systems.
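For a bit of intuition, here's a userspace toy (not the kernel's actual scheme, which, roughly, mixes cycle-counter readings around scheduled timer callbacks): even timing the same trivial operation repeatedly produces wobbling low-order bits on real hardware, whereas a fully deterministic emulator would give identical deltas on every run.

    # Userspace toy, not the kernel's jitter-entropy code: sample a
    # high-resolution timer in a tight loop and keep the noisy low bits.
    import time

    def jitter_samples(n=1024):
        deltas = []
        prev = time.perf_counter_ns()
        for _ in range(n):
            now = time.perf_counter_ns()
            deltas.append((now - prev) & 0xFF)   # low byte is where the jitter shows up
            prev = now
        return deltas

    samples = jitter_samples()
    print(len(set(samples)), "distinct low-byte deltas out of", len(samples))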


Right, I'm surprised to be reading justifications that amount to "it's been deployed for several years now, so we think it's OK", rather than "here's a report from a researcher who's tested how it behaves on a wide range of systems".

I don't know enough to count myself as a "skeptic", but if I try to apply the test "suppose in ten years time you're reading a news report describing how this feature failed to work as intended; is it easy to imagine the sort of things it could be saying?", the jitter dance doesn't do terribly well.

(I'm imagining something like the case where large numbers of consumer routers ended up generating the same secret on first boot, or perhaps large numbers of virtual machines. And there seem to be lots of ways to imagine a processor that had surprisingly little jitter: maybe an x86 emulator for Arm, or a "Big/Little/Tiny" processor, or "Spectre is really truly absolutely mitigated now we promise".)


> I'm surprised to be reading justifications that amount to "it's been deployed for several years now, so we think it's OK",

I'm not making any claims or justifications about it being good or not. Just a simple statement that Linus put it there, and there it is, and it's been that way for a while. I even referred to it as "voodoo". As I mentioned in the post, 5.18 didn't change anything about entropy sources and gathering. Certainly trying to see more rigorously whether or not the Linus Jitter Dance is defensible would make for a worthwhile research project.


Any comments on how the Linus Jitter Dance (btw, glad to see that djb isn't the only one making new dance moves..) relates to JitterRNG: https://www.chronox.de/jent.html ?

The PDF linked on that page goes into more detail; it has been measured across a number of CPUs and performs well in both the amount and the quality of entropy produced. You do need to measure it per CPU, but it does comply with SP 800-90B, which is what the US government considers the standard for randomness.


I haven't looked into it, but from what I've read from various sources, not every platform can extract entropy from that process. I took that to mean that the CPUs on which the jitter dance is employed are not entirely "deterministic." Well, of course they are, but they must be carrying black-box state that affects execution and that, so far, is impossible to extract without knowing the full execution trace up to that point.

Can anyone provide a specific explanation for why some platforms can extract entropy from scheduling jitter and others can't?


Imagine running a Linux kernel on a fully deterministic CPU emulator. The jitter would be the same on every boot.


Wouldn't disk, memory, network, cache, and really anything that could trigger an interrupt also need to be fully deterministic? It seems to me this is really what the jitter dance is measuring to get randomness. It is only reading the cycle counter; all of the above would affect that slightly.


Memory, disk, and cache could be deterministic in an emulator. Imagine an emulator for embedded software. The network doesn't come online until booting has reached a certain point, by which time some keys may have already been generated. It is not a showstopper (after all, the jitter algorithm did make it into the actual kernel), but there are cases in which problems would be conceivable.


Elaborate on the threat, please. Who is running the kernel and who is controlling the cycle perfect emulator?


I have an embedded product that generates keys very soon after boot. I run it on an emulator for some reason. Oops, now my keys are fully predictable.


My honest take is that these dumb "but what if I paint myself into a corner" scenarios are why it takes so fucking long for anything to get better.


How would you get a random number on a fully deterministic emulator?


You'd ask the emulator to give you a seed. (This was mentioned in the article, and a way for VMs to do this has been added in this kernel.) You could also refuse to produce any random numbers until you had entropy from some part of the emulator that wasn't deterministic. (Maybe the emulator gives real random numbers for the CPU random instruction, or maybe it lets you connect to a real network.) Since people don't think of cycle counts as something that needs to be nondeterministic for a computer to work, seeding on them makes it possible for people running the program to accidentally run it in a way that breaks the assumptions of the program authors. Note that this is true of many random sources - but the objection was not to including cycle counts, but to emitting random numbers while that was all the driver had collected.


How do you know you got real entropy instead of a replay of an already recorded stream of events?


That's not the threat model (and it can't be a threat model, it's impossible to defend against), the threat model is people accidentally replaying an exact boot (which is more likely than accidentally replaying everything and also failing to seed the VM.)


Probably expose a random device.


Interestingly, it seems the Jitter Dance has previously been discussed on HN quite a bit:

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

Though this is the first time I recall hearing about it.


The "Linus Jitter Dance" mentioned in the article is similar to Havaged, you can read more about it here: http://www.irisa.fr/caps/projects/hipsor/

I recall using Haveged to prevent the RNG from blocking on machines without a hwrng (e.g. VMs) on old kernels.


Looking through the code, I'm surprised to see just how much it takes to provide a secure source of randomness in an operating system.


It's great to see Linux adopting VMgenid. RDSEED over RDRAND when available also makes sense.

> In the per-cpu extension of that design, all entropy is extracted to a “base” crng. Then, each time a per-cpu crng is used, it makes sure that it is up to date with the latest entropy in the base crng. If it is, then it continues on doing fast key erasure with its own key. If it isn’t, then it does fast key erasure with the base crng’s key in order to derive its new one.

Beautiful. This is essentially the same thing the Windows 10 design does in the kernel.
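For anyone who hasn't seen the construction before, here's a stripped-down sketch of fast key erasure (Python with the pyca/cryptography package; the kernel's version is in C, uses ChaCha20 the same way, and adds the per-CPU and generation-counter bookkeeping described in the quote):

    # Stripped-down fast key erasure: expand the key into a keystream, immediately
    # replace the key with the first 32 bytes of that keystream, and hand out the
    # rest. Old keys never survive, so earlier output can't be reconstructed later.
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms

    def fast_key_erasure(key, nbytes):
        assert len(key) == 32
        nonce = bytes(16)    # counter+nonce; fine to fix since the key changes every call
        stream = Cipher(algorithms.ChaCha20(key, nonce), mode=None) \
            .encryptor().update(bytes(32 + nbytes))   # keystream = encryption of zeros
        return stream[:32], stream[32:]               # (next key, random output)

    key = bytes(32)          # in reality: a key extracted from the entropy pool
    key, out1 = fast_key_erasure(key, 64)
    key, out2 = fast_key_erasure(key, 64)
    print(out1 != out2)      # True: the state advanced between calls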


Jason, I'm very happy to see you contributing excellent work to the Linux kernel. Keep up the good work!


What would be the best way to evaluate randomness quality? This is something I've been thinking about a lot. Yes, there are 'ent' and others, but I am more after the trend of randomness quality over weeks and months.



Compress it.

Truly random data is incompressible. Any algorithm should work.
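A two-minute version of that test (just an illustration; it only catches gross non-randomness):

    # Crude compression check: a general-purpose compressor should not be able
    # to shrink genuinely random bytes, while structured data shrinks a lot.
    import os, zlib

    def ratio(data):
        return len(zlib.compress(data, 9)) / len(data)

    n = 1 << 20
    print(ratio(os.urandom(n)))   # ~1.0  : kernel randomness, incompressible
    print(ratio(bytes(n)))        # ~0.001: all zeros, compresses massively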


Random data is incompressible, but not everything incompressible (as far as practical compression algorithms are concerned) is random.


Do you have any examples of that?


I love taking these deep tech dives to appreciate how much thought and complexity goes into something seemingly as simple as random numbers. I know they're critically important for security - which truly is high stakes - so I'm not surprised. But it's fun and impressive nonetheless.


Fantastic article. You're a great writer and I learned a lot. Thank you.


[flagged]


> The past has shown that often the USA government/NSA sponsored or developed encryption algorithms used in many popular devices

This sort of unfounded conspiratorial thinking is absurd. The past has shown the opposite: whenever the government tries to shove broken algorithms down people's throats, cryptographers ignore it and go for something they can prove is correct. BLAKE2 and ChaCha20 are algorithms designed by cryptographers who hate the government.


> This sort of unfounded conspiratorial thinking is absurd.

GP's baseless accusations are indeed distasteful.

> cryptographers ignore it and go for something they can prove is correct. BLAKE2 and ChaCha20 are algorithms designed by cryptographers

BLAKE2 is based on ARX, so, as far as I understand, proofs regarding some of its properties are very difficult or infeasible. My understanding is also that other designs are much more popular among cryptographers because they are mathematically and financially easier to analyze, implying better security (because nation-state-level resources aren't necessary for finding vulnerabilities). See, e.g.: https://keccak.team/2017/not_arx.html


Wild-ass speculation alert! Please don't turn into FUD.

NSA floated a cipher a while back called Speck.

https://en.wikipedia.org/wiki/Speck_(cipher)

After the dual-EC debacle it's understandable that people would be skeptical of a cipher from the NSA. People assumed Speck might be intentionally weak or backdoored.

Thing is... it's a stupid simple ARX cipher. Dual-EC was obviously suspect for multiple reasons, but Speck is so simple it doesn't really have room for any "magic" constants that could have been optimized to introduce a back door or vulnerability.

That means there are two possibilities:

(1) Speck is fine.

(2) The NSA knows something about ARX ciphers that we don't.

Option #1 is orders of magnitude more likely than option #2, but if we are entertaining wild speculation there's some for you.


Speck supports configurations with keys small enough to be brute-forceable.

There is a nice attack pattern where you standardize something with some insecure modes, then you use targeted attacks to get your targets to use those insecure modes.

I'm not aware of any non-NSA ciphers from the last decade that support a 64-bit key...


This is a good point. They claim this is to support tiny devices but that would have to be a really tiny device for trimming 64 bits off a key to matter. The 128-bit and 256-bit key versions of Speck are probably fine, but...


There's a cool paper that proposes embedding a NOBUS susceptibility to linear cryptanalysis in the constants of ARX ciphers like SPECK (ciphers like SPECK, not SPECK itself); I should dig it up. Reliably demonstrating that capability would be an advancement in the state of the art of cryptography, though, as I understand it.


(3) Speck is fine, but the NSA wants to weaken the precedent of not trusting the ciphers it releases, to slip a bad one in later.


Wanting a better reputation is an obvious motivator for 1. This isn't a separate option.


> This is their MAIN WEAPON against massively used, fully E2E encryption.

I’d say their main weapon is people continuing to not design their products around end-to-end encryption.


Yup. Much of the Snowden and WikiLeaks reporting revealed that the government is a lot more interested in getting your data before it's encrypted.


I assume the cryptography community is also aware of these changes. I trust they would have voiced concerns if there were problems.


font on the page is awful


With these kinds of changes I always wonder if they make the product more secure or less secure.


It's why I'm thankful it's both open source and highly scrutinized by the community: volunteers, independent security researchers, and big companies like Google that deploy billions of instances of Linux (servers, Google Cloud, Android, ChromeOS, etc.).


The backdoored elliptical curves were vetted too…


And we know about it. The backdoor methods have been generalized and now researchers can check for that too.

For example, Bitcoin's elliptic curve secp256k1 was chosen because its constants were picked in a predictable way, which reduces the possibility of a backdoor.


I've not been in the loop, what was this?


Compromised approved (subsequently retracted) elliptic curve random number generator [1]

Potentially-compromised elliptic curves used for Diffie-Hellman-Merkle key agreement and digital signatures [2][3][4]

[1] https://en.wikipedia.org/wiki/Dual_EC_DRBG

[2] https://safecurves.cr.yp.to/rigid.html

[3] https://www.hyperelliptic.org/tanja/vortraege/20130531.pdf

[4] https://blog.cr.yp.to/20140323-ecdsa.html


Dual EC DRBG is the known backdoored curve. You have the links to the high level story in a sibling comment.

I would also like to add, however, that the possibility of a backdoor was patented (by Scott Vanstone, I think) and raised during the NIST standardization process (and I suspect it was standardized under pressure from the NSA more than anything). Other negative facts that were raised include that it sucks badly, i.e. compared to just about any other RNG it performs very poorly. So the process isn't as bad as it looks.

DualEC was a backdoor, but not a very good one. People noticed the possibility and it sucks compared to literally anything else. The only people who used it appear to be customers of RSA Inc.

I would also like to add that Elliptic (not Elliptical, these are not the equations of ellipses) Curves, even the NIST ones, are not known to be backdoored and there's no evidence they contain any weaknesses at present. There are plenty of non-American cryptographers who are unlikely to keep any analysis a secret if they found such evidence, and I would say quite a few American ones who would also publish.




As Jason mentions, the most important contribution here is to make the code more readable and improve documentation.

But there are also some fairly unambiguous improvements - switching from SHA-1 to BLAKE2 for extracting the random bytes, for example.
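As a toy illustration of what that extraction step means (not the kernel's pool code, which is more involved): feed raw, possibly biased entropy inputs into a cryptographic hash and take the digest as a uniform 256-bit value.

    # Toy illustration of entropy extraction with BLAKE2s (the hash the kernel
    # now uses in place of SHA-1); the kernel's real pool handling is more
    # involved than a single flat hash like this.
    import hashlib, os, time

    pool = hashlib.blake2s()
    pool.update(os.urandom(64))                               # stand-in for interrupt/device timings
    pool.update(time.perf_counter_ns().to_bytes(8, "little")) # stand-in for a cycle-counter sample
    seed = pool.digest()                                      # 32 uniform bytes to key the output crng
    print(seed.hex())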


Yes, I think there are some points we can generalize to all software here:

- Readability counts. If you can't read the code, who could test or improve it?

- Documentation needs to be kept near the code; only then do you have a chance of it not becoming outdated

- It's possible to improve correctness and efficiency at the same time (if your code is understandable)

- Use the literature available

- Code that once held high standards still needs to be checked constantly so it doesn't rot.


With this and WireGuard, Jason is hitting it out of the park. I wonder what's next.


In this case, more.



