Hacker News new | past | comments | ask | show | jobs | submit login
PCG: a family of better random number generators (pcg-random.org)
45 points by fanf2 on June 10, 2018 | hide | past | favorite | 51 comments



This is the Nth time this website has been promoted on HN. As far as I know, a peer-reviewed academic paper has not yet been published; one was withdrawn for strange reasons. The algorithm, and claims made on this website, have garnered some significant criticism.

Getting an RNG right is important. I'd not trust this method until other people in the field have verified that the author's claims are true and critics have been satisfied. The accepted method for this is peer review.


Getting a secure RNG right is important. PCG isn't a secure RNG; neither is xorshiro, xorshift, or MT. You can't use them anywhere security relies on not predicting future outputs from past ones. If you have to ask, don't use them.

The most confusing thing about PCG's documentation is its "challenging" level of prediction difficulty. That's not a thing. The PCG authors should not have compared ChaCha or Arc4random with their generator.

Other than that, I'm a little confused why anyone would care how good PCG is. We have adequate insecure RNGs already, and secure RNGs are already so cheap that they make a reasonable default.

But, whatever. If people want to golf who can make the best insecure random number generator, seems fine.


I think all these people complaining about PCG not being secure or the existence of a new insecure PRNG have never worked in simulations or HPC. Absurdly fast PRNGs that are reasonably well behaved are super useful in things like monte-carlo simulations and black-box optimization algorithms.

The AVX-512 version of PCG runs at around 4 bytes / CPU cycle. That's pretty hard to beat.


There are secure AEAD constructions that are <1cpb today, and they do extra MAC work that a simple PRF in a CSPRNG doesn't need to do. I'm sure PCG is faster, but for the overwhelming majority of applications, it's not so much faster as to matter; meanwhile, a fast CSPRNG also gives you security, which you might find you need later on.


Can you clarify this? A RNG is generating n bytes of output per cycle. An AEAD is consuming n bytes of input per cycle. And an AEAD doesn’t need to have uniformly random-looking output to be secure.


An AEAD is consuming n bytes of input per cycle and producing at least n bytes of output, that is, the output ciphertext plus tag.

The definition of authenticated encryption is, essentially, indistinguishability of the ciphertext plus MAC security of the tag. You're right, the tag does not need to be indistinguishable from random, but in the majority of the cases it is anyway.

One significant difference between these non-cryptographic generators and cryptographic ciphers is that of latency---generators often optimize for latency, whereas ciphers optimize for throughput. By taking a cipher and a sufficiently large buffer, you can have low latency as well, at the cost of some memory.

For comparison, consider MORUS, AEGIS, and Tiaoxin, three unbroken contestants of the CAESAR competition. MORUS uses only AND, XOR, and bitwise rotation, and achieves somewhere between 0.5 and 0.66 cycles per byte on current x86 chips. AEGIS and Tiaoxin use the AES round, and where AES-NI is available, performs at between 0.15 to 0.25 cycles per byte. The claim somewhere above is that PCG can do 0.25 cycles per byte (or 4 words per cycle) when going all out with AVX-512; that's only hard to beat if you've not been paying attention.


Ah. So the “AEAD” part had little to do with it, in the sense that a non-authenticated cipher without associated data would work just as well. ChaCha20 makes and excellent and quite fast RNG, whereas using ChaCha20-Poly1305 as an RNG would be rather silly.


The AEAD bit is just as a baseline of performance - look how fast you can make this go if you had to do all the work to be an AEAD scheme. For 'merely' a CSPRNG, you can go faster. The argument is, for lots of typical applications that require randomness, you might as well go with a CSPRNG because it will be fast enough.


If you are using AVX, you might as well use AES-NI, which will probably be faster than PCG. Also, the xoshiro/xoroshiro RNGs are faster than PCG.


Not sure about that. The pcg C implementation uses some handrolled assembly and iirc already does SSE or AVX instructions.


Which implementation are you talking about? The only assembly I can see in the reference implementation is used because clang fails to optimize a rotation. There seems to be no explicit SIMD at all.


My bad. They should add that.


Is a "challenging" level of prediction difficulty not a thing because it provably doesn't exist, or do you just want more evidence that the bar is reached?

Use case: a randomized algorithm, performance critical, but some users might expose it to the web. Using secure randomness is too slow to be feasible, but using a trivially weak RNG might make people vulnerable to DOS attacks.

A potentially weak RNG that at least seems to be difficult to predict acts as a buffer. If it takes brute force effort to break the RNG, that correspondingly cuts down the strength of an attack, and the extent to which your adversaries do not have access to a vulnerability, you are safe from that attack from them. Since this is securing availability rather than privacy, should you end up finding yourself unexpectedly vulnerable, you can always switch it out for something else.


"Challenging" is inherently unfalsifiable, and gives you very little information about what security you're getting. You can come up with a better-than-bruteforce attack, only to be met with "see? I was right all along, it was still challenging!". There's a reason cryptographic algorithms come with concrete security claims; it becomes a mess of definitions and rationalizing of attacks otherwise.


Imprecise, perhaps, but certainly not unfalsifiable; a MCG is not challenging, xorioshiro is not challenging, a stream of 9s is not challenging. Not knowing precisely where to draw the line is not to say there is no difference between 2^60 and 10.

It is not like you have to be right first time and forever more; in the absolute worst case and the NSA already has a perfect attack, that still leaves you better off than otherwise, and frankly if the NSA wanted to DOS you they would manage regardless.


"Not breakable in fewer than O(2^(n/2)) operations" is an imprecise security claim; "challenging" means nothing. "Challenging" may not even mean "computationally hard"---breaking a truncated LCG is arguably quite challenging if you are unfamiliar with lattice reduction.


Do you lock your front door? Is your lock unpickable, or robust to attacks from national superpowers? What formal security guarantees does it give you?

Surely you must accept that there is some scope, somewhere, for better-than-nothing security, some amount of protection that is not robust against arbitrarily skilled adversaries, but nonetheless makes it harder to break your things. So either you need to show that it is not useful in this case specifically, or you need to show that this particular formulation in untenable. Arguing about words does not get us closer to an answer; I have shown a specific threat and a specific mitigation, either it helps or it does not.


I will continue to mark the security of PCG as "unwilling to quantify", which is strictly below "better than nothing". Debating the virtues of the latter is therefore pointless.


Nowhere does O'Neil discourage expert analysis of PCG generators. She does lack the ability to do direct research to the standards of the industry, but discusses related work and attacks in the paper, and has encouraged further research. I don't see how you can see the evidence presented in the paper and claim there is zero practical difference between it and the examples I listed, even just talking about today's adversaries.


> Getting a secure RNG right is important.

Why are you bringing up secure RNGs? PCG isn't in this category.

Getting a non-secure RNG right is important. There are plenty of simulation applications which rely on RNGs like these to make high-quality statistically random selection results.


And yet the page calls the other RNGs out as being insecure:

> Many RNGs can be predicted with after observing small amount of their output. If you use random numbers as a way to ensure fairness or unpredictability, that's a problem.

> [Talking about PCG] * It's much less predictable and thus more secure than most generators


Why does the table on the website include them?


If you're comparing all PRNGs, there's no reason to exclude the CSPRNGs. They're fine to use for all sorts of purposes. They mostly just tend to be slow.


As usual, it is worth testing and seeing the results empirically. For an excellent project testing these and comparing results, see https://github.com/lemire/testingRNG.


This is not enough. The results need to be understood and interpreted as well. For example, it is questionable whether linear dependencies in some bits matter for any of the typical applications of non-CSPRNGs.


That table seems as if PCG has no downsides, and yet I've never heard of it and we're not all using it. The paper is from 2014, which is fairly recent as far as these things go I suppose.

I'm not well-versed enough in each topic they mention in the table, but it's definitely questionable to equate CSPRNGs with PCG and label them both green under "prediction difficulty". Without saying so outright, on /predictability.html it is made clear that this is not a CSPRNG.

The columns are not very well explained, either. I would assume that anything whose source is not a TRNG would have "Reproducible Results", yet arc4random supposedly does not have that. How an algorithm can be non-deterministic given the same inputs, I would be very curious to learn, as it seems like the holy grail of CSPRNGs.

I'm a little skeptical, but it looks like something to keep an eye on. They claim to have both simpler code, faster code and better randomness than the standard Mersenne Twister (which was my go-to RNG for non-cryptographic purposes).

Edit: It seems the page hardly changed since 2015. Maybe add (2015) to the title? Source: https://web.archive.org/web/20151030045740/http://www.pcg-ra...


Disclaimer: My pet programming language has PCG as one of its available RNGs. It also has xorshift.

PCG doesn't have many downsides, and indeed is slowly but surely outcompeting the other hashes in the fast general RNG niche, like MT, xoroshiro, and xorshift.

Your "label them both green" complaint is misleading. The descriptive word "Challenging" is used to describe PCG, while "Secure" is used to describe the CSPRNGs. In this context, it's pretty easy to understand that the "Secure" in the description is the same as the "Secure" in "CSPRNG". The author admits in the PCG paper [0] that PCG isn't meant to stand up to cryptographic scrutiny and that it hasn't been reviewed as such, while also demonstrating that, by construction, the predictability of PCG is pretty challenging, especially compared to some other RNGs still in popular use!

From NetBSD's man page on arc4random: "There is no way to get deterministic, reproducible results out of arc4random for testing purposes." Other implementations of arc4random don't have the ability to control the initial entropy.

Ultimately, as far as whether or not the code is simpler or faster than MT, you can judge for yourself [1]. It is not a hard read.

[0] http://www.pcg-random.org/paper.html

[1] http://www.pcg-random.org/download.html


> The descriptive word "Challenging" is used to describe PCG, while "Secure" is used to describe the CSPRNGs. In this context, it's pretty easy to understand that the "Secure" in the description is the same as the "Secure" in "CSPRNG".

I would say that the same color should mean that it is of equal value. In this case, even after comparing the chosen descriptive words, it was not clear to me that this was not supposed to be a CSPRNG. It seemed so, but it could have been that it's just too new to make such a claim, or perhaps that an attack was found. The information is there when you look for it, but it's not obvious from this first impression.

I see where you're coming from, but I was not trying to be misleading. I think it's genuinely confusing for someone who knows what a CSPRNG is (me), let alone the average developer who wants to generate private keys fast and finds this.

Edit: and then there's this, as pointed out in another comment: http://pcg.di.unimi.it/pcg.php#false Apparently it can be predicted with only 3 outputs. If memory serves, even good ol' MT does much better than that.


Vigna's points are addressed by PCG's author directly here: http://www.pcg-random.org/posts/on-vignas-pcg-critique.html


I don't know about the rest of it, but the section of this response about prediction difficulty is word salad, and it's pretty amusing that the "pcg32 is annoying to predict" claim is backed by a proof of concept of a paper from 1988 (it is approximately "problem on an academic CTF" hard; someone should do that challenge, since I feel like people are always looking for straightforward-but-plausible applications of LLL on these things). Vigna didn't claim to have designed an RNG that was difficult to predict. But the PCG author did.


Vigna starts with a bullshit argument. You can likewise decrypt ssh if you know the secret key.

On the other hand vigna's (and therefore linux) RNG's are easily predictable without knowing the internal state, but just by observing the external output for some short time.


> On the other hand vigna's (and therefore linux) RNG's are easily predictable without knowing the internal state, but just by observing the external output for some short time.

They are not CSPRNGs, so it is expected that they are predictable. I don't think anyone claimed the contrary.


But PCG is not and it is simple and fast. Its qualities are much better than vigna's. He did mostly respond with bullshit, though his other arguments are better. But not convincing still.


Claiming that PCG is unpredictable is dangerous. If it was really unpredictable, it would qualify as a CSPRNG... Unlike you, PCG's author does not claim it is, but even claiming it is "challenging to predict" is questionable.

There is a difference in statistical quality concerning the binary rank test, but this was fixed with the new PRNGs published in the xoshiro paper. Whether passing the binary rank test signifies "much better" quality is debatable...


> But PCG is not

With only 3 outputs you can compute the internal state. MT needs something like 36 if memory serves.

Perhaps the misunderstanding is due to the way it's written. I was wondering as well: "huh, why do you need to enter the internal state as well as three inputs, no shit that you can predict the next output or something..." but what he really writes is that you enter an internal state, then it uses CFG to generate three outputs, and from those three outputs alone, it derives the internal state again.


Interesting idea, but I'm not convinced unless he breaks it without knowing the internal state. The PCG idea is extremely simple yes, but you need to know the first bits to know which algo is used. And the first bits are the hardest to get.


I must agree. They don't make it clear that it shouldn't be used for secure computing(cryptography) purposes. They compare chacha20 and mersene twister in the same table,isn't it obvious that chacha20 isn't ideal for non-crypto use?


Chacha20 is plenty good for most non-crypto uses.


I recently switched from Mersenne to PCG, was considering xoroshiro. Use case is a card game. Mersenne was weighing down the AI because the AI does a lot of cloning so my need was a minimal amount of intermediate state


If that's your use case, pregenerate more of the raw stream and use that, then you can avoid the RNG running at all a lot more. What you do to that stream may vary depending on path, but the stream of random numbers is, ahem, constant.


Never use it for anything involving security. Ever. It's just inviting pain. It's not what it was designed for.

With that said, PCG is my go-to for testing things that need fast random generators (such as prototyping ideas about hashing-based data structures). It's fast and I haven't encountered situations in practice yet where it causes failures that a stronger random source doesn't. Mersenne Twister, in contrast, can easily be the bottleneck for testing small/fast structures.


Even if it’s 100 timed better than Mersenne Twister, a lot of people just throw MT at the problem and it’s solved. Why research new algorithms when old ones are good enough in your use case?

Usage is not proof of quality.


One author of Xoroshiro has written a detailed (and harsh) analysis of PCG.

http://pcg.di.unimi.it/pcg.php


Ouch. I did not know this was a competitive field: this description you linked is very harsh, while the table on the front page (from OP) is the exact opposite. As if both are trying to sell their product by trying to discredit the competition.


There's more, too. That seems to be a response to this: http://www.pcg-random.org/posts/implausible-output-from-xosh...

And there's a response to the response: http://www.pcg-random.org/posts/on-vignas-pcg-critique.html


For a fast and secure RNG, take a look at Google Randen instead: https://github.com/google/randen


Melissa O'Neill gave a talk in the Stanford EE380 Computer Systems Colloquium series.

See http://ee380.stanford.edu/Abstracts/150218.html for the abstract and https://www.youtube.com/watch?v=45Oet5qjlms for the YouTube video.


Rob Pike is discussing Go’s math/rand update to PCG for Go2 at https://github.com/golang/go/issues/21835



Does PCG need to use 128-bit arithmetic to generate 64-bit random numbers?


Is this just an LCG that feeds into a markov chain? This seems more like marketing material than any new technical material.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: