
I went back to the root page of this site to see what PCG is, and I don't understand it. What's the point of an RNG that is "more secure" than an insecure RNG, but less secure than a real CSPRNG? What does "Predictability: Challenging" actually mean?



It means exactly what it sounds like: the author tried to attack the generator, failed, and thus concluded that it must be a Very Hard Problem.

Note that the field of pseudorandom number generation for non-cryptographic purposes is much less rigorous than cryptography. Typically, for a new generator to be accepted as "OK" all it needs to do is pass a number of fixed statistical tests, usually one of the TestU01 batteries [1]. This is usually the only falsifiable claim you get, and if you're familiar with diffusion and how to achieve it, it is easy to satisfy. Other falsifiable claims include equidistribution, but that is not a very useful guarantee---simply incrementing by 1 will achieve it. The notion of indistinguishability against computationally-bounded adversaries does not exist here. All of this makes for a field where crackpottery abounds and it is hard to distinguish good from bad.

For example, here's a generator that uses 1/4 of an AES round per output word, and that passes the Crush battery of statistical tests (which uses ~2^35 samples). Is it a good generator? I would not bet on it.

  #include <stdint.h>
  #include <immintrin.h>

  struct S {
    static const unsigned kAesRounds = 1;
    union {
      uint32_t state_[4 * 4];   // 16 output words...
      __m128i  words_[4];       // ...aliased as 4 AES blocks
    } u_;
    unsigned counter_;          // index of the next output word, mod 16
    const __m128i key_;

    S(uint64_t seed)
    : u_{{0}}, counter_{0}, key_{_mm_set_epi64x(seed, -seed)} {
      for(unsigned i = 0; i < 16; ++i) (void)next();  // warm up the state
    }

    uint32_t next() {
      const uint32_t output = u_.state_[counter_];
      __m128i t = u_.words_[counter_ & 3];  // pick a block
      counter_ = (counter_ + 1) & 15;
      for(unsigned i = 0; i < kAesRounds; ++i)
        t = _mm_aesenc_si128(t, key_);      // one AES round
      u_.words_[counter_ & 3] = t;          // write into the next block slot
      return output;
    }
  };
[1] http://simul.iro.umontreal.ca/testu01/tu01.html
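
(For concreteness, here is roughly what hooking a generator like this into TestU01 looks like, via its extern-generator interface; header and function names are from the TestU01 documentation, and you link against the library:)

  extern "C" {
  #include "unif01.h"
  #include "bbattery.h"
  }

  static S rng(0xdeadbeef);   // the generator from above
  static unsigned int next_bits(void) { return rng.next(); }

  int main() {
    unif01_Gen *gen =
        unif01_CreateExternGenBits((char *)"aes-quarter", next_bits);
    bbattery_Crush(gen);   // the ~2^35-sample Crush battery
    unif01_DeleteExternGenBits(gen);
    return 0;
  }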


Nit: you call _mm_aesenc_si128() once per call to next(), right? I don't see how that constitutes "1/4th of an AES round per output word". (You do output 1/4th of an AES block.)

You are, of course, right about the actual point you're making. And calling _mm_aesenc_si128() once per 4 calls to next() may well suffice to pass a statistical test. Then again, even an LFSR passes most statistical tests...
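
(To illustrate: a bare 16-bit Galois LFSR like the one below is completely linear and predictable from 16 consecutive output bits, yet its bit stream looks fine to many empirical tests that don't specifically probe for linearity:)

  #include <stdint.h>

  struct lfsr16 {
    uint16_t s;   // state; must be seeded nonzero

    unsigned next_bit() {
      unsigned out = s & 1u;
      s >>= 1;
      if(out) s ^= 0xB400u;  // taps: x^16 + x^14 + x^13 + x^11 + 1
      return out;
    }
  };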


Yes, you're right, I am calling the AES round once per `next()` call. But note that this could be rewritten as

  uint32_t next() {
    if(counter_ >= 4) {
      counter_ = 0;
      u_.words_[0] = _mm_aesenc_si128(u_.words_[0], key_);  // refill all 4 words
    }
    return u_.state_[counter_++];
  }

which only uses one AES call every 4 words (and only one block of storage). Instead, I chose to avoid the `if` and compute blocks ahead of time, which makes for more predictable performance.


Is there a particular reason you would bet against it, or are you just saying that you wouldn't bet on any implementation without proven theoretical properties?

(As an aside, it's probably just my lack of familiarity, but considering how simple the algorithm is I find the syntax surprisingly hard to follow. Is this modern C++?)


Although this thing is able to fool the 200 or so tests from TestU01, it would take a short time for an expert to devise a test that would distinguish it from random, and even recover the key (read: seed). On the other hand, AES-CTR (conjecturally) fools _all_ statistical tests that take less than 2^128 effort, and on top of that it is still very very fast (0.63 cycles per byte on Haswell). So what really was gained? Cryptographic primitives have gotten so fast that the additional speed from these ad hoc generators seems like an unworthy risk. I concede that it can be very fun to try to make them, though.
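
For concreteness, a minimal sketch of what an AES-NI counter-mode generator looks like; the key schedule (via _mm_aeskeygenassist_si128) is omitted here, and rk_ is assumed to already hold the expanded AES-128 round keys:

  #include <stdint.h>
  #include <immintrin.h>

  struct CtrGen {
    __m128i rk_[11];    // expanded AES-128 round keys (assumed filled in)
    uint64_t ctr_ = 0;

    __m128i next_block() {
      __m128i b = _mm_set_epi64x(0, (int64_t)ctr_++);  // counter block
      b = _mm_xor_si128(b, rk_[0]);                    // initial whitening
      for(int r = 1; r < 10; ++r)
        b = _mm_aesenc_si128(b, rk_[r]);               // 9 middle rounds
      return _mm_aesenclast_si128(b, rk_[10]);         // final round
    }
  };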

Additionally, a generator that passes those 200 tests is not guaranteed to be high-quality for a particular usage, which might as well be considered nothing but another statistical test. There is the famous case of the r250 generator---an additive Fibonacci generator that passed all statistical tests of the time---which turned out to have characteristics that rendered costly real-world simulations wrong [1].

That is C++, yes, but I don't think there is much of anything modern about it. Trying to save on space probably contributed to its unreadability a bit.

[1] http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.69....


Here's a paper that includes an analysis of a similar approach that uses 5 rounds of AES under the name ARS-5: http://www.thesalmons.org/john/random123/papers/random123sc1...


I know this paper, and like this approach. Reduced-round primitives let you take advantage of existing analysis, and keep some guarantees while improving speed. In the case of AES, we know that the probability of any differential for 4 rounds is at most 2^-113; similarly for linear probabilities [1]. I'm curious why ARS-4 fails Crush; I wonder if the custom key schedule is the culprit.

[1] https://eprint.iacr.org/2005/321



That PCG is ostensibly less predictable is not a prime selling point, more a small nice-to-have in the scheme of things.

Of course, PCG doesn't exist to secure secrets. That's what cryptographic primitives are for, not any old RNG. However, there are downsides to using trivially predictable RNGs. A randomized algorithm might degrade to pathological performance on carefully-crafted inputs, which may expose you to DoS attacks. Hash tables are the better-known case: several languages have moved to make hashing less predictable, and thus less game-able.
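
As a concrete (hypothetical) illustration of the hash-table case, here's FNV-1a with a per-process seed folded into the offset basis, so bucket layouts differ across runs; the seed would be drawn once at startup from the OS RNG. Note this only raises the bar for casual attacks; it's not a keyed cryptographic hash like SipHash.

  #include <stdint.h>
  #include <stddef.h>

  // FNV-1a, randomized by a per-process seed (sketch, not production code).
  uint64_t seeded_fnv1a(const void *data, size_t n, uint64_t seed) {
    const unsigned char *p = (const unsigned char *)data;
    uint64_t h = 14695981039346656037ull ^ seed;  // randomized offset basis
    for(size_t i = 0; i < n; ++i) {
      h ^= p[i];
      h *= 1099511628211ull;                      // 64-bit FNV prime
    }
    return h;
  }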

It would surely be better for an implementation to avoid exposing any of this altogether, but an RNG intended for general use should have the worst-case scenario in mind.

The reason not to go full-on cryptographic is simply that the alternatives are faster and have more features. A list of some of the cool stuff PCG has is on the front-page, but here goes:

It's extremely fast, and distribution-wise it's pretty much the best non-cryptographic RNG out there, of those tested. Beyond that:

- The RNG is reproducible and supports multiple streams, including streams dependent on the address space. This also allows emulating Haskell's RNG splitting (where one RNG creates two new independent ones), which is neat if you don't want to share state between multiple parts of your program but still want global reproducibility, or even if you're just a functional programmer. (There's a sketch of the streams API below.)

- You can skip it backwards and forwards, which is little used but a ton of help when you do want it.

- Each instantiated RNG is a few bytes big. Something like the Mersenne Twister is a bad thing to instantiate wildly, especially on memory-constrained systems. Not a massive issue, but it's still silly that any RNG fails at this.

- Last but certainly not least, it's a tiny dependency.
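
A quick sketch of the streams and skip-ahead, assuming O'Neill's pcg_random.hpp C++ library (method names as I recall them from its documentation, so double-check against the real headers):

  #include <stdint.h>
  #include "pcg_random.hpp"

  int main() {
    pcg32 a(42u, 54u);   // seed 42, stream 54
    pcg32 b(42u, 55u);   // same seed, independent stream

    uint32_t x = a();    // draw one 32-bit value
    a.backstep(1);       // step the generator back one output...
    uint32_t y = a();    // ...so y == x again

    a.advance(1000000);  // jump ahead a million outputs in O(log n) steps
    return x == y ? 0 : 1;
  }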


So basically the only normal engineering reason to use PCG is if you need deterministic insecure random numbers: in other words, for simulations and testing code.

It is hard to imagine another scenario in which using an insecure RNG is good engineering. Secure RNGs are somewhat costlier than fast insecure RNGs, but it is very hard to predict which incidental random numbers won't ever need to be secure.

(I would generally take anything 'pbsd says about RNGs to the bank.)


If you're fine with a cryptographic RNG, then definitely. I'd even agree that cryptographic RNGs should be default. Rust made the right choice here.

But look at C++'s new RNG library. Go's random library. People don't want to give up the lightweight, fast defaults they have. Even Rust's canonical random library has an XorShiftRng. These are what PCG is competing with, and replacing all those weak RNGs with PCG RNGs would make them all much better.


It's definitely not a cryptographically secure PRNG. I suspect it's for people who want a lot of random data for modelling.

There's a bit of discussion here: https://news.ycombinator.com/item?id=9887548

Slavik81 in that thread mentions raytracing as an example use.


Only cryptographers care about "unpredictability." Outside of cryptography, the interesting properties of a PRNG are speed, period length, and state size.

But a good CSPRNG has excellent statistical properties. So if you make "unpredictability" a requirement, their slow PRNG becomes very attractive. In other words, they made up a problem for their solution.

If you need a statistical PRNG, use a statistical PRNG. (The best are xorshift1024* and xorshift128+.) If you need a CSPRNG, use a CSPRNG.
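
For reference, xorshift128+ is tiny; a sketch with the constants from Vigna's paper (the two state words just need to be seeded to anything other than all zeros):

  #include <stdint.h>

  struct xorshift128plus {
    uint64_t s[2];   // state; must not be all zeros

    uint64_t next() {
      uint64_t x = s[0];
      const uint64_t y = s[1];
      s[0] = y;
      x ^= x << 23;
      s[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
      return s[1] + y;   // the "+" that breaks the linearity of the output
    }
  };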


This sentiment is totally untrue and is why the password resets in PHP apps are the subjects of security conference presentations.

It can be surprisingly tricky to predict which random values an application uses will end up being security-critical.


An interesting case would be something like hash flooding prevention. The cost of a full CSPRNG is probably not worth it---no security is broken if we lose the secrecy of one generated seed---but at the same time we want something that is at least nontrivial for an attacker to predict.

I agree that most cases you either don't care at all or need a real CSPRNG.


Preventing hash flooding requires a single 128 bit seed per table. How expensive is it to generate that with a real CSPRNG?
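
For scale, a sketch of generating such a seed with std::random_device, which typically wraps the OS entropy source; a handful of these per program is effectively free (random_device's quality and cost do vary by platform, though):

  #include <cstdint>
  #include <random>

  struct Seed128 { uint64_t lo, hi; };

  Seed128 make_table_seed() {
    std::random_device rd;   // usually /dev/urandom or equivalent
    auto word = [&rd] {      // two 32-bit draws per 64-bit word
      return ((uint64_t)rd() << 32) | rd();
    };
    return Seed128{word(), word()};
  }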


Depends how many hash tables you have! :)

I agree that it's a marginal gain here, but Thomas asked for a case where predictability of "more difficult than an LCG or Mersenne Twister, but not infeasible under crypto assumptions" would have value. This would be an example.


I believe it means it's more difficult to guess its internal state than other commonly used PRNGs.



