The C++ community is polarized about C++11's random-number facility (pcg-random.org)
86 points by DmitryNovikov on Dec 14, 2015 | 61 comments



Features like this make me think that C++ is designed by people who think that perfection is reached when there is nothing more to add. As a C developer, I side with Antoine de Saint Exupéry.


Sigh... Too often that criticism is leveled at C++ without an understanding of the terrain.

I don't think C developers should be sitting on a high horse on this one.

C represents a pretty good example of the problem. Almost every program that relies upon the standard C runtime's random functions has a flawed random distribution, so any correct program completely bypasses that infrastructure and/or has a ton of additional code/complexity layered on top of it.
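
To make that concrete, here's a minimal sketch (mine, not from the thread or the article) of the usual flawed idiom and the kind of extra code a correct program ends up layering on top of it:

    // The classic idiom: biased whenever (RAND_MAX + 1) is not a multiple of 6,
    // and the low-order bits of some rand() implementations are notoriously weak.
    #include <cstdlib>

    int roll_biased() {
        return std::rand() % 6 + 1;   // low faces come up slightly more often
    }

    // The usual hand-rolled fix: reject the top sliver of the range so every
    // face is equally likely. This is the extra complexity layered on top.
    int roll_unbiased() {
        const int limit = RAND_MAX - (RAND_MAX % 6);
        int r;
        do {
            r = std::rand();
        } while (r >= limit);
        return r % 6 + 1;
    }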

That open source basically festered until the C++11 <random> proposal was accepted. The chief proponents of that proposal were, of course, those with the most sophisticated needs for, and understanding of, random functions, and the proposal reflects that. So, the problems with that proposal are arguably a consequence of the flaws in the original C standard.

What remains to be done is to harmonize that proposal with something that makes sense to less sophisticated users, who nonetheless have reached the point where they appreciate the problems with C's random functions.


Almost every program that relies upon the standard C runtime's random functions has a flawed random distribution

Right, and that function is a mistake in the C standard. It should have been omitted entirely.


That would have messed with C++'s goal of providing compatibility for C code.

Let's assume things had proceeded as you suggest though. We'd still not have a good solution for random number generation until the C++11 standard came out, and so we'd still have all the forces in place that gave rise to it and to the current mess.


You misunderstand me. I think random(3) should have been omitted from the C standard.


What happens if you don't provide sensible, easy-to-use built-in implementations of things is that a diverse, bewildering ecosystem of different options tends to arise, confusing programmers who just want to get their job done and leading to a high probability of them picking the wrong thing.

See, for example, crypto before NaCl came around.

(NB: I'm not defending either this C++ standard or random(3).)


What happens if you don't provide sensible, easy-to-use built-in implementations of things is that a diverse, bewildering ecosystem of different options tends to arise, confusing programmers who just want to get their job done and leading to a high probability of them picking the wrong thing.

Sure... but what happens if you do specify that implementations should be built in is that a diverse, bewildering ecosystem of broken implementations gets built in, and so people who know what they're doing end up eschewing the built-in implementations in favour of shipping an implementation which they know works. And ultimately you have a standard function which everybody provides for compatibility purposes, but which nobody should ever use. See, for example, random(3).

Functionality should be added to standards when we can be confident that every implementation in the next 100 years will get it right, and no sooner.


I thought NaCl was something related to browser plugins. What is the connection with crypto?


Name collisions: http://nacl.cr.yp.to/

Although, I've never actually seen it used in the wild.


Threema use it. (I don’t know if they use it in a sane way though.)


They do. There's an audit that confirms it, and you can use Validation Logging to verify it yourself: https://threema.ch/validation/


I think they understand what you're saying. I agree with the other two that not having a clean, working standard means you get people rolling their own or digging out libraries. Quality varies wildly because we're talking about random numbers, with many issues resulting. People would still want a standard, in C or C++, that lets them do this correctly without problems.

So, it leads me to guess there are two ways the C++ situation would've been made unnecessary:

1. Leave it out of C itself but pro's build and maintain a de facto standard in form of a library as easy to use as random and widely available. Can even be OS random facilities cleanly wrapped.

2. Put a good one in C standard. Becomes the standard de facto and officially.

Preconditions didn't exist or convenience prevailed. So, now there's a new attempt at dealing with the mess. That's my guess as a person looking on from the outside rather than a user of all these tools. Just to be clear on that part.


Ah well. I could see where that'd not have been an entirely bad thing.

However, eventually you need to provide something that works. There is some value in having a portable way to generate random numbers. I can't think of a language that doesn't define one, and I don't think that is by accident.


I think that would have been a real bummer for coders back before the Internet was really a thing. It's a pretty common thing to need and hard to implement.


> That open source

Man, autocorrect killed me there. That should be "That open sore".


random(3) is clearly bad; but contrast this proposal with the simplicity (and security) of OpenBSD's arc4random(), arc4random_uniform(), arc4random_buf().
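
For readers who haven't used it, a quick sketch of that interface, assuming a platform that ships the arc4random family (OpenBSD and the other BSDs, macOS, or Linux via libbsd):

    #include <cstdint>
    #include <stdlib.h>   // arc4random* are declared here on systems that provide them

    std::uint32_t roll_die() {
        return arc4random_uniform(6) + 1;   // uniform over [0, 6): no seeding, no modulo bias
    }

    void fill_key(unsigned char *key, size_t len) {
        arc4random_buf(key, len);           // fills the buffer with high-quality random bytes
    }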


arc4random_uniform() is the only one that really addresses the matters in this proposal, and arc4random_uniform() can't handle values larger than 2^32-1 (and, in fact, thanks to the wonders of overflow, might do some unfortunate things in a variety of cases), and won't produce values greater than 2^32-2.

So yeah, not exactly addressing the matter.

Of course, that whole arc4* library doesn't have the same challenges as a language standard, which is where a lot of the challenge here comes from.


You're not wrong, but let me be more explicit: the problems arc4random_*() solves are the ones I feel are important: good-quality random with no corner cases (e.g. still reliable if you run out of file descriptors, at least on OpenBSD). pick() or shuffle() really are nice, but I'm much more scared of predictable-by-default or predictable-in-corner-cases RNGs than of errors in a hand-rolled pick().

(I don't write code where e.g. std::poisson_distribution would be useful, so I don't feel I have the background to have an opinion on that part of the proposal.)

Overflow is something that the programmer always has to keep in mind, unfortunately.
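
To make the overflow point concrete, here is a hedged sketch (mine, not the proposal's) of the kind of hand-rolled range mapping that quietly goes wrong:

    #include <cstdint>
    #include <stdlib.h>

    // Map arc4random_uniform onto a closed range [low, high]: looks obvious, but the
    // span high - low + 1 wraps to 0 when the range covers all of uint32_t, so the
    // call no longer asks for what the signature suggests.
    std::uint32_t pick_in_range(std::uint32_t low, std::uint32_t high) {
        return low + arc4random_uniform(high - low + 1);
    }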


Generics and boxed types make overflow a non-concern.

The proposal doesn't really care much about distributions. It's intended for people who don't care about random distributions.


FWIW: generics and boxed types allow you to write code without overflow, but there's plenty of real-world C++ that's full of overflows. Good C uses a small number of types and also avoids overflow...

Although we have different concerns, it's clear that a lot of thought has been put into the proposal, and it seems to achieve its/your goals pretty well. Don't let my concerns put you off.


I think this is a natural consequence of low-level APIs. Anything deemed worthy of being configured by the caller is made into an extensible interface, whether that be through flag arguments or types. Unless some library is very opinionated about how things should be done, the extension/customization hooks have to exist in one form or another.

To satisfy the two camps though (low level control vs "give me some sane default or package up common combinations") someone just writes the higher level API on top.


It is entirely possible to write a highly configurable interface with sane defaults. Indeed, the proposal in this article goes a long way in that direction.


Sometimes the right defaults depend on context unavailable to the low level library author. But irrespective of who provides the higher level API, the bottom line is that there needs to be an API that's most flexible when the flexibility is warranted. This is really in reference to Colin's "nothing more to add" remark.


If there is some other context that is unavailable to the low-level author, then that default should be provided by whatever higher level is providing said context. C++'s namespaces provide a super convenient mechanism to make that work smoothly.


C with "nothing left to take away" would be assembly. Why don't you switch from C to assembly? Or figure out what you get when there's nothing left to take away from assembly?

Or maybe your idea of "just the right amount of stuff" is different from someone else's idea of that, and we could be thankful that the world is able to accommodate more than one opinion on this by providing us with multiple programming languages we can use according to our needs and preferences.


C with "nothing left to take away" would be assembly.

C with nothing left to take away is portable assembly.


I wish they had mandated the algorithms used in the distributions. Even with the same generator initialized with the same seed, the normal distribution, for example, will give different results across platforms. At least the results of the generators are consistent across platforms.
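
A small sketch of that difference (my illustration, not code from the article):

    #include <iostream>
    #include <random>

    int main() {
        std::mt19937 gen(42);

        // Engines are pinned down bit-for-bit; the standard even says the 10000th
        // value from a default-constructed mt19937 must be 4123659995.
        std::cout << gen() << '\n';

        // Same engine, same seed, but how the bits become a normal variate is left
        // to the implementation, so libstdc++, libc++, and MSVC may disagree here.
        std::normal_distribution<double> normal(0.0, 1.0);
        std::cout << normal(gen) << '\n';
    }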


Is that feasible for distributions over the reals across CPU architectures or maybe even revisions? FPUs typically do not compute the best possible result for floating point operations.

Also, instructions like 'estimate one over square root' have multiple 'correct' answers.

Edit: https://www-01.ibm.com/support/knowledgecenter/#!/ssw_aix_71... shows things are worse:

"The estimate placed into register FRT is correct to a precision of one part in 32 of the reciprocal of the square root of FRB. The value placed in FRT may vary between implementations and between different executions on the same implementation."


Is this an indication of a randomly seeded Newton's method?


But why would they? My guess would be that they don't bother to clear a carry flag or are computing the estimate using a few extra significant bits, and do not bother to initialize those at start, but that's 100% guess. I know very little about designing CPUs.


Why not specify the algorithm for std::rand?

The specification, as far as I'm aware, gives no guarantees at all about the algorithm used. Why not just specify a good algorithm instead of adding a new function?


They added more than a single new function. Personally, I like what was added. For what it's worth, I generally don't need to consult the documentation for the new random facilities (for a definition of "new" that includes "four years old"), but I do need to consult the docs for the stuff in <chrono>.

> The specification, as far as I'm aware, gives no guarantees at all about the algorithm used.

You are right about that. And, in fact, the few guarantees the standard has are weak enough that Microsoft ships a standards compliant implementation that is famously limited.


std::rand has global state. Specifying the algorithm doesn't fix that.

And this lets you (easily) generate more than just integers in the range [0, RAND_MAX]. std::rand doesn't have that level of convenience.

There are all sorts of inconveniences with std::rand, and it looks like this solves the vast majority of them.
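
A rough illustration of that convenience gap (my sketch, not part of the standard or the proposal): each caller owns its engine, and distributions beyond plain integers come for free.

    #include <random>

    // No hidden global state: the engine is passed in, so tests can inject a fixed seed.
    double measure_noise(std::mt19937& gen) {
        std::normal_distribution<double> noise(0.0, 0.25);
        return noise(gen);
    }

    bool coin_flip(std::mt19937& gen) {
        return std::bernoulli_distribution(0.5)(gen);
    }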


> std::rand has global state. Specifying the algorithm doesn't fix that.

Adding other functions also doesn't fix std::rand. People use std::rand. They will continue to use std::rand, unless it's removed. And even then they'll keep using it, because implementations will add it back in for compatibility.

There's already a simple, half baked, crude random number generator. It's possible to fix it so that it provides at least acceptable randomness, and isn't complete junk. Doing this breaks nothing.

Why add another similar function with similar API issues?


What is this "similar function" with "similar API issues"?

C++11's random utilities and O'Neill's randutils aren't similar to std::rand at all. They're totally different beasts from std::rand.

Making std::rand a decent RNG would probably make the world a better place. I'm not disagreeing with that. But even if std::rand used a decent algorithm, it's still a suboptimal API.


One thing that I would like to have is a common base class that can be passed around. Often, I would like for a function or class to be told which engine to use, which is quite useful for unit testing, or for having reproducible results. As it is, the only way to accept multiple different engines is by templating over the engine type. This ends up requiring that every part of the code that uses the random engine be brought into the header file, and classes need to be templated on the engine type to store a reference to the engine.

I was kind of hoping that the `random_generator` class would be a templated class derived from a non-templated base class, so that it could be used in this manner. Looking over it, I can see why it doesn't, as many of the methods are much, much more useful as templated methods, which do not interact nicely with virtual functions.
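
For what it's worth, here is a rough sketch of the workaround I mean (every name in it is made up, not part of the proposal): erase the engine behind std::function so it can cross a non-template interface, at the cost of roughly one indirect call per 64 bits of output.

    #include <cstdint>
    #include <functional>
    #include <limits>
    #include <random>

    class erased_engine {
    public:
        using result_type = std::uint64_t;

        // Wrap any standard engine; independent_bits_engine widens its output to 64 bits.
        template <typename Engine>
        explicit erased_engine(Engine engine) {
            std::independent_bits_engine<Engine, 64, result_type> widened(std::move(engine));
            next_ = [widened]() mutable { return widened(); };
        }

        static constexpr result_type min() { return 0; }
        static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }

        result_type operator()() { return next_(); }

    private:
        std::function<result_type()> next_;
    };

    // Usage: any distribution accepts it like a normal engine, and a unit test can
    // inject a fixed-seed engine without templating the class under test.
    //   erased_engine rng{std::mt19937{12345}};
    //   int r = std::uniform_int_distribution<int>(1, 6)(rng);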


I kind of agree! But think about performance: a fast pseudo-random generator will (and should) produce a random number in fewer cycles than the overhead of one virtual function call. Also, it's not hard to create your own virtual random class if you are willing to pay the price.


I like this proposal and I agree that randint is a bad idea.


I think there is a case to be made for doing both. The one advantage of randint is that it makes for a convenient migration path for those relying on std::rand() and friends.

I honestly am a bit at a loss though in terms of understanding how this proposal makes things significantly easier for programmers.

Absent this proposal, you would write:

    std::default_random_engine e(std::random_device{}());
    std::uniform_int_distribution<int> uniform_dist(1, 6);
    const int random_value = uniform_dist(e);
The new process basically cuts the first two lines down to one, and uses a method call for a uniform distribution instead of creating an object for it. Is that really easier?

If so, then yeah, go with the idea of an empty constructor version of engines that will smartly seed from a random device, and add a mixin that does the magic of mapping all the different distributions in to methods.

I'm just wondering if that is somehow missing what is actually making life difficult for developers.


See the author's comments on very similar code here:

https://www.reddit.com/r/cpp/comments/31857s/random_number_g...

> One minor issue is that you’re using `default_random_engine`, which in some systems may be a LCG with a tiny 32-bit state. If so, you’ll have an RNG with a tiny period. That’s why Stephan recommends[1] that you explicitly use the Mersenne Twister. Of course, it might be something better, with more state.

> But what’s really wrong with it is that you use a single 32-bit integer to seed the RNG. If you’re using the Mersenne Twister, you’re using four bytes of state to try to seed 624 bytes. It’ll work, but it’s way worse than what the Python code does; Python uses 624 bytes of actual entropy rather than four bytes.

[1]: https://www.reddit.com/r/cpp/comments/31857s/random_number_g...
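
For reference, one common (and still imperfect, given seed_seq's own quirks) way to feed the twister more than four bytes of entropy; this is my sketch, not code from the article:

    #include <algorithm>
    #include <array>
    #include <cstdint>
    #include <functional>
    #include <random>

    std::mt19937 make_seeded_twister() {
        std::random_device rd;
        std::array<std::uint32_t, std::mt19937::state_size> noise;
        std::generate(noise.begin(), noise.end(), std::ref(rd));
        // random_device itself is not a SeedSequence, so it has to go through seed_seq.
        std::seed_seq seq(noise.begin(), noise.end());
        return std::mt19937(seq);
    }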


I think the solution for platforms which use bad default_random_engine's is for said platforms to stop doing that. ;-)

Seriously, the whole point of platform defaults is that they make appropriate choices for the platform. Sure, you can define your own engine, but that's exactly what the original API gives you...


You could say that about a lot of platform defaults. Sadly it's still our responsibility to avoid the bad ones.


Exactly. Working around it in the standard is the wrong way to do it though. You rely on what the standard requires, and maybe the bad guys are standards compliant, but there is no point in bending over backwards to work around a platform that is going to find a way to mess it up anyway.


Now I finally get it. C++11's seed sequences are not compatible with std::random_device. I'd been using the pre-C++11 boost::random and hadn't realized there was this subtle difference.

That is super annoying.


The current way separates what you'll get back from the engine from the actual invocation. In comparison, the proposal lets me look at one line and know what will be returned:

    rng.uniform(1,17)
I know exactly what to expect. Compare that to:

    uniform_dist(e);
Unless the variable is named very well ("UniformDistOneToSix"?), I don't know what that line does.

Not to mention, what if I want to roll a bunch of numbers in a row? Do I create all the different distributions that I may ever need ahead of time? If I don't know what all random ranges I'll need, I guess I can just create them on the stack as needed, instantiating objects willy nilly. Eew.


I'd just call it "from_one_to_six".

I guess if it is really a mystery, you could just always do:

    std::uniform_int_distribution<int>{1, 6}(e);
...and use some typedefs to avoid it being quite so verbose.

I'm not sure I grok the bunch of numbers in a row scenario. Usually in that case I'd imagine you'd want to create a bunch of numbers with a consistent distribution, which is exactly why you have the structure you want. If not, you can always create new distributions in an ad hoc fashion (exactly why it is good that the parameters for a distribution are NOT template parameters). Instantiating distributions isn't costing you anything here. It's just a transient struct with a handful of fields... though if you use it for more than one call, it might somehow be more efficient (doubtful, but conceivable).

In practice, you pretty much always end up with something that holds the engine inside it, and you can always decorate it with methods like:

    int roll_die(){ return std::uniform_int_distribution<int>{1, 6}(e); }
...or alternatively:

    template <typename T>
    T uniform(const T begin, const T end) { return std::uniform_int_distribution<T>{begin, end}(e); }


"To the extent that anyone cares about C++11's random-number facility at all, the C++ community is polarized between two views—one that likes it, and one that hates it."

Isn't that almost a tautology? "People who care about X either love it or hate it".


There's a wide spectrum between love and hate.


Yeah, and all those people probably aren't participating in online discussions about it. I'm pretty happy with most aspects of <random>, but I don't seek out arguments. I just use it.


This is basically the problem with the whole Internet though ;)


No. You can recognize the importance of something and care about it a lot while thinking it's (e.g.) mediocre. Thinking a problem is important and having an extreme opinion of a solution are entirely different axes.


In my experience, having an extreme opinion of a problem's importance (very important, not important whatsoever) is correlated with having an extreme opinion of a solution (love it, hate it).


> having an extreme opinion of a problem's importance (very important, not important whatsoever) is correlated with having an extreme opinion of a solution (love it, hate it).

I don't think this is correct either. How would thinking a problem is "not important whatsoever" imply that someone must love or hate any potential solution? I couldn't care less how sports team X fixes their losing record, and yet your claim is that that would somehow correlate with me having extreme opinions (love or hate) about their decision to draft Player Y? Very much to the contrary, I'm apathetic about any solution, which is pretty much the definition of finding a problem "not important whatsoever".

Furthermore, I'm not even sure how "extreme opinions of the problem's importance" is relevant to what we're talking about. The original quote was "to the extent that anyone cares about [the problem] at all". That's anywhere from "I care a bit" to "I care a ton", not exclusively "I care a ton".

Finally, even if you weren't very wrong about the premise of what we're talking about: "X is correlated with Y" doesn't mean "X implies Y is a tautology (or close to one) because they're the same thing", which is the original statement I was responding to.


>> correlated

> imply that someone must

Meh, correlation is not causation. Let's include apathy as a third extreme, for both problems and solutions.



Because Microsoft implemented random_device correctly while MinGW/GCC hasn't?

It's a bug in MinGW. Which isn't a big deal, because Microsoft Visual Studio Community Edition is an excellent IDE and compatible with OSS projects now.


Could you point to the gcc bug? Note: the standard does not require a non-deterministic random_device, at least C++11 doesn't, so the behaviour as described in the stackoverflow link would be allowed.


There are a lot of things the C and C++ standards allow implementations to do, such as making int 16 bits, using one's complement for signed integers, (in C) only making the first 31 characters of identifiers significant, (in C++) not allowing any identifiers longer than one character, except those in the standard library, et cetera. This is because they are designed to be portable to a wide variety of compilers and target devices which might have limits that weren't envisioned at the time of writing the standard. Doesn't mean it's a good idea to write absolutely pathological implementations for no real reason other than laziness...


>This is because they are designed to be portable to a wide variety of compilers and target devices which might have limits that weren't envisioned at the time of writing the standard.

That's not really the case here, is it? If anything the bug is in the standard.


> std::random_device may be implemented in terms of an implementation-defined pseudo-random number engine if a non-deterministic source (e.g. a hardware device) is not available to the implementation.

Windows has a non-deterministic source (i.e., the function CryptGenRandom). Therefore, this is a bug in MinGW's implementation. CryptGenRandom is hard to use, but that's no excuse for MinGW's developers to be lazy.

Pointing at a "mistake" in the spec when the intent of the spec is extremely clear is laziness. Pure and simple.


I'd think the autoseed proposal in this post addresses that problem.



