
I've often wondered why the default random number generators in so many programming languages are still LFSRs, Mersenne Twisters, and other fast but non-cryptographic RNGs in the 2020s.

It seems better to err on the side of 'people don't know if they want a PRNG or a CSPRNG' and switch the default to the latter, with an explicit opt-in to the former for people who know what they need :)




> It seems better to err on the side of 'people don't know if they want a PRNG or a CSPRNG' and switch the default to the latter, with an explicit opt-in to the former for people who know what they need :)

That’s exactly what we did in PHP 8.2 [1] with the new object-oriented randomness API: If you as the developer don’t make an explicit choice for the random engine to use, you’ll get the CSPRNG.

Now unfortunately the hard part is convincing folks to migrate to the new API - or even from the global Mt19937 instance behind mt_rand() to the CSPRNG behind random_int(), which has been available since 7.0.

[1] https://www.php.net/releases/8.2/en.php#random_extension


OpenBSD had a similar problem with people calling arc4random and getting RC4 randomness, but they just changed it to use ChaCha20 anyway and backronymed it to "a replacement call for random".

https://man.openbsd.org/arc4random.3


Recently I started using a new Go library that generates random IDs for the different components of a complex data structure. My own use case had tens of thousands of components, and profiling revealed that a significant chunk of the time spent initializing the data structure was in crypto/rand's Read(), which on my MacBook was executing system calls. Patching the library to use math/rand's Read() instead improved performance significantly.

In addition to math/rand being faster, I was worried about exhausting the system's entropy pool for no good reason: in this case, the only possible reason to have the IDs be random would be to serialize and deserialize the data structure and then add more components later, which I had no intention of doing.

Not sure exactly how the timing of the changes mentioned in this blog compares to my experience -- possibly I was using an older version of the library, and this would make crypto/rand basically indistinguishable from math/rand, in which case, sure, why not. :-)


Do note that "exhausting the system's entropy pool" is a myth - entropy can't run out. In the bad old days, Linux kernel developers believed the myth and /dev/random would block if the kernel thought that entropy was low, but most applications (including crypto/rand) read from the non-blocking /dev/urandom instead. See https://www.2uo.de/myths-about-urandom/ for details. So don't let that stop you from using crypto/rand.Read!


Once you've somehow got 256 bits of entropy, it will be enough for the lifespan of our universe.


> this would make crypto/rand basically indistinguishable from math/rand, in which case, sure, why not. :-)

It's closer to the other way around. crypto/rand was not modified in any way; its purpose is to expose the OS's randomness source, and it does that just fine.

math/rand was modified to be harder to confuse with crypto/rand (and thus used inappropriately), as well as to provide a stronger, safer randomness source by default (the default RNG source has much larger state and should be practically impossible to predict from a sample in adversarial contexts).

> I was worried about exhausting the system's entropy pool for no good reason

No good reason indeed: there's no such thing as "exhausting the system's entropy pool". It's a Linux myth which even the Linux kernel developers have finally abandoned.


Others have already addressed the impossibility of exhausting the system's entropy pool; however, I would add that you can buffer Read() to amortize the cost of the syscall.

Also, make sure that your patch does not introduce a security vulnerability, as math/rand output is not suitable for anything security-related.
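
For example, here's a minimal sketch of that buffering idea in Go (the 4 KiB buffer size and 16-byte IDs are arbitrary illustrative choices):

  package main

  import (
      "bufio"
      crand "crypto/rand"
      "fmt"
      "io"
  )

  func main() {
      // Wrap the OS randomness source in a buffered reader so that many
      // small ID-sized reads are served by one larger syscall.
      // Note: bufio.Reader is not safe for concurrent use.
      br := bufio.NewReaderSize(crand.Reader, 4096)

      id := make([]byte, 16)
      if _, err := io.ReadFull(br, id); err != nil {
          panic(err) // a failing OS randomness source is effectively fatal
      }
      fmt.Printf("%x\n", id)
  }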


One of the better arguments for using a CSPRNG (here, ChaCha8) is that they benchmark it within a factor of 2 of PCG. The state is still comparatively large (64 bytes vs 16), but not nearly as bad as something like mt19937 or the old Go PRNG. (If the CSPRNG was much much slower, which is generally true for CSPRNGs other than the reduced-round ChaCha variant, it becomes a less appealing default.)
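
If you want to reproduce that comparison yourself, here's a rough micro-benchmark sketch against Go's math/rand/v2 sources (illustrative only, not the benchmark from the post; numbers will vary by machine):

  package rngbench

  import (
      "math/rand/v2"
      "testing"
  )

  // BenchmarkChaCha8 exercises the reduced-round CSPRNG source.
  func BenchmarkChaCha8(b *testing.B) {
      r := rand.New(rand.NewChaCha8([32]byte{}))
      for i := 0; i < b.N; i++ {
          _ = r.Uint64()
      }
  }

  // BenchmarkPCG exercises the plain PCG source for comparison.
  func BenchmarkPCG(b *testing.B) {
      r := rand.New(rand.NewPCG(1, 2))
      for i := 0; i < b.N; i++ {
          _ = r.Uint64()
      }
  }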


How did you get to 64 bytes of state? Last I looked, Go's ChaCha8 implementation had 300 bytes of state. Most of that was spent on a buffer which was necessary for optimal amortized performance.


That's correct - the state is 300 bytes (36 uint64 + 3 uint32). https://go.dev/src/internal/chacha8rand/chacha8.go


Fair enough. I was just thinking of base ChaCha state without the buffering. 300B is still significantly better than mt19937 (~2.5kB) or the old Go generator (4.9kB!).


I know posting "I agree" is not generally welcomed on here, but ChaCha8 is really underappreciated as an MCMC/general simulation random number generator. It's fast, it's pretty easy on cache, and it won't bias your results, modulo future cryptanalysis.
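
For instance, with Go's math/rand/v2 you can drive a reproducible simulation off ChaCha8 directly (the fixed seed here is just for illustration; record whatever seed you use so the run can be replayed):

  package main

  import (
      "fmt"
      "math/rand/v2"
  )

  func main() {
      // A fixed seed gives a reproducible stream for a simulation run.
      seed := [32]byte{1, 2, 3} // remaining bytes are zero
      rng := rand.New(rand.NewChaCha8(seed))

      // Draw samples as you would inside an MCMC/simulation loop.
      for i := 0; i < 3; i++ {
          fmt.Println(rng.Float64())
      }
  }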


What are the other cases where you would need the former? I can only think of fixed seeding for things that need reproducible results (e.g. tests, verifications).

I think there's another little force that pushes people towards the PRNG even when they don't need seeding: a CSPRNG API always includes an error you need to handle, in case the syscall fails or you run out of entropy.

I'm curious how often crypto/rand's Read fails. How much randomness do I have to read to exhaust the entropy of a modern system? I've never seen it fail over billions of requests (dd does fine too). Perhaps a Must/panic-style API makes sense as the default for most use cases?

Edit to add: I took a look at the secrets package in Python (https://docs.python.org/3/library/secrets.html): not a single mention of how it can throw. It just doesn't happen in practice?


> a CSPRNG API always includes an error you need to handle, in case the syscall fails

A user-side CSPRNG — which is the point of adding a ChaCha8 PRNG to math/rand — performs no syscall outside of seeding (unless it supports reseeding).
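
A sketch of that pattern in Go: one read from the OS at startup, after which the generator never touches the kernel again:

  package main

  import (
      crand "crypto/rand"
      "fmt"
      "math/rand/v2"
  )

  func main() {
      // The only syscall: fetch a 256-bit seed from the OS once.
      var seed [32]byte
      if _, err := crand.Read(seed[:]); err != nil {
          panic(err)
      }

      // From here on, everything is pure in-process computation.
      rng := rand.New(rand.NewChaCha8(seed))
      fmt.Println(rng.Uint64())
  }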

> you run out of entropy.

Running out of entropy has never been a thing except in the fevered minds of Linux kernel developers.


> Running out of entropy has never been a thing except in the fevered minds of Linux kernel developers.

Linux used user input and network jitter to generate random numbers, not a pure pseudo-random number generator. For a perfectly deterministic pseudo-random number generator, entropy is only required for seeding, and even then you can avoid it if you have no problem with others reproducing the same chain of numbers.


Cryptographically-secure PRNGs are also deterministic, but as long as you have at least 256 bits of unpredictable seed, the output remains unpredictable to an attacker for practically forever.

Linux used/uses user input and network jitter as the seed to a deterministic CSPRNG. It continuously mixes in more unpredictable bits so that the CSPRNG can recover if somehow the kernel's memory gets exposed to an attacker, but this is not required if the kernel's memory remains secure.

To reiterate, running out of entropy is not a thing.


The difference between “I don’t have enough entropy” and “I have enough entropy to last until the heat death of the universe” is only a small factor.

An attack on the RNG state or entropy source is much more of a risk. The entropy required is not a function of how much randomness you need to generate, but of how long the system has been running.


Thankfully, Go is considering making crypto/rand infallible, because as it turns out the syscalls do not actually fail in practice (it's not possible to "run out of entropy"): https://github.com/golang/go/issues/66821


> a CSPRNG API always includes an error you need to handle (in case the syscall fails or you run out of entropy).

> Perhaps a Must/panic-style API makes sense?

Yes, CSPRNG APIs should be infallible.
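
In the meantime, a Must-style wrapper is easy to write yourself. A hypothetical helper (not part of the standard library):

  package main

  import (
      crand "crypto/rand"
      "fmt"
  )

  // MustRead fills p with cryptographically secure random bytes and
  // panics on failure, on the theory that a broken OS randomness source
  // is not an error worth handling gracefully.
  func MustRead(p []byte) {
      if _, err := crand.Read(p); err != nil {
          panic("crypto/rand: " + err.Error())
      }
  }

  func main() {
      token := make([]byte, 32)
      MustRead(token)
      fmt.Printf("%x\n", token)
  }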


I like the approach of "all randomness on a system should come from a CSPRNG unless you opt out". It's the stronger of the two options: you lose a small amount of perf for a much stronger guarantee that you won't use the wrong RNG and cause a disaster. It's a shame that this is still a sharp edge developers need to think about in pretty much all languages.


In 2024 that's the right answer. But the programming world does not move as quickly as it fancies it does, so decades ago, when this implicit standard was being set, it was a tougher call, because the performance hit would have been much more noticeable.

There's a lot of stuff like that still floating around. One of my favorite examples is that the *at family of file handling functions really ought to be the default (e.g., openat [1]), and the conventional functions really ought to be pushed back into a corner. The *at functions are more secure and dodge a lot of traps that the conventional functions will push you right into (see the sketch below). But the *at functions are slightly more complicated and not what everyone is used to, so instead they are the ones pushed into the background, even though they are much more suited to 2024. I'm still waiting to see a language's standard library present them as the default API and diminish (even if it is probably impossible to "remove") the standard functions. Links to any such language I'm not aware of are welcome, since of course I do not know all language standard libraries.

[1]: https://linux.die.net/man/2/openat , see also all the other "See Also" at the bottom with functions that end in "at"
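
For what it's worth, you can reach the *at family from Go via golang.org/x/sys/unix. A Linux-only sketch (the paths are made up for illustration):

  package main

  import (
      "fmt"

      "golang.org/x/sys/unix"
  )

  func main() {
      // Pin the directory first, then resolve the file relative to that
      // fd. Unlike open("/etc/myapp/config.txt"), this can't be redirected
      // by swapping the directory for a symlink between the two steps.
      dirfd, err := unix.Open("/etc/myapp", unix.O_DIRECTORY|unix.O_RDONLY, 0)
      if err != nil {
          panic(err)
      }
      defer unix.Close(dirfd)

      fd, err := unix.Openat(dirfd, "config.txt", unix.O_RDONLY|unix.O_NOFOLLOW, 0)
      if err != nil {
          panic(err)
      }
      defer unix.Close(fd)

      fmt.Println("opened fd", fd)
  }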



