The design approach of just iterating until it passes a suite of statistical tests is ... not the best.
The issue is that the common suites of statistical tests don't include every credible statistical test at their level of complexity.
Essentially, as you retry against the tests, you expend their utility, producing a result that is guaranteed to pass them as a byproduct of your process... but that may in fact be extraordinarily weak against yet another reasonable statistical test (and/or broken in some practical application) that no one has yet thought to include in the test suite.
Very true. Sadly, this is the main approach the non-cryptographic PRNG community has at the moment.
On the plus side, those tests are fairly extensive and brutal nowadays. They detect anomalies in cryptographic algorithms that were still used a decade ago, such as arc4random().
But one thing I hint at near the end is that I hope, and anticipate, that we will see more accurate quality-measurement tools in the future.
One alternative would be to attempt to select an analyzable structure which you could show, through analysis, should have good randomness properties. You could then add a limited number of tweaks or parameters with the aid of just one or two general statistical tests (and perhaps a collection of pants-on-head stupid tests, just to avoid wasting time on stuff that is utterly broken).
After that, you try with the complete suite and unless the result reaches a predetermined passing criteria you declare the design as flawed and go back to step 1 rather than continuing to tweak (as the result of continued tweaking will inevitably be a pass even if the design is poor).
Unfortunately, high performance and even 'good random-like behavior' may be at odds with having a construct which is transparent enough to be able to justify its performance from analysis rather than running it. :(
One thing I did when designing random functions was to run them against DieHarder and PractRand, but then also against SMHasher; you just have to use some construction to fold the key into the state. I think that's a useful, somewhat orthogonal way to test. Have you considered this?
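A minimal sketch of the kind of folding construction I mean, in C; the mixer and names are illustrative (a splitmix64-style stand-in for your generator's state update), and the entry point follows SMHasher's usual key/len/seed/out shape:

    #include <stdint.h>
    #include <string.h>

    /* splitmix64-style mixer, standing in for the PRNG's state update;
       the folding construction, not the mixer, is the point here. */
    static uint64_t mix64(uint64_t x) {
        x += 0x9E3779B97F4A7C15ULL;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
        return x ^ (x >> 31);
    }

    /* SMHasher-style entry point: absorb the key into the state one
       64-bit word at a time, then squeeze out a 64-bit "hash". */
    void prng_as_hash(const void *key, int len, uint32_t seed, void *out) {
        const uint8_t *p = key;
        uint64_t state = seed;
        while (len > 0) {
            int n = len < 8 ? len : 8;
            uint64_t w = 0;
            memcpy(&w, p, (size_t)n);  /* fold up to 8 key bytes */
            state = mix64(state ^ w);
            p += n;
            len -= n;
        }
        uint64_t h = mix64(state);
        memcpy(out, &h, sizeof h);
    }

If the key folding is weak, SMHasher's avalanche and differential tests tend to expose it quickly, which is exactly the kind of orthogonal signal DieHarder and PractRand won't give you.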
On the other hand, some tests could be aligned closely enough with the needs of a specific application to prove that a PRNG is good for it, not only probably good.
For example, if a game uses a PRNG to "stretch" a random seed for procedural generation purposes, conditional probabilities are important (to avoid spurious correlations between features), but period is completely irrelevant (even a relatively short period would be many orders of magnitude larger than the output size).
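A rough illustration of that seed-stretching pattern (everything here is hypothetical; the constants are splitmix64's, standing in for whatever generator or mixer you actually trust):

    #include <stdint.h>

    /* Derive an independent stream seed per feature from one world
       seed, so that e.g. tree placement and cave layout don't share
       spurious correlations. What matters is that nearby feature_ids
       yield statistically independent seeds, not the period. */
    static uint64_t feature_seed(uint64_t world_seed, uint64_t feature_id) {
        uint64_t x = world_seed ^ (feature_id * 0x9E3779B97F4A7C15ULL);
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
        return x ^ (x >> 31);
    }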
Yes, this seems like a relatively fundamental flaw. Even if each test's significance threshold is something like p = 0.000005, throwing enough experiments at the suite all but guarantees a false positive. For illustration (numbers made up): a 200-test suite run over 10,000 tweak iterations is 2,000,000 trials, and 1 - (1 - 0.000005)^2000000 ≈ 1 - e^-10 ≈ 0.99995, so at least one spurious result is near-certain.
We know the tests are overfitting. That's not really a problem we can worry about too much; the tests can't be perfectly comprehensive and also finite. The problem is that we're trying to use the model to tell us something about adversarial input.
Depends on which method is used. “Ping-ponging” between the generators would have issues, as you'd run into the problems of all of them.
Depending on how the “blending” is done and whether the algorithms are truly independent of each other, combining them is definitively more secure than any of them individually. The Linux kernel lets you do this by writing to /dev/random: the written bytes get added (in part) to the entropy pool.
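A minimal sketch of that, assuming a Linux/POSIX system; note that plain writes only mix the bytes into the pool, they don't raise the kernel's entropy estimate (that requires the RNDADDENTROPY ioctl and CAP_SYS_ADMIN):

    #include <fcntl.h>
    #include <unistd.h>

    /* Mix caller-supplied bytes into the kernel's entropy pool.
       The kernel hashes them in, but credits no entropy for them. */
    int feed_kernel_pool(const unsigned char *buf, size_t len) {
        int fd = open("/dev/random", O_WRONLY);
        if (fd < 0)
            return -1;
        ssize_t n = write(fd, buf, len);
        close(fd);
        return (n == (ssize_t)len) ? 0 : -1;
    }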
> Indeed, AVX2 has a bit of a quirk where regular registers (%rax and the like) cannot directly be transferred to the SIMD ones with a MOV; it must go through RAM (typically the stack), which costs both latency and two CPU instructions (MOV to the stack, VMOV from the stack).
I don't think this is quite correct. You can move efficiently from r64 to XMM/YMM without going through memory with MOVQ/_mm_cvtsi64_si128. I'm not able to look into it more closely right now, but these links should give insight:
My vague recollection is that you might be right that this is clumsy with AVX2. Maybe it's a case where you have to take advantage of the fact that XMM1 and YMM1 are the same register, and cast the YMM to XMM? Or just drop to inline assembly.
But thanks for the writeup, and I hope to be able to look at Shishua more closely at some point in the future.
Yes, most instructions that modify only the bottom element(s) didn't get a ymm (256-bit) version, since it would serve no purpose: it would produce the same result as the xmm one. The corresponding intrinsics mostly follow the same pattern, so there is no int64 -> ymm intrinsic.
> Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
That seems a bit beyond their mandate since what the compilers generate is mostly up to them, and in fact it doesn't seem true: at -O0, both gcc and clang generate a few extra instructions for the cast. With optimization on, it's all good though.
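For reference, a small sketch of the two-intrinsic route discussed above (the function name is mine):

    #include <immintrin.h>

    /* r64 -> ymm without a memory round-trip: one MOVQ into xmm, then
       a cast that costs nothing once the optimizer runs, since the xmm
       register is the low half of the corresponding ymm register. The
       upper 128 bits are undefined; newer compilers offer
       _mm256_zextsi128_si256 when they must be zeroed. */
    static inline __m256i int64_to_ymm(long long x) {
        __m128i lo = _mm_cvtsi64_si128(x);   /* MOVQ r64 -> xmm */
        return _mm256_castsi128_si256(lo);   /* free at -O1 and above */
    }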
It is not directly in the article, but in a link to a tweet by djb, the creator of ChaCha8. He believes that the cpb listed in the Randen comparison is off:
He mentions that perhaps the implementation of ChaCha8 for the benchmark was done by hand and left unoptimized. And from what I saw, it is true that a lot of ChaCha8 benchmarks are implemented with none of the tweaks that make it fast.
I found it quite funny. And I don't really find calling a well-known eugenics advocate racist to be even remotely provoking. Maybe out of style compared to the rest of the text, but apart from that I didn't find it conspicuous in the least.
CSPRNGs are mentioned a few times in TFA, but cryptographic security doesn't appear to be among the design criteria for SHISHUA. There's no mention of linear or differential cryptanalysis, which is table stakes for a CSPRNG.
I added a warning in the GitHub README. SHISHUA is not to be used for cryptographic purposes.
You mentioned the lack of cryptanalysis; there is also the lack of a round structure (rounds let researchers break reduced versions to ease their study, and allow setting a security margin).
Shuffling 100 million points per second (probably much more if it weren't both disk-IO-limited and CPU->GPU-transfer-limited), and shuffling in general if you want to maintain high throughput, e.g.: https://bit.ly/2zcUzJq
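For context, shuffling is essentially one PRNG call per element, so at these rates the generator dominates the inner loop. A rough Fisher-Yates sketch, with a hypothetical next_u64() standing in for the PRNG:

    #include <stddef.h>
    #include <stdint.h>

    extern uint64_t next_u64(void);  /* stand-in for your PRNG */

    /* Fisher-Yates shuffle: one PRNG call and one swap per element.
       The modulo introduces a tiny bias; fine for a sketch, use
       rejection sampling if that matters. */
    void shuffle_u32(uint32_t *a, size_t n) {
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)(next_u64() % (i + 1));
            uint32_t tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }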
This is not a bad thing, and maybe I'm alone in this, but I spent my first minute wondering whether "Shishua" was some kind of romanized Mandarin or Mandarin-esque word!
(I speak Cantonese, though, so maybe Mandarin speakers wouldn't fall into this trap.)
It's a "stone brush" 石刷 shíshuā. (Alternatively "It's a number!" 是数啊 shì shù a, but then pinyin orthography requires writing it as "Shishu'a" with an apostrophe in front of the third syllable.)
SHISHUA is a strong design, suitable for many use-cases. But the decision of which PRNG is right for you depends on platform, context, comfort, and ease of access.
If you are making a simple video game, it might be fine to just use your standard library; or to write one of the simple ones that you memorized and that don’t have too terrible an output.
If your video game has higher stakes, because it is the backbone of a virtual economy in a massively multiplayer game where servers feed on a large amount of randomness that needs high quality to ensure players don't find flaws and abuse the system for their own gain, maybe SHISHUA is right for you.
If you are targeting a platform that does not support 64-bit computation, SHISHUA can’t do it well, and other PRNGs such as sfc32 can be good there.
If you are working on machine learning that feeds on heaps of randomness over the course of months of computation, and that needs quality to ensure unbiased, even learning, SHISHUA can be right for you.