Crystallization robot generates true random numbers via chemistry stochasticity (cell.com)
68 points by bookofjoe on Feb 23, 2020 | 49 comments



My point of view on this is from a career as a protein x-ray crystallographer, not a mathematician or computer scientist. I've watched countless tiny crystal dishes, the vast majority of which produce no useful crystals. And while it seems frustratingly random (especially when faced with months and years of failure), in retrospect a successful series always reveals conditions that likely make causative sense: changes in temperature, stocks of substituent additives, shifts of pH due to carbon dioxide in the air, vibration (or lack thereof) from nearby construction or robotic systems, the amount of pollen or air pollution adding nucleation sites. Some crystals will only form along the edges of the crystal dishes, perhaps because there are some complementary dimensions of the molecular surface there. It's a rich list of possibilities, but all of it speaks against this being as 'random' as, say, the exact time of emergence of an x-ray photon from our source. Nonetheless it's a very creative use of a lot of otherwise non-useful data from vast arrays of robot-observed crystallization dishes.


Ah, this is really interesting. It didn’t occur to me while reading the paper; I just assumed crystals grow reliably.

Because they’re running sha512 on the crystal data, the output will always look random. But if the crystals have a bias in their preferred growth patterns, and especially if there’s a non-trivial probability that they don’t grow at all, knowing that may be an attack vector: you could use that information to simulate the process and possibly reproduce the same output.

“A raw binary sequence was then generated by assigning 5 bytes per crystal pixel based on size, orientation, and color, with crystals ordered from top left to bottom right in the vial region (see Supplemental Information for details). These bytes were concatenated with those of adjacent crystal pixels and of subsequent crystals to generate a long binary sequence. This sequence was then split into sections of 64 bytes, and the sha512 algorithm was applied to each of these sections. The resulting binary sequences were concatenated to form a larger-output binary sequence.”

The sha512 is the reason the output is uniform, but if I were doing this, I’d throw a time stamp and maybe some calls to drand48() or something into the pre-sha mix, to guarantee that the input can’t be easily replicated.
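A minimal sketch of that mixing step, assuming the crystal-derived bytes are already in hand (the `condition` function and its salting scheme are my invention, not the paper's; I've used `os.urandom` here in place of drand48):

```python
import hashlib
import os
import time

def condition(crystal_bytes: bytes) -> bytes:
    """Hash 64-byte sections as the paper describes, but salt each
    section with a timestamp and OS randomness so the pre-hash input
    can't be replicated by simulating crystal growth."""
    out = b""
    for i in range(0, len(crystal_bytes), 64):
        section = crystal_bytes[i:i + 64]
        salt = time.time_ns().to_bytes(8, "big") + os.urandom(16)
        out += hashlib.sha512(salt + section).digest()
    return out
```

Of course, once you salt with os.urandom the crystals are no longer the sole entropy source, but that's the point.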


As a former biophysicist (I made crystals once but nmr was more my thing) who is now a programmer - I had the same sentiment.


Things like this, i.e. getting random numbers from some real-world process that is supposedly perfectly random, come up every now and then and I keep repeating: There's no real problem in cryptographic random number generation that such a system solves. We have problems with RNGs, but they usually come down to either bugs, boot entropy problems on embedded devices or using an insecure rng for security purposes. None of that is solved by any form of physics or chemistry based RNG.

The claims themselves are also rather odd: "Encryption capability better than Mersenne Twister pseudorandom number generator". Huh? Really? You're better than a system that we know is insecure and that nobody should use in cryptographic contexts?


Simple boson sampling, if you can get a technique that maps photons to random bits in "O(1)" fashion, could conceivably provide a high-throughput, infinite source of highly entropic randomness. And it's probably only needed in <<< 1% of use cases ;)

But understanding how Nature computes randomness is quite enlightening. Just yesterday morning on a hike I saw some long spikey ice crystals recently formed on a creek bed and the patterns were more aesthetically pleasing than any geometric computer art. In the example regarding crystallization, it is an absolute mystery how the nucleating solid particles will choose their exact equilibrium compositions out of the infinite range of possibilities. And then communicate that quantity to the far reaches of the liquid-solid forming interface!


I loves me some diffusion limited aggregation!

https://en.wikipedia.org/wiki/Diffusion-limited_aggregation#...

>High-voltage dielectric breakdown within a block of plexiglas creates a fractal pattern called a Lichtenberg figure. The branching discharges ultimately become hairlike, but are thought to extend down to the molecular level.


Nature doesn't compute random numbers; it has underlying processes that are either random or so opaque as to be effectively random.


Except for quantum mechanics. We can compute probabilities perfectly, but the actual outcome of any one event (e.g. the destination of any one photon) is unpredictable.

Any attempts to explain quantum mechanics as being deterministic require faster-than-light communication and are generally seen as unlikely [1] (unless we are in a simulation and there's a global random number generator, of course).

You can buy PCIe cards exploiting this for random number generation [2]

1: https://en.wikipedia.org/wiki/Hidden-variable_theory

2: https://www.idquantique.com/random-number-generation/product...


The Many-Worlds Interpretation is a local (no FTL) and deterministic theory, though it works differently than what most people think of when they hear "deterministic theory of physics". Every possible outcome of a random quantum event occurs. (Technically the outcome space is continuous instead of discrete, but it's easier to talk about discrete outcomes.) If a human observes a random quantum event (which happens constantly), then there will be a separate version of the person for each outcome who observes that outcome.

However, even with a deterministic universe, there's still true randomness from the perspective of people in the universe. If you flip a quantum coin (that's appropriately set up to have a 50% chance either way and its output is dependent on some quantum random events), then it's impossible even in concept to predict which way it will land, because after the flip, there will be one version of you that sees it land heads and another version of you that sees it land tails.


What's your definition of unpredictable & random?

I'm interested because you write off quantum mechanics as unlikely to be deterministic, when everything being deterministic seems more realistic and more in line with our scientific understanding of how systems work in nature.

Contrary to determinism would be true magic. Having a force without any cause. That just feels like lack of understanding.


> Contrary to determinism would be true magic.

I used to think like this: that somehow, underneath the apparent randomness of quantum mechanics, there is a deterministic process that is either being sampled too slowly or at too coarse a granularity to see its fundamentally discrete nature.

Unfortunately that is not the case. No experiment has ever measured a significant deviation from true randomness in a quantum-mechanical experiment. We have no better description for the outcome of quantum experiments than the probability distributions that come out of the Schroedinger equation.

Welcome to theory from 80 years ago, backed by every single experiment ever done in the intervening time.

The bottom of our universe is math manipulating probability equations, and we can't see beneath it.

Sorry. I was hopeful too.


> Unfortunately that is not the case.

The thing is we cannot say it's not the case. I used to think like you as well.

That's the disturbing problem I see with people writing off quantum mechanics as not being deterministic. Magic, to an adult, is known to be trickery; it only seems to be magic when you lack understanding of the forces behind the trick.

Superdeterminism is a scenario in which we could actually be witnessing determinism in quantum mechanics, but, just as with what we believe to be true randomness, the forces causing it cannot be observed.

The only thing I'm hopeful for is witnessing in my life a way to explain how something cannot be deterministic with real understanding. Besides people just thinking it must be random and if I spend hours all day thinking about this it seems like it cannot happen because it's an impossibility.


> Besides people just thinking it must be random and if I spend hours all day thinking about this it seems like it cannot happen because it's an impossibility.

It's not just people thinking it must be random or it isn't random. Literally every single experiment ever conducted has failed to turn up any evidence of a deterministic process underlying the apparent randomness of quantum mechanics. We can't keep this hope alive indefinitely just because our thinking is warped by everyday experience appearing classical and discrete. We have absolutely no reason to demand our universe be deterministic, and every reason to conclude it isn't. We need to reject the idea that our universe is intuitive to human minds.

For example, Bell's Theorem rules out local hidden variables. General Relativity says spacetime is warped in non-visualizable ways and that simultaneity is not even universal. That should be enough of a mindfuck for us to discard our puny human understanding of things, yet we don't.


Bell's theorem doesn't rule out superdeterminism. So it's a "believe whichever side you like" kind of situation, since neither side can be proven. I upvoted your post btw.


There's no such thing as a random number.

A random number generator is a random (number generator), not a (random number) generator. This is a critical distinction!

Radioactive decay is a random (number generator), and there are others.


In my understanding, randomness cannot be "computed"... if you can compute randomness then it isn't truly random.


It all depends on how much entropy you have available. A PRNG can fool any randomness test given a large enough entropy source. We typically use PRNGs with very low entropy, e.g. multiplicative inverse PRNGs, which cycle after not too long.


That and a good entropy source can be had with 2 transistors: https://ieeexplore.ieee.org/document/7111269

(no login copy http://www.chaotic-circuits.com/wp-content/uploads/2016/06/S...)

Entropy generation isn't hard, even at boot, even on embedded systems (this circuit is cheap enough to stick on most boards if you really need it, or there are $0.50 ICs that do this and a lot more!). And once you have the entropy, you feed it to your normal fast-key-erasure CSPRNG and have effectively infinite quantities of random bits.
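For what a fast-key-erasure CSPRNG looks like, here's a toy sketch of the construction (using SHA-512 as the expander purely for a dependency-free illustration; a real implementation would use a stream cipher such as AES-CTR or ChaCha20):

```python
import hashlib

class FastKeyErasureRNG:
    """Toy fast-key-erasure generator: after every output block the key
    is overwritten with freshly derived material, so compromising the
    state later reveals nothing about past outputs."""

    def __init__(self, seed: bytes):
        self._key = hashlib.sha512(seed).digest()

    def random_bytes(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += hashlib.sha512(b"output" + self._key).digest()
            # erase: the old key is gone before the caller sees the output
            self._key = hashlib.sha512(b"rekey" + self._key).digest()
        return out[:n]
```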


Interesting. This may be a dumb question, but I wonder what happens if you run an EDA simulation on this circuit. Would it still be random?


Chaotic systems are usually described as having "strong sensitivity to initial conditions", but this almost undersells them.

Even incredibly small deviations from the simulation (measurement error, over/undervoltage conditions, floating point errors/rounding, probably even temperature fluctuations) would be enough to alter the results. You can see some neat examples of this with double pendulums, which seem like they should be easy to predict/model, but are absolutely not!
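You don't need a pendulum simulator to see this; the logistic map at r = 4 (a textbook chaotic system, standing in here for the transistor circuit) shows the same sensitivity:

```python
# Logistic map x -> r*x*(1-x) at r = 4: a perturbation of 1e-12
# in the starting point doubles (on average) every iteration.
def orbit(x0, steps=60, r=4.0):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = orbit(0.2)
b = orbit(0.2 + 1e-12)  # differs only in the 12th decimal place
# within roughly 40 iterations the two orbits are completely decorrelated
```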

While this is popularly known as the "butterfly effect", I recently learned it was originally called the much less poetic "sea gull effect".


Yes, it's still chaotic. You'll get the same results every time with no parameter variation, but any variation creates dramatically different results. The circuit exploits sensitive dependence on initial conditions, things like temperature and manufacturing variations between the transistors. EDA simulators fix these conditions by default, but you can do things like iterate over a range of values for them, and see that the results change dramatically with each change in input.


A fun thing that I haven't seen anyone try is to deliberately induce metastability into a small circuit. Anyone who's played with digital logic knows that underpowering/overclocking will lead to unpredictable behavior. Just build an oscillator and aim for an unstable clock speed (beyond the spec'd setup/hold delays) with an inadequate power source.


There are already hardware random number generator products that are better (https://en.wikipedia.org/wiki/Hardware_random_number_generat...). A key point is to reduce energy consumption. Instead of aiming for an unstable clock speed (which requires a lot of power), it suffices to sample the very small leakage current from some circuit at high precision.


Unstable clock speeds and power have zero relationship when you design the circuit. You can easily design something with a few transistors that will violate setup and hold times at 1 to 10 MHz, which you can get out of a very low-power crystal. You could even use an off-the-shelf MCU with a low nominal clock speed and just point a fast resonator at it. You can even make things unstable by giving them too slow a clock speed; most semiconductors don't want to hold a charge difference for very long.


How about having multiple independent nodes receive a random number from an external source? Like the arrival of incoming gamma ray bursts or something? That would be useful for blockchain I imagine.


This is one question I have always come across when working with crypto systems.

Why is it impossible for one to have an algorithm that generates true cryptographically secure random numbers?


Deterministic perfect random generation requires infinite memory.

One way to see it is this: a perfect bit generator should output a 1 half the time on average. (Let’s not talk about variance for this.)

Out of all sequences of two bits, it should output 11 a quarter of the time. It should output 111 an eighth of the time.

Assume you have N bits of state, and some computation on it produces N random bits and changes the state. Then, at best, a single initial state will give you a sequence of N consecutive ones. (If multiple initial states give you that output, then some other output of N bits cannot be output, which is a deviation from true random output.)

So the longest consecutive sequence of ones you can output is 3×N-2 (with the previous output being zero followed by N-1 ones, and the next output being the reverse).

Therefore, you cannot output a sequence of 3×N-1 consecutive ones, which is a deviation from true random generation that can be statistically detected, given a long enough output.
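The bound is easy to check exhaustively on a toy generator with N = 4 bits of state that outputs its entire state each step (the constants 5 and 3 are arbitrary; they just give a full-period cycle mod 16):

```python
def prng_stream(seed, steps=64):
    """Concatenate 64 outputs of a 4-bit-state toy PRNG."""
    s = seed % 16
    out = ""
    for _ in range(steps):
        out += format(s, "04b")   # output the whole 4-bit state
        s = (5 * s + 3) % 16      # deterministic state transition
    return out

def longest_run_of_ones(bits):
    return max((len(run) for run in bits.split("0") if run), default=0)

# over every possible seed, the longest run of ones stays below 3*N - 1 = 11
max_run = max(longest_run_of_ones(prng_stream(seed)) for seed in range(16))
```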

In some sense, physical processes that output true randomness rely on the large amount of information stored in the environment, and on that environment's diffuse state scrambling, which is presumably extremely hard for an attacker to detect.

For instance, the fine trajectory of electrons attracted from atom to atom through an electrical circuit causing minuscule delays, or the chaotic motion of gaseous atoms, or stronger yet, quantum behavior of particles.


> Why is it impossible for one to have an algorithm that generates true cryptographically secure random numbers?

The answer alternates between "it's not" and "depends on how exactly you define algorithm".

With algorithms we usually talk about deterministic instructions. In such a sense an algorithm can't create random numbers, because it'll always output the same thing with the same input.

But given a few random inputs (of which there are usually plenty in any complex enough system), we can build an algorithm that takes these random inputs and outputs an endless, secure stream of random numbers. Which is what every modern operating system does. You don't need any magic external random number source for that; just use what your OS gives you.
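Concretely, "use what your OS gives you" is one line in most languages; in Python, for instance, the `secrets` module and `os.urandom` both read from the kernel's CSPRNG:

```python
import os
import secrets

key = secrets.token_bytes(32)      # 256-bit key from the OS CSPRNG
nonce = os.urandom(12)             # same kernel pool, lower-level interface
token = secrets.token_urlsafe(16)  # URL-safe random string, e.g. for session IDs
```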


Interesting. What "degree of complexity" in the system is a "complex enough system"?


Good question. It's hard to give a definite answer, but usually "if it has a hard drive and a network card you're unlikely to have any problem". Rule of thumb: normal desktops and servers are unlikely to be problematic.

Really the only realistic problem is generating a cryptographic key right after booting a device, when that device is something like a router. Look here [1] for some research on that. The scenario can be avoided by newer APIs (newer Linux kernels have the getrandom system call for this) that will only spit out random numbers once the RNG has been initialized with enough randomness; but in return you may get services blocking at boot.

[1] https://factorable.net/


There is specialized hardware to generate random numbers at high rates.


Silicon Graphics developed a specialized hardware random number generator in 1997, covered under U.S. Patent 5,732,138: "Method for seeding a pseudo-random number generator with a cryptographic hash of a digitization of a chaotic system."

https://en.wikipedia.org/wiki/Lavarand

https://www.wired.com/2003/08/random/

https://patents.google.com/patent/US5732138A/en


Yeah, e.g. quantum based RNGs. Which are also pretty useless.


They are useful in some cases. But it's not for everyone.


What we're reading here isn't an article intended for computer scientists. It's reporting some findings that may be of interest to chemists. The "true random numbers" thing was pretty clearly tacked on by someone who's heard a bit about encryption - enough to know it requires high quality randomness - but who isn't a computer scientist. They notice that these kinds of crystallisations are random and write a paper. Not really useful to most of us here on HN but no big deal and maybe interesting for other reasons.

Now, your wider point isn't really true. Physics based RNGs are extremely useful. They can be embedded in CPUs which immediately solves several problems:

1. the "no entropy on first boot" problem for embedded devices and newly installed cloud hardware

2. the prevalence of bugs in software RNGs, often caused by attempts to optimise around performance problems elsewhere; e.g. Debian and Android have both completely broken their crypto RNGs in the past. The complexity of managing entropy in virtualised environments with multiple levels of nesting (e.g. hypervisor, VM, processes) is quite extreme. A HW RNG is simple because you go straight to the CPU, without any software in between. The level of validation on CPU circuits is vastly higher than for operating systems.

e.g. back in 2006 a paper came out showing there were multiple vulns in the Linux kernel RNG: http://www.pinkas.net/PAPERS/gpr06.pdf

3. Hardware RNGs can be extremely fast and generate huge quantities of randomness without any overhead. This matters in some cases like Monte Carlo simulation.

4. Most importantly, they can be used by secure enclaves.

The enclave use case isn't obvious so it's worth dwelling on. Technologies like SGX let us run software with a new and very strong threat model, namely, that everything outside the enclave is malicious. That means the kernel, the peripherals, the memory, the bus, system firmware like UEFI or SMM code. Everything.

This is obviously very useful if you want to run calculations on other people's hardware without trusting them i.e. in the cloud. But for it to work the enclave must be able to generate encryption keys without relying on the kernel to have collected entropy for it (as we don't trust the kernel!).

On Intel chips this is possible because there's a hardware, physics based RNG in the chip itself. The entropy is collected from a meta-stable oscillator circuit that is measuring thermal noise from the silicon itself. The circuit schematics are public and were audited by Cryptography Research Inc, who have a strong name in the field of hardware security. There's an article on the theory here:

https://spectrum.ieee.org/computing/hardware/behind-intels-n...

Of course AMD and ARM have something similar.

Some people wonder if a CPU RNG can be trusted, but I think this concern is based on a misunderstanding of what modern CPU microcode is capable of. If your CPU is malicious then any output from your computer cannot be trusted. Enclaves let you do computations even if other parts of the machine are malicious (with caveats), but ultimately the CPU is the heart and brain of the device. Worrying about whether a hardware RNG is trustworthy is about as useful as worrying whether any other part of the CPU is trustworthy: the answer may be 'no', but you have no way of knowing, and if you're worried about the US Government your only alternative is to switch to an ARM processor that was fabbed by a company without much exposure to the US. Not many of those are around.


Maybe every time they reboot, computers should run "fsck -t random /dev/random" to check the integrity of their random number generator devices.


Can you describe the vulnerabilities in that Gutterman paper, along with how important they were? What changes did Linux have to make to address them?


For instance the RNG state is wiped at boot, and relies on the distribution to use scripts to preserve some entropy then restore it across reboots. Some distributions didn't do this, in particular, OpenWRT didn't do it. On boot its RNG pool was simply the time of day. So the RNG was always being reinitialised to some low-entropy state.

Another problem is that the kernel RNG was (at that time, not sure what the current state is like):

- Very poorly documented, such that the authors had to reverse engineer it because the docs were scarce or wrong

- Very bad at extracting entropy from an idle system. On a server, mouse and keyboard events don't really happen, and network interrupt timings were "not commonly used as an entropy source", so the primary source of entropy was hard disk events. These turned out to provide almost no entropy: something on the order of 16 bits every ten minutes.

- Because the RNG really wanted to use hard disk interrupts and nothing else, OpenWRT had a further issue beyond the pool being reset on boot: it had a read only flash FS that didn't contribute entropy, so was using only network interrupts to seed its RNG pool. But network interrupts on a wireless router can be observed by an adversary.

- In pathological cases like an SSD-only serving device without entropy provided at the factory, the entire state of the RNG pool might have come from the user typing in their password. As the algorithm used lacked forward security that would mean an attacker could theoretically extract the password by just requesting random numbers. However they didn't actually demonstrate this attack.

So I'd say the vulnerabilities were fairly important, albeit never turned into a real-world attack.

I don't know how the kernel RNG addressed them. The RNG has always had some weird ideas - e.g. the way re-injecting stored entropy at startup didn't raise the entropy estimate so lots of programs would hang when trying to generate random numbers from /dev/random and stuff like that.


So you insist it "solves no real problem", and in the very next sentence you name a few such problems :)


Er, no. I say it doesn't solve real problems, and then I list real problems with RNGs that aren't solved by it.


How would a device based on natural stochastic processes not solve boot entropy? I thought that was exactly how such devices were used now, except most users don't even have one.


I think the point is that cheaply manufactured devices based on natural stochastic processes are already a solved problem, and a new wacky process involving chemistry doesn't bring anything worthwhile to the table.


Here Hanno rejected all hardware devices:

> None of that is solved by any form of physics or chemistry based RNG.


Boot entropy? That is neatly solved by HWRNG.

Bugs/insecurity? These are indeed better with a HWRNG. HWRNGs are more likely to fail catastrophically (which is easy to detect if you don't use whitening) and are easier to physically audit (there are much simpler options than this contrived crystallization device).

Algorithmic RNGs are more likely to fail undetectably and dangerously, and are much harder to audit.


The main point of this was to see if we can establish a workflow that produces random numbers. Then, by changing the nature of the materials / crystal growth, I wanted to see if the data tells us about order/disorder transitions in the material. That is one idea, anyway.


There are much easier ways to generate true random numbers. Cover any webcam and read out the lowest bit of every pixel in every frame; it should be full of noise. This gives you a ton of truly random data.
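A sketch of that extraction, assuming you already have a raw (unprocessed) frame as a byte string; the final hash is my addition, as a conditioning step to whiten any residual sensor bias:

```python
import hashlib

def extract_entropy(frame: bytes) -> bytes:
    # collect the least-significant bit of each pixel value
    bits = [b & 1 for b in frame]
    # pack the bits back into bytes, 8 at a time
    packed = bytes(
        sum(bit << i for i, bit in enumerate(bits[j:j + 8]))
        for j in range(0, len(bits) - 7, 8)
    )
    # condition through a hash so biased/correlated bits don't leak through
    return hashlib.sha512(packed).digest()
```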


If you're using your camera for random numbers, make sure you're talking to the raw device directly, and not getting an image that's been heavily massaged, like with an iPhone's HDR / Deep Fusion cameras. And use a lens cap instead of pointing it at a Lava Lamp, because that's patented!

https://www.wired.com/2003/08/random/

>Now Noll is working with Cooper on an improved RNG called LavaRnd (which debuted in May at www.lavarnd.org). The new process replaces the lava lamps with a more Zen-like source of entropy: a webcam with its lens cap on. The chaotic thermal "noise" emitted by the webcam is digitized and put through a hash algorithm that churns the number set, stripping unwanted sections of predictability. The result is a cryptographically strong sequence of numbers, ready for use in the real world. And because the new service is open source, patent-free, and license-free, anyone will be able to cheaply build and operate a LavaRnd server and receive the precious commodity free of charge - a random act of kindness.

Careful with that lower bit, Eugene:

The pseudo-random number generator in the C library was really bad in earlier versions of Unix. It still is bad; it used to be patronizingly called "simple", but now the title of the OpenBSD manual page finally recognizes rand for what it is:

https://man.openbsd.org/rand

    NAME

        rand, rand_r, srand, srand_deterministic — bad pseudo-random number generator

It was so bad in SunOS 4.1.3 that the rand(3) manual NOTES section coyly understated: "The spectral properties of rand() leave a great deal to be desired."

So bad that the lower bit would actually alternate between 0 and 1 every time you called it, so if you used "rand() & 1" to flip a coin, it would flip perfectly back and forth between heads and tails forever.

https://www.freebsd.org/cgi/man.cgi?query=rand&sektion=3&man...

    NOTES

        The spectral properties of rand() leave a great deal  to  be  desired.
        drand48(3)  and random(3)  provide much better, though more elaborate,
        random-number generators.

    BUGS

        The low bits of the numbers generated are not very random; use the mid-
        dle bits.  In particular the lowest bit alternates between 0 and 1.

http://cpp.indi.frih.net/blog/2014/12/the-bell-has-tolled-fo...

>Nothing about rand()‘s behaviour is required. rand() could legally return the same value over and over every time you call it. It could have a period in the single digits; or return alternating odd and even numbers (and one implementation actually did this!). There have been some implementations that are legendarily bad. The C11 standard actually says some implementations are known to produce sequences with distressingly non-random low-order bits, and advises against using it for serious purposes.
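The alternating low bit is easy to reproduce: with the classic LCG constants (odd multiplier, odd increment, power-of-two modulus), the parity of the state flips on every call. Here's a Python re-creation of that old-style rand() (an illustration, not the actual historical source):

```python
def make_bad_rand(seed):
    """Old-style rand(): returns the raw LCG state, low bits included."""
    state = seed
    def bad_rand():
        nonlocal state
        # odd multiplier * anything + odd increment flips the low bit
        state = (state * 1103515245 + 12345) & 0x7fffffff
        return state
    return bad_rand

rand = make_bad_rand(42)
coin_flips = [rand() & 1 for _ in range(10)]
# a perfectly alternating "coin": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```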

Bash and other shells build on top of that badness, and manage to make the bad randomness infinitesimally worse, by not allowing repeated random numbers, for some inexplicable reason. (See: Cargo Cult Programming)

https://nullprogram.com/blog/2018/12/25/

>Like Bash, repeated values are not allowed. I suspect one shell got this idea from the other.

    do {
        cur = (rand() >> rand_shift) & RANDMASK;
    } while (cur == last);

>Who came up with this strange idea first?

https://en.wikipedia.org/wiki/Cargo_cult_programming


If journal editors were less easily wooed by technical stuff they have no idea about, and were able to bring in specialist reviewers, this would not pass. Unfortunately, in specialized journals, if you publish something that sounds overly complex and highly technical to the journal's reviewers, there is a good chance the paper will go through unharmed, except for picky comments about misplaced commas or about not citing the reviewer's own, only remotely related, work...


Maybe some environmental factors will tend to give you a certain type of entropy. How can you be sure?



