Randomness extractors: making fair coins out of biased coins (bytepawn.com)
105 points by Maro 3 months ago | 52 comments



If you know that a source of randomness contains entropy (unpredictable bits) but you don't know how much (e.g. a digital camera's output, unless heavily processed, will contain random sensor noise), the safest thing to do is pipe it into a cryptographic construct such as a hash or a sponge.

Once you believe you have piped in enough, you use the state of the cryptographic primitive as the seed for further random bit generation. The Linux kernel uses a sponge (to accumulate), a hash function (to consolidate) and a stream cipher (to output) to 'convert' events with some degree of randomness into 'infinite' cryptographically secure random bits.

To acquire some intuition about this, imagine taking a raw 1MP photo with a camera sensor and then feeding the lossless file to sha256sum. You get a 256-bit string, and the sensor noise in the photo will be sufficient to secure the result. An attacker would need to model all the degrees of freedom involved in taking photos in the world, and in sensor noise production, to build a simulator for your camera and start brute-forcing your sha256 result; the input will almost certainly (unless the sensor is compromised or not really raw) contain far more degrees of freedom than 256 bits.
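Something like this captures the sha256sum step (a Python sketch; "photo.raw" is a placeholder path, and the result is only as trustworthy as the entropy that actually went into the capture):

    import hashlib

    # Condense whatever sensor noise the raw capture contains into 256 bits.
    with open("photo.raw", "rb") as f:
        seed = hashlib.sha256(f.read()).hexdigest()
    print(seed)  # 256-bit hex string, usable as a seed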


I remember thinking about something similar to this at university - I was uncomfortable with the use of biases to assign non-zero probabilities to events that fail to occur after some number of trials.

If I flip a coin n times and it comes up heads every time, what's my best estimate of the likelihood of tails?

It came out as 1 - 1/2^(1/(n + 1)); and the chance of heads = 1/2^(1/(n + 1)).

The calculus for results in between seemed intractable to me - or at least well beyond my abilities...

So I threw it into a Newton-Raphson solver and was happy to see that it came out pretty much linear (the most asymmetrical result will be the one for three trials, and since that was basically linear, all results for greater n will be as well).

But I never went quite this far - for that you'd also need to calculate the standard deviation of the probability estimate (I don't think that it would've been much harder than what I did, but it was outside of my requirements at the time, so it was something I never implemented).


For such a question to make sense, don't we have to first define some distribution over how the coin might be biased in the first place?


Interesting question. I would say yes. But there are subtle biases:

* coin always favours H or T: simple bias

* coin has some component of behaviour which can be on, off or reset. For example a liquid mercury component, which can bias the H or T outcome, but the right kind of "flip" resets it to a known-safe mode so the coin has little to no bias.

* coin has a bias which only manifests in skilled hands, with a particular kind of flip.

The point I'm making is that the bias is probably always assumed to favour H or T, which doesn't admit more complex coin biases where there could be 2 or more actors, 2 or more of them capable of biasing, and a pigeon who can't (or a dummy and a pigeon: a good con generally has more people involved than you think).


This is the Bayesian vs. frequentist viewpoint. What the OP is talking about is assigning a Beta(1,1) prior to the bias; observing n heads in a row then yields a posterior over the tails probability of Beta(1, 1+n), whose mean is 1/(n+2), which means the OP is off by one in the denominator but still good from memory. However, that is if you take the mean as your best estimate of the bias. If you take the mode, then the OP would be satisfied that even the Bayesian approach says tails is impossible. The frequentist view would say your best estimate is the average of the observations, which would yield a completely unfair coin.
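For concreteness, a quick sketch of those numbers (Python, assuming scipy is available; "bias" here means the tails probability):

    from scipy.stats import beta

    n = 10                       # observed: n heads in a row, zero tails
    posterior = beta(1, 1 + n)   # Beta(1,1) prior updated with the observations
    print(posterior.mean())      # 1/(n+2) ~= 0.083
    print(posterior.median())    # 1 - 2**(-1/(n+1)) ~= 0.061
    # the mode is 0 for n >= 1, i.e. "tails never happens"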


Having a distribution over possible coins isn't related to frequentism vs. Bayesianism, I don't think. In frequentism, you can ask, "If I have a whole bunch of mystery coins with different biases, what is the distribution of their biases, and how likely am I to flip k heads in a row for each possible bias?" And in Bayesianism, that distribution can just be the prior/posterior distribution for a single mystery coin.


With the simplified form of the binomial for n equal to k:

    ∫(0 -> x)(p^n)dp = ∫(x -> 1)(p^n)dp
    [p^(n+1)/(n+1)](0 -> x) = [p^(n+1)/(n+1)](x -> 1)
    x^(n+1) - 0^(n+1) = 1^(n+1) - x^(n+1)
    2 * x^(n+1) = 1
    x^(n+1) = 1/2
    x = 1/(2^(1/(n+1)))
Was I wrong? Have I been wrong for decades now?
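(For what it's worth, the closed form checks out numerically, e.g. with scipy: under a uniform prior, the posterior on the heads probability after n heads is Beta(n+1, 1), and its median matches x.)

    from scipy.stats import beta

    for n in (1, 3, 10, 100):
        closed_form = 0.5 ** (1 / (n + 1))   # x = 1/(2^(1/(n+1)))
        numeric = beta(n + 1, 1).median()    # posterior median, uniform prior
        print(n, closed_form, numeric)       # the two columns agree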


Math is more than symbolic manipulation. You need to explain where the equations come from. The best I can decipher is that you are calculating the median of the posterior over the bias, given k=0 tails and a uniform prior. Your "error" comes from the fact that you are positing that all biases are equally likely. Thus, you should not be surprised that you are getting a non-zero probability, because you yourself are saying they are non-zero.


I'll write some code to test it tomorrow, but why the hell would it be zero?

There are other probabilities that can lead to zero successes - every probability except for one.

And yes, they diminish rapidly as the number of trials goes up (much in the same way as it does in my equation).


You have me confused a little because your original post seems to suggest you wanted it to equal 0:

> I was uncomfortable with the use of biases to assign non-zero probabilities to events that fail to occur after some number of trials.

Anyway, it entirely depends on your world view. When you say "There are other probabilities that can lead to zero successes", that sounds like a Bayesian framework, in which you have to pick your prior on the world. A natural choice is uniform (also Beta(1,1)), and you update your prior as you collect data. You would then use the mean/median/mode of your posterior as your estimate for the bias. In your case, it appears you are operating in a Bayesian world but forcing your prior to remain uniform despite observing data. The frequentist perspective is that the probability is the relative frequency of observations. In this example, a frequentist would say that the probability of heads is 1 and tails is 0.


> In your case, it appears you are operating in a Bayesian world but forcing your prior to be constantly uniform despite observing data.

Got you now, thanks.


It was twenty years ago, but I think it was n + 1.

I'll redo the maths and check.


I just treated it as a binomial distribution with a fixed but unknown probability.

Take the case of zero successes: that can happen for every probability except for one.

The graph peaks at zero, but if you calculate 'x' such that the integral from zero to x of the binomial function is equal to the integral from x to one, you get a nice centre point.


The insight here is that you need to emit two signals that have equal probability, even if those two signals are rare in the full distribution. In the full distribution, you're allowed to have any other kinds of signals that aren't those two.

You then throw out all signals that are not those two signals, and the conditional distribution will renormalise itself to give you a fair coin toss.

You pay for this by throwing out many bits that are not these two signals. The less fair the coin, the more coin flips you throw away.

In the trivial case of a fair coin, you could keep every coin toss as-is and throw away nothing. With a biased coin, you throw away the HH and TT pairs.

Independence is a major assumption underlying any of these models.
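Concretely, something like this sketch (Python; the bias 0.9 is made up for illustration):

    import random

    def von_neumann(bits):
        # Pair up input bits; keep the first bit of each unequal pair,
        # discard equal pairs. (0,1) and (1,0) are equally likely, so the
        # kept bits are unbiased as long as the inputs are independent.
        return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

    raw = [int(random.random() < 0.9) for _ in range(100_000)]  # biased source
    fair = von_neumann(raw)
    print(len(fair), sum(fair) / len(fair))  # far fewer bits, roughly 50% ones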


> Interestingly, this is the best possible approach, whether we know the bias of the input bit stream or not.

Would love to see a proof of that. It feels a bit unintuitive to me.

(But on second thought that might be because I’m thinking of cases of correlated bits.)


It's obviously false. If you know that the bias is 0.5, you can just pass the input stream through unmodified. You can also construct better codes for other biases.


Depends how you define best. I assume best in the article means 'output is close to 50/50 and independent', in which case the 01/10 solution is already optimal given the assumptions.

If you define best as the most efficient at converting inputs to outputs, as well as the output being 50/50, then there are better ways, such as your example.


OP here. Thanks for the comment. I'm not a mathematician, will double-check my wording/phrasing on this.


OP here.

Re-reading this bit, I agree my wording is very clunky. What I meant by "whether we know the bias of the input bit stream or not" was:

[For p≠1/2] whether you know the value of p or not, this is the best approach. Of course, p=1/2 is the trivial case where we can just pass the bits through.


It's worded quite badly, but I think it's the best you can do given 2 independent bits, except for the rare case where 00 and 11 are equally likely.


That’s a good counter example. :) Thanks.


In general, I think you could show that an 'arithmetic coding' approach is optimal, if the distribution of each bit given the previous bits is known. From there, you'd just have to show that the von Neumann approach for i.i.d. bits is no worse on average.


This is correct, and I came to the comments to say the same thing. It takes some work to implement without arbitrary-precision floating point, but arithmetic coding can make use of the full entropy in the input stream, whereas the approach in the article discards a lot of information.
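Roughly, the idea looks like this (a simplified Python sketch for i.i.d. bits with known P(1) = p; it uses plain floats and skips the carry/underflow handling a real implementation needs, which is the "work" mentioned above):

    def arithmetic_extractor(bits, p):
        # Narrow an interval in [0,1) according to the biased model; whenever
        # the interval fits inside one half, emit that half's bit and rescale.
        # The emitted bits are the leading binary digits of a uniform point.
        lo, hi, out = 0.0, 1.0, []
        for b in bits:
            mid = lo + (hi - lo) * (1 - p)        # [lo,mid) encodes 0, [mid,hi) encodes 1
            lo, hi = (mid, hi) if b else (lo, mid)
            while hi <= 0.5 or lo >= 0.5:
                bit = 0 if hi <= 0.5 else 1
                out.append(bit)
                lo, hi = 2 * lo - bit, 2 * hi - bit
        return out

For p = 0.9 this should get close to the entropy bound of about 0.47 output bits per input bit, versus 0.09 for plain von Neumann.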


I had this thought too. But can you be sure that there won’t be any statistical patterns / correlations in the output bitstream with that approach?


Something maybe obvious but worth repeating is that there are 2 kinds of errors: predictable and unpredictable. Bias is the predictable error; it's the direction we are likely to be wrong in. In many practical applications unbiased error is not what we want: if the cost of being wrong in different directions is asymmetric, then we want to be biased so that our mistakes are less costly. The unpredictable error is noise. In this example we are trying to create something maximally unpredictable, so the goal is to remove all biases, giving pure noise.


Made me think of "bend entropy towards definable outcomes":

https://m.youtube.com/watch?v=2QJjMwBAORw


I haven't dug deeper, but the claim that the von Neumann approach is optimal does not seem intuitively correct. It seems like you could squeeze a tiny bit more entropy from it -- basically, if you reject two pairs in a row, the nature of that rejection tells you something.

    HT xx -> H
    TH xx -> T
    HH TT -> H
    TT HH -> T
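Something like this sketch (Python, just my reading of the above, not the article's method) implements that: plain von Neumann on the pairs, plus a second pass over the rejected HH/TT pairs, which are themselves an i.i.d. biased stream (1 = H, 0 = T):

    import random

    def cascaded_extractor(bits):
        out, doubles = [], []
        for a, b in zip(bits[::2], bits[1::2]):
            if a != b:
                out.append(a)       # HT -> H, TH -> T
            else:
                doubles.append(a)   # collapse HH -> H, TT -> T
        # the doubles are again independent and biased, so extract from them too
        for a, b in zip(doubles[::2], doubles[1::2]):
            if a != b:
                out.append(a)       # HH TT -> H, TT HH -> T
        return out

    raw = [int(random.random() < 0.8) for _ in range(100_000)]  # p = 0.8, made up
    print(len(cascaded_extractor(raw)))  # a bit more output than plain von Neumann

It still throws information away; the Elias and Peres extractors mentioned in the reply below recurse further and approach the entropy bound.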


You're correct. There are more sophisticated extractors like Elias' and Peres' that do better, asymptotically achieving the upper bound given infinite data.

See https://peteroupc.github.io/randextract.html


OP here.

Thanks for that; it's obviously better in terms of bitrate, I hadn't thought of it. Any two bit sequences with equal probability can be used to emit a 0 and a 1.


It would be interesting to combine this with something that detects bias on a non-deterministic stream. So in one shot, it takes a stream of unknown bias and emits an unbiased stream. The closing paragraph says that’s impossible, but the tradeoff is you are only sure the output is unbiased with some amount of confidence. I think you’d also need a buffer to detect and then output.


This is a specific instance of a general algorithm/research area called Bernoulli factories, if anyone wants to go deep.


Thanks for this pointer -- I had read a Knuth paper ages ago that talked about this, but I couldn't remember the term or find the paper again, and this led me directly to [1], which led me back to [2].

[1] https://peteroupc.github.io/bernoulli.html

[2] Knuth, Donald E. and Andrew Chi-Chih Yao. “The complexity of nonuniform random number generation”, in Algorithms and Complexity: New Directions and Recent Results, 1976.


Does this work with a guaranteed random bit output bandwidth?

These techniques (e.g. von Neumann) all seem to suffer from the same problem, namely, they crank out uniformly distributed random bits from biased sources, but with no guarantee of how long you may have to wait for a bit to come out.


For certain input distributions, a guaranteed output bandwidth is impossible. For instance, suppose that you had a function that, given a vector of N integers chosen uniformly from (0,1,2), returns a single unbiased output bit. Then, exactly half of the 3^N input vectors would correspond to each output bit. But this is impossible, since 3^N is always odd, so this (0,1,2)-function cannot exist for any fixed N.

Now, suppose that you had a function that turned N input bits with a 1:2 bias into a single bit with no bias. Then you could use this to implement a (0,1,2)-function as above, by collapsing inputs 1 and 2 into input 1. Since such a (0,1,2)-function is impossible, the 1:2-biased function is similarly impossible.

This argument holds for any i.i.d. input source that can be generated from a uniform-integer source with an odd number of possible values.


If the coin is biased so that it lands on heads only 1 in a million times it will take a lot of coin throws to get some randomness out of it. This isn't a limit of the technique, just a consequence of the low entropy source.
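To put a rough number on it (my own back-of-the-envelope, not from the article): with von Neumann pairing, each pair yields an output bit with probability 2p(1-p), so you expect about 1/(p(1-p)) input flips per output bit.

    p = 1e-6                            # P(heads) for the 1-in-a-million coin
    flips_per_bit = 1 / (p * (1 - p))   # 2 flips per pair / 2p(1-p) success chance
    print(round(flips_per_bit))         # ~1,000,001 flips per unbiased output bit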


Does it take a fixed long time or a long random potentially-infinite time?


The latter (except that it doesn't always have to be long; the waiting time per bit follows a geometric distribution). If it were the former, then randomness would be predictable in some sense. It's similar to the halting problem.


I'm not sure what you mean by "randomness would be predictable". For example, it is possible to do this if you know the bias.


If you can predict precisely how often randomness occurs in an input stream, it's not randomness.


Knowing the bias is not knowing exactly how often each result comes out... it is still probabilistic...


Yeah, that's why you can't predict how long you'll wait for a random bit in an entropy extractor.


I give up. Thankfully somebody else in this thread gave an answer.


If you know the bias, then you can surely estimate the bit rate. Of course you could get very unlucky and wait a long time for bits, even with an unbiased underlying source. But that's the nature of these methods.


Or you could pump the heavily biased bit stream into a modern cipher and get a - for all intents and purposes - super high quality, completely unbiased random stream with the same bandwidth characteristics as the input.

And, as a matter of fact, much faster than the input if required.


Things are not so simple: how do you key the cipher? with the same randomness as the input string? If so, then that input distribution could be a bad one for the cipher, in that it might fail to generate a sufficiently scrambled output.

This can be mitigated if you use an unkeyed hash function instead, but even for these we don't have any provable guarantees unless you assume the function is a random oracle, which is an ideal object that cannot exist.


Yeah, if we’re starting with the “1 in a million” unfair coin then most of the input strings are going to be all 0 or all 1 which means the cipher is going to give the same sequence every time. Not useful!



That's a good question. You cannot get a non-statistical guarantee of an output rate from the Von Neumann biased coin toss, because you can potentially get a long run of the same toss result, which stalls output bit generation. That might be a generic limitation of all similar de-biasing approaches. Has that been proven or disproven?


See my sibling reply [0]: for many input distributions, you can never put a fixed upper bound on the number of inputs you need to get a single output bit.

[0] https://news.ycombinator.com/item?id=41549648


The article is interesting, but it misses the most practical and unambiguously safe way to generate streams of random data: Use cryptography.

To generate a stream of random data, use a hash function with arbitrary-length output (XOF) such as blake2x[^0] or shake256[^1]. Make sure your key contains at least 256 bits of entropy. Absolutely never use a key with less than 128 bits of entropy.
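For the XOF step, a minimal sketch (Python's hashlib; collect_raw_entropy is a hypothetical stand-in for whatever material you actually harvest):

    import hashlib, os

    def collect_raw_entropy() -> bytes:
        # Stand-in for a real entropy source (sensor noise, interrupt timings, ...);
        # it reads from the OS here only so the sketch runs end to end.
        return os.urandom(64)

    seed = collect_raw_entropy()  # must contain at least 256 bits of entropy
    # shake256 is an XOF: absorb the seed once, then squeeze as much output as needed
    stream = hashlib.shake_256(seed).digest(1 << 20)  # 1 MiB of output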

Since it's impossible to know how much entropy there is in a key, you probably want to use something like the Fortuna RNG[^2], substituting your XOF for its SHA-2/AES based construction. Bruce Schneier designed Fortuna back when XOFs were harder to come by.

If you want more performance, you can use blake2 to compress your input seed into 256 bits and generate the random stream using chacha20[^3].

All of this is usually handled by the Linux kernel[^4], so it's best to just use the getrandom(2)[^5] system call or just read from /dev/urandom[^6]. If you are writing a Rust program, you can use the rand[^7] crate, which uses a mixed approach reading a seed from the operating system and expanding it in-process using chacha[^8]. This is a valid strategy.

I am omitting some subtleties[^10] about mathematical definitions of randomness extractors as used by the author of the article. When you are using a cryptographic approach, you are dealing with a complexity-theory based security notion[^9], which does not precisely equate to creating a stream with a specific amount of entropy. Everywhere – except for a physics or a math paper dealing with information theory – I would call this a technicality. For most intents and purposes, cryptographic security notions are the most real-world robust conceptions of randomness available.

[^0]: https://www.blake2.net/

[^1]: https://csrc.nist.gov/pubs/fips/202/final (shake256 is part of the SHA-3 standard)

[^2]: https://www.schneier.com/academic/fortuna/

[^3]: https://protonvpn.com/blog/chacha20

[^4]: https://lwn.net/Articles/884875/

[^5]: https://man.archlinux.org/man/getrandom.2

[^6]: https://man.archlinux.org/man/urandom.4

[^7]: https://crates.io/crates/rand

[^8]: https://docs.rs/rand/0.8.5/rand/rngs/struct.StdRng.html

[^9]: https://en.wikipedia.org/wiki/Security_of_cryptographic_hash...

[^10]: In particular, PRFs are not guaranteed to output tokens with a certain amount of entropy – if I recall correctly – because they can map two inputs to the same output.

---

I am the main author of the Rosenpass[^11] post-quantum secure key exchange for WireGuard. My expertise comes from developing this protocol, as well as a couple of years of engagement with the real-world cryptography community and from my own scientific research on cryptography and secure implementations of cryptography.

[^11]: https://rosenpass.eu/


This is a great post with really valuable resources for any practical attempt to use strong randomness, but aren't you missing the whole point of the article?

Surely if you can just use a strong RNG to generate the key for the cryptographic algorithm you could just use that for all your randomness and ignore the stream of input entirely? The whole point of the article is how to extract the entropy from an unknown/untrusted input stream.

It's like the author has presented a recipe for a chocolate cake and you've said "it's better if you already have a cake, then you can just take a slice of that". Well yes.

Or in the domain of the article, faced with von Neumann's algorithm for getting fair flips from a biased coin, your solution amounts to "Instead I just flip my own coin which I know is fair."


I don't think so.

Using hash functions requires a minimum amount of entropy in the seed. So do the schemes put forward in the article. In particular, these schemes require a relatively high degree of certainty about the amount of entropy in the stream, with low variation. For the entropy extractors, the total entropy required scales linearly with the length of the output stream. If you are using a hash function, the entropy requirement is constant.

The fact remains, both the method put forward in the post and using hash functions have a minimum entropy requirement. Even if you have only a small amount of entropy, hash functions will still get you more bang for the buck.

If push really comes to shove, you can still use a key stretching function[^0] to make it as hard as possible for an attacker to brute force what little entropy you have, as is routinely done with passwords.

To illustrate the difference in entropy requirement, imagine running the Von Neumann generator for a while, achieving the needed entropy level. In this scenario, the output stream of randomness will be fine. If you then get a section of the input stream with very little entropy – much lower than the required amount – you get a section with very little entropy in your output stream. The Von Neumann generator can degrade; hash functions won't (for all practical intents and purposes).

Fortuna is designed to approach the entropy estimation problem. It is eventually secure even in the face of a determined attacker; because its entropy pools are drawn on with exponentially decreasing frequency, it covers an extreme range of different entropy levels – around ten orders of magnitude with 32 pools.

Crucially, once a secure level of entropy is reached, the RNG stays secure.

Of course, if the amount of entropy is low, Fortuna will produce quite a bit of insecure output before eventually producing secure output. In practice, this issue can be solved by running Fortuna for a while without producing any output. In a factory producing secure hardware devices, you might just allow the device to be active for a day, and you might add some extra, high-quality entropy from the device that flashes your hardware devices in the first place.

Fortuna also allows you to use different sources of entropy securely. With a Von Neumann generator, for instance, achieving this is much harder.

Entropy estimation is borderline impossible in practice; Fortuna deals with this head-on, while with von Neumann you are lost if your estimate is off.

[^0]: https://en.wikipedia.org/wiki/Key_stretching



