I always love seeing clever ways to be random. Hadn't considered this one but it's a pretty good idea, and I'm sure there are even better ways to run with the basic concept (maybe analyzing color values of recent Pinterest or Instagram photos).
If you can find a private source of community-driven randomness, that'd be even better.
I wonder whether pictures would be a good source. I could see strong leanings towards blue (sky, water) and green (vegetation), and maybe also grey (concrete, asphalt). Would be interesting to do a collective histogram over a wide array of images and see if colors tend to average out or if there are clear peaks.
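A rough sketch of what that collective tally could look like, assuming the photos have already been decoded into (r, g, b) tuples by some imaging library. The function name, the grey threshold of 16, and the sample pixels are all made up for illustration:

```python
from collections import Counter


def dominant_color_tally(pixels, grey_spread=16):
    """Count which colour family dominates each (r, g, b) pixel.

    grey_spread is an arbitrary cutoff: if the three channels are within
    that distance of each other, the pixel is counted as grey.
    """
    tally = Counter()
    for r, g, b in pixels:
        if max(r, g, b) - min(r, g, b) < grey_spread:
            tally["grey"] += 1
        elif b >= r and b >= g:
            tally["blue"] += 1
        elif g >= r:
            tally["green"] += 1
        else:
            tally["red"] += 1
    return tally


# Hypothetical pixels standing in for sky, grass, asphalt and a red car.
sample = [(30, 90, 200), (40, 180, 60), (128, 130, 126), (220, 40, 30)]
print(dominant_color_tally(sample))
```

Run over a large enough pile of photos, a lopsided tally would back up the blue/green/grey hunch, while a flat one would suggest the colors average out.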
There's still plenty of room for enough entropy to power most use cases. Even two people trying to take identical pictures should have enormous differences when analyzed at the pixel level.
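To make the pixel-level point concrete, here is a toy sketch in which two buffers stand in for raw pixel bytes and differ by a single low-order bit. Hashing each with SHA-224 and counting the differing digest bits shows how far apart the outputs land. The buffers and the helper name are invented for the example, not taken from any real photo pipeline:

```python
import hashlib


def digest_bit_difference(pixels_a, pixels_b):
    """Return how many bits differ between the SHA-224 digests of two buffers."""
    digest_a = hashlib.sha224(bytes(pixels_a)).digest()
    digest_b = hashlib.sha224(bytes(pixels_b)).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))


# Stand-ins for two "identical" photos: same bytes except one flipped bit.
photo_a = bytearray(range(256)) * 12
photo_b = bytearray(photo_a)
photo_b[100] ^= 1  # flip the lowest bit of a single byte

print(digest_bit_difference(photo_a, photo_b), "of 224 digest bits differ")
```

This only shows that tiny differences give unrelated digests. It says nothing about whether the pixels are secret, which is the separate problem with public sources.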
That's a very intriguing idea, I hadn't considered the possibilities of looking at public/private photos and sampling color values across them to get some randomness. Private is certainly the key. With something like Birdseed, the randomness is totally public just as it is when getting randomness from atmospheric data. If someone figures out which wisps of clouds you are sampling or what search term you are using on Twitter, then the jig is up!
Well for the Wikipedia link you'd just be piggybacking off their random algorithm rather than consuming organic entropy. Regardless, they have some information on how the page is chosen here: https://en.wikipedia.org/wiki/Wikipedia:FAQ/Technical#random
This is great! I did an experiment a long time ago, before Twitter closed their JSONP API, to make Brownian motion visualizations with a similar concept.
I love it!
Quick idea -- though I'm not sure if this would further the goal of having fun, even as a PR -- there's a documented way to make Birdseed subclass random.Random (https://hg.python.org/cpython/file/2.7/Lib/random.py#l72) and inherit the familiar interface and fancy methods.
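Something along these lines, as a minimal sketch only. I haven't looked at how Birdseed stores its data, so HashSeededRandom and everything inside it are hypothetical: it takes some bytes (say, fetched tweet text), hashes them with SHA-224 plus a counter, and overrides the methods the random docs list for subclasses (random, seed, getstate, setstate, plus the optional getrandbits). Written against Python 3 rather than the 2.7 source linked above:

```python
import hashlib
import random


class HashSeededRandom(random.Random):
    """Hypothetical generator: SHA-224 over seed bytes plus a counter."""

    def __init__(self, seed_bytes=b""):
        # random.Random.__init__ calls self.seed(), which sets up our state.
        super().__init__(seed_bytes)

    def seed(self, a=None, version=2):
        if a is None:
            a = b""
        elif isinstance(a, str):
            a = a.encode("utf-8")
        elif not isinstance(a, (bytes, bytearray)):
            a = str(a).encode("utf-8")
        self._key = bytes(a)
        self._counter = 0
        self.gauss_next = None  # reset the cached value used by gauss()

    def getrandbits(self, k):
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        nbytes = (k + 7) // 8
        out = b""
        while len(out) < nbytes:
            block = self._key + self._counter.to_bytes(8, "big")
            out += hashlib.sha224(block).digest()
            self._counter += 1
        # Keep exactly k bits by dropping the surplus low-order bits.
        return int.from_bytes(out[:nbytes], "big") >> (nbytes * 8 - k)

    def random(self):
        # 53 bits is the precision of a Python float's mantissa.
        return self.getrandbits(53) / (1 << 53)

    def getstate(self):
        return (self._key, self._counter)

    def setstate(self, state):
        self._key, self._counter = state
        self.gauss_next = None


rng = HashSeededRandom(b"text of some fetched tweets")
print(rng.random(), rng.randint(1, 6), rng.choice(["heads", "tails"]))
```

With that in place, randint, choice, shuffle, sample and friends come for free from the base class. Same caveat as the project itself: the seed here is public data, so this is for fun, not for anything security-related.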
I know that this has a disclaimer that it isn't to be used in production. But, just know that hash functions are not there to provide randomness. There is no guarantee that a hash will be statistically indistinguishable from random noise.
Sorry - guarantee was probably too strong. I meant it in the mathematical sense (no more than a negligible chance of distinguishing from random noise) [1].
Of course, I'm sure that it is possible to construct one out of a hash function. I am also pretty confident that they do more than just serve the raw bytes of the hash.
I think you're technically correct that there's no guarantee, in that it's not part of the definition of a hash. A magical function which somehow returned an incrementing counter value for each unique chunk of data you fed to it, globally, would fit the definition of a cryptographic hash.
Real-world cryptographic hash functions, however, just try to approximate a random oracle. They attempt to achieve pre-image resistance and collision resistance by making their output look random. Certainly that's the case with SHA-224, which is what this code uses.
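For a quick feel of "look random", here is a crude monobit check: hash a run of counter strings with SHA-224 and see what fraction of output bits are 1. The function name and the input count are arbitrary, and passing a check like this is nowhere near a proof of indistinguishability, it is just one of the simplest things a biased function would fail:

```python
import hashlib


def ones_fraction(num_inputs=2000):
    """Fraction of 1 bits across SHA-224 digests of "0", "1", "2", ..."""
    ones = 0
    total_bits = 0
    for i in range(num_inputs):
        digest = hashlib.sha224(str(i).encode("ascii")).digest()
        ones += sum(bin(byte).count("1") for byte in digest)
        total_bits += len(digest) * 8
    return ones / total_bits


print(ones_fraction())  # should sit close to 0.5 if the bits look unbiased
```

The inputs here are about as non-random as inputs get (consecutive integers), which is the point: the structure in the input should not show up in the output.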
Some real-world CSPRNGs do just use hash functions directly. Linux's /dev/random implementation, for example, just returns a SHA-1 hash of its entropy pool contents. Yarrow (used in Mac OS X, iOS, and FreeBSD) does a final pass on its output using a block cipher, but requires that the hash function used in its earlier stages produce random-looking output. Fortuna is similar.
Of course, this code is insecure and should not be used in production, regardless of the internal details, simply because all of the inputs are known to a third party, i.e. Twitter.