Show HN: Birdseed – A silly way to get random numbers by hashing tweets

rajington · on Nov 16, 2015

I know most people talk about the technical aspect of things, but I've got to hand it to you, that's an amazing name.

ryansworks · on Nov 17, 2015

Thank you! It was better than my second choice, which was "Bird of Pair-a-Dice".

masonhipp · on Nov 16, 2015

I always love seeing clever ways to be random. Hadn't considered this one but it's a pretty good idea, and I'm sure there are even better ways to run with the basic concept (maybe analyzing color values of recent Pinterest or Instagram photos).

If you can find a private source of community-driven randomness, that'd be even better.

jdmichal · on Nov 16, 2015

I wonder whether pictures would be a good source. I could see strong leanings towards blue (sky, water) and green (vegetation), and maybe also grey (concrete, asphalt). It would be interesting to do a collective histogram over a wide array of images and see whether colors tend to average out or whether there are clear peaks.
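A rough sketch of how such a tally could look, assuming the images are already decoded into lists of (r, g, b) tuples with channels in 0..255. The function name `dominant_channel_counts` and the `grey_tolerance` threshold are made up for illustration, and no real image source is involved:

```python
from collections import Counter

def dominant_channel_counts(images, grey_tolerance=16):
    """Tally which colour channel dominates each pixel across many images.

    `images` is an iterable of images, each an iterable of (r, g, b)
    tuples (a hypothetical, already-decoded format). A pixel whose
    channels differ by at most `grey_tolerance` is counted as grey.
    """
    counts = Counter({"red": 0, "green": 0, "blue": 0, "grey": 0})
    for pixels in images:
        for r, g, b in pixels:
            if max(r, g, b) - min(r, g, b) <= grey_tolerance:
                counts["grey"] += 1
            else:
                # On ties, max() keeps the first candidate: red, then green, then blue.
                name, _ = max(("red", r), ("green", g), ("blue", b), key=lambda c: c[1])
                counts[name] += 1
    return dict(counts)
```

Strong peaks in the result would mean the raw colour values are biased, so they would need to be hashed or otherwise whitened before being used as random bits.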

bmm6o · on Nov 17, 2015

There's still plenty of room for enough entropy to power most use cases. Even two people trying to take identical pictures should have enormous differences when analyzed at the pixel level.

ComNik · on Nov 16, 2015

Maybe relevant: https://news.ycombinator.com/item?id=8203144

ryansworks · on Nov 17, 2015

That's a very intriguing idea. I hadn't considered the possibilities of looking at public/private photos and sampling color values across them to get some randomness. Private is certainly the key. With something like Birdseed, the randomness is totally public, just as it is when getting randomness from atmospheric data. If someone figures out which wisps of clouds you are sampling or what search term you are using on Twitter, then the jig is up!

cpeterso · on Nov 16, 2015

Other sources of randomness are https://en.wikipedia.org/wiki/Special:Random or https://news.google.com. The URLs could be parameterized to request a random language, too.

cooper12 · on Nov 17, 2015

Well, for the Wikipedia link you'd just be piggybacking off their random algorithm rather than consuming organic entropy. Regardless, they have some information on how the page is chosen here: https://en.wikipedia.org/wiki/Wikipedia:FAQ/Technical#random

ljk · on Nov 16, 2015

maybe that'll be the next in the "twitch plays" series!

binarymax · on Nov 16, 2015

This is great! I did an experiment a long time ago, before Twitter closed their JSONP API, to make Brownian motion visualizations with a similar concept.

http://binarymax.com/brownian_2.gif

I'll have to find some time to recode it to use this service :)

mikeskim · on Nov 16, 2015

I wrote a paper about leveraging public random streams like this one. It can be downloaded here: https://drive.google.com/file/d/0B9IkyvYlZZe7TldTRGlSMnpQX0U...

mtdewcmu · on Nov 16, 2015

Stock market data does seem like a better source of entropy than Twitter queries.

ryansworks · on Nov 17, 2015

I would tend to agree with you, though my tune might be different if I were an investment advisor!

UncombedCoconut · on Nov 17, 2015

I love it! Quick idea -- though I'm not sure if this would further the goal of having fun, even as a PR -- there's a documented way to make Birdseed subclass random.Random (https://hg.python.org/cpython/file/2.7/Lib/random.py#l72) and inherit the familiar interface and fancy methods.
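A minimal sketch of what that subclass might look like, written for Python 3 rather than the 2.7 source linked above. This is not Birdseed's actual code: `HashedTextRandom` and `fetch_text` are hypothetical names, with `fetch_text` standing in for whatever returns the next chunk of tweet text. Overriding `random()` and `getrandbits()` is enough for inherited methods such as `randint`, `choice`, and `shuffle` to work:

```python
import hashlib
import random

class HashedTextRandom(random.Random):
    """random.Random driven by SHA-224 digests of fetched text.

    `fetch_text` is a hypothetical zero-argument callable that returns
    a fresh string on every call (for example, the text of a tweet).
    """

    def __init__(self, fetch_text):
        self._fetch_text = fetch_text
        self._buffer = b""
        super().__init__()  # calls self.seed(), which is a no-op below

    def _take(self, n):
        # Refill the buffer with digests until n bytes are available.
        while len(self._buffer) < n:
            text = self._fetch_text()
            self._buffer += hashlib.sha224(text.encode("utf-8")).digest()
        out, self._buffer = self._buffer[:n], self._buffer[n:]
        return out

    def getrandbits(self, k):
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        nbytes = (k + 7) // 8
        value = int.from_bytes(self._take(nbytes), "big")
        return value >> (nbytes * 8 - k)  # drop the surplus low bits

    def random(self):
        # 53 bits fill a float's mantissa, giving a value in [0.0, 1.0).
        return self.getrandbits(53) / (1 << 53)

    def seed(self, *args, **kwargs):
        return None  # the text source is the only input; nothing to seed

    def getstate(self):
        raise NotImplementedError("state lives in the external text source")

    def setstate(self, state):
        raise NotImplementedError("state lives in the external text source")
```

The shape mirrors the standard library's `random.SystemRandom`, which also ignores seeding and refuses to save or restore state. It inherits the same caveat as the project itself: fun only, not secure.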

grubles · on Nov 17, 2015

To reiterate: "This is for fun. It's not secure. Don't use it in production :)"

huntaub · on Nov 17, 2015

I know that this has a disclaimer that it isn't to be used in production. But just know that hash functions are not there to provide randomness. There is no guarantee that a hash will be statistically indistinguishable from random noise.

benchaney · on Nov 17, 2015

There is no guarantee that any PRNG will be indistinguishable from random noise, but a properly designed hash function will be as close as any.
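One cheap sanity check, offered as a sketch and not as evidence of security: hash a batch of highly structured inputs and see how close the fraction of 1 bits comes to one half. The helper name `ones_fraction` and the sample strings are made up for illustration; passing a frequency check like this says nothing about whether the inputs are predictable.

```python
import hashlib

def ones_fraction(messages):
    """Return the fraction of 1 bits across the SHA-224 digests of `messages`."""
    ones = 0
    total = 0
    for message in messages:
        digest = hashlib.sha224(message.encode("utf-8")).digest()
        ones += bin(int.from_bytes(digest, "big")).count("1")
        total += len(digest) * 8
    if total == 0:
        raise ValueError("need at least one message")
    return ones / total

# Made-up, highly structured inputs standing in for real tweets.
sample = [f"tweet number {i}" for i in range(1000)]
print(f"fraction of 1 bits: {ones_fraction(sample):.4f}")
```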

huntaub · on Nov 17, 2015

Sorry - guarantee was probably too strong. I meant it in the mathematical sense (no more than a negligible chance of distinguishing from random noise) [1].

[1] https://en.wikipedia.org/wiki/Cryptographically_secure_pseud...

mikeash · on Nov 17, 2015

There are a bunch of CSPRNGs built out of hash functions. Are they all doing it wrong?

huntaub · on Nov 17, 2015

Of course, I'm sure that it is possible to construct one out of a hash function. I am also pretty confident that they do more than just serve the raw bytes of the hash.
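As a toy illustration of "more than the raw bytes", assuming a secret seed and a running counter: each output block is the hash of the secret state plus the counter, so the output never depends on public text alone. `HashCounterGenerator` is a made-up name, this is not any particular standardized construction, and it has not been reviewed for real use.

```python
import hashlib

class HashCounterGenerator:
    """Toy generator: output block i is SHA-224(key || i).

    `secret_seed` is assumed to come from a source an attacker cannot
    see; with a public seed, every output is reproducible by anyone.
    """

    def __init__(self, secret_seed: bytes):
        # Derive a fixed-length key so the seed itself is never reused directly.
        self._key = hashlib.sha224(b"seed:" + secret_seed).digest()
        self._counter = 0

    def read(self, n: int) -> bytes:
        """Return n pseudorandom bytes, advancing the counter per block."""
        out = bytearray()
        while len(out) < n:
            counter_bytes = self._counter.to_bytes(8, "big")
            out += hashlib.sha224(self._key + counter_bytes).digest()
            self._counter += 1
        return bytes(out[:n])
```

Two generators built from the same seed produce the same stream, which is exactly why the seed has to stay private.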

mikeash · on Nov 17, 2015

I think you're technically correct that there's no guarantee, in that it's not part of the definition of a hash. A magical function which somehow returned an incrementing counter value for each unique chunk of data you fed to it, globally, would fit the definition of a cryptographic hash.

Real-world cryptographic hash functions, however, just try to approximate a random oracle. They attempt to achieve pre-image resistance and collision resistance by making their output look random. Certainly that's the case with SHA-224, which is what this code uses.

Some real-world CSPRNGs do just use hash functions directly. Linux's /dev/random implementation, for example, just returns a SHA-1 hash of its entropy pool contents. Yarrow (used in Mac OS X, iOS, and FreeBSD) does a final pass on its output using a block cipher, but requires that the hash function used in its earlier stages produce random-looking output. Fortuna is similar.

Of course, this code is insecure and should not be used in production, regardless of the internal details, simply because all of the inputs are known to a third party, i.e. Twitter.