100% Compression Using Pi (github.com/philipl)
87 points by gburt on Nov 8, 2013 | 32 comments



It's always wonderful to see solid implementations of really terrible ideas.


I hate to sound overly optimistic about these types of things, but this kind of fun and ridiculous stuff is what inspires real innovation as well.


That reminds me of the ending of the novel/movie "Contact" by Carl Sagan. When calculating pi in base 11, there was an implausibly long run of 0's and 1's, which made a pretty picture when you plotted it in two dimensions. So if you entered that picture as your file, the file system would make a major discovery :-).

Someone actually wrote a paper on the properties of these numbers: http://arxiv.org/abs/1209.2348

This is the second time in a week I'm mentioning Carl Sagan in a comment. I'll stop doing that now.


You can never mention Carl Sagan too much, only too little.


Try watching a Cosmos marathon. The hyperbolic introductions are inspiring until you watch them back to back.


    > All possible finite files must exist within π.
    > The first record of this observation dates back to 2001.
I'm surprised to hear this, since I had a buddy in high school in the early 90s who insisted on the same thing. Even then I could see the problem: storing the offset and length of the desired sequence would take more space than storing the sequence itself. This friend was brilliant, but he had a history of crazy half-joking ideas. He insisted the natural numbers don't exist, since you have to cross an infinite range of real numbers to get from 0 to 1.


It's not a good compression algorithm because the offset into pi will require more digits to represent than the data you are trying to compress in the first place. Quite a lot more considering the law of big numbers. So ultimately this algorithm is going to expand the size of your data by some enormous factor.
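
A rough way to check the offset argument, as a sketch rather than anything from the project itself: generate some digits of pi with a streaming spigot (Gibbons' algorithm), look up every k-digit string, and count how many get an offset that is no shorter than the string. The names `pi_digits`, `offset_of` and `fraction_not_shrunk` are made up for this illustration, and a string that does not appear in the first 1000 digits is counted as not shrunk.

```python
from itertools import islice

def pi_digits():
    """Yield decimal digits of pi (3, 1, 4, 1, 5, ...) using Gibbons' streaming spigot."""
    q, r, t, k = 1, 0, 1, 1
    while True:
        n = (3 * q + r) // t
        if (4 * q + r) // t == n:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r = 10 * q, 10 * (r - n * t)
        else:
            # Not enough precision yet: absorb one more term of the series.
            q, r, t, k = q * k, (2 * q + r) * (2 * k + 1), t * (2 * k + 1), k + 1

# First 1000 digits as a string, starting with the leading "3" (assumed search window).
DIGITS = "".join(str(d) for d in islice(pi_digits(), 1000))

def offset_of(needle, haystack=DIGITS):
    """Return the first index of needle in the digit string, or None if absent."""
    i = haystack.find(needle)
    return i if i >= 0 else None

def fraction_not_shrunk(k, haystack=DIGITS):
    """Share of k-digit strings whose offset has at least k digits, or is not found."""
    total = 10 ** k
    not_shrunk = 0
    for value in range(total):
        needle = str(value).zfill(k)
        off = offset_of(needle, haystack)
        if off is None or len(str(off)) >= k:
            not_shrunk += 1
    return not_shrunk / total

if __name__ == "__main__":
    for k in (2, 3):
        print(k, "digit strings not shrunk:", fraction_not_shrunk(k))
```

Whatever digits are used, only 10^(k-1) offsets are shorter than k digits, so at least 90% of all k-digit strings cannot come out smaller, and that is before storing the length.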


No, you just have to store the offset as well. And the offset for the offset. Then you just have to keep track of the number of offset cycles you've gone through - and you can store this as well.

Keep doing this until you have a number that is smaller than your file...

might take a while, but it would be very elegant.


Considering the first offset is much, much larger than the file you started with, the offset to the offset is just going to increase in size with each iteration.


1. There is no proof that pi is a normal number. 2. The index of the position takes up space.


1. Well, here's to hoping.

2. He addressed this - you store the index in pi as well. If the index of the index is too big, you can store that too... indices all the way down!


You then have to store how many levels down the real, as opposed to index, data is. If you have looped enough to have reached an index that is small, the depth count will on average be so large that it takes about as much space as your original data.
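
The depth-count point is a case of the usual counting (pigeonhole) argument, which can be sketched in a few lines. The assumption here is only that whatever gets stored in the end (offset, depth, or both) is a bit string that must decode to exactly one file; the function names are invented for this example.

```python
def count_files(n_bits):
    """Number of distinct files that are exactly n_bits long."""
    return 2 ** n_bits

def count_shorter_descriptions(n_bits):
    """Number of distinct bit strings strictly shorter than n_bits (lengths 0..n_bits-1)."""
    return sum(2 ** i for i in range(n_bits))

def min_fraction_incompressible(n_bits, saved_bits):
    """Lower bound on the share of n_bits-long files that cannot be
    shortened by saved_bits or more, for any lossless scheme."""
    # Descriptions of length at most n_bits - saved_bits.
    short = 2 ** (n_bits - saved_bits + 1) - 1
    return 1 - short / 2 ** n_bits

if __name__ == "__main__":
    n = 16
    print("files:", count_files(n))
    print("shorter descriptions:", count_shorter_descriptions(n))
    print("cannot save a byte:", min_fraction_incompressible(n, 8))
```

There is always one fewer shorter description than there are files, so no amount of index-of-index nesting can shrink every input.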


1. If everyone uses this algorithm maybe we'll find out if pi is normal or not!

2. Shhh don't let the pie hear you.


1. Well, that would be a hard bug to reproduce right?

"Saving this file (attached) takes an infinite amount of time."


http://www.angio.net/pi/

In 100 million digits of pi there is less than a 1% chance of finding a given 10-character string. Mankind probably hasn't been on Earth long enough to have calculated the number of digits needed for a high chance of finding something big in there.
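
The 1% figure can be reproduced with a back-of-the-envelope estimate. This sketch assumes the digits behave like independent, uniformly random decimal digits (not proven for pi) and ignores correlations between overlapping positions; the function names are made up for the example.

```python
import math

def chance_of_finding(string_len, digits_searched, alphabet=10):
    """Approximate chance that one given string appears somewhere in the searched digits."""
    positions = max(digits_searched - string_len + 1, 0)
    p_match = alphabet ** -string_len  # chance of a match at one fixed position
    # 1 - (1 - p)^positions, computed in a numerically stable way.
    return -math.expm1(positions * math.log1p(-p_match))

def digits_for_even_odds(string_len, alphabet=10):
    """Approximate number of digits to search for a 50% chance of a match."""
    p_match = alphabet ** -string_len
    positions = math.ceil(math.log(0.5) / math.log1p(-p_match))
    return positions + string_len - 1

if __name__ == "__main__":
    print(chance_of_finding(10, 100_000_000))
    print(digits_for_even_odds(10))
```

Under these assumptions, even odds for a 10-digit string take roughly ln(2) * 10^10 digits, and every extra digit in the string multiplies that by ten.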


Very nice!

I think we should have more April Fools' jokes throughout the year. My favorite "project" so far is still RFC 2549.


Brilliant. But this is dangerous stuff in the wrong hands. If the NSA gets hold of it, BANG!, everything is metadata. Even you are coded in many codes many times along any expansion of pi... You can now be considered metadata.


Here's an idea for data storage inspired by this project:

1. Encode the data as a real number R between 0 and 1.

2. Take a steel rod of length L and make a mark on it at a distance of L*R from one end.

To read the data, just measure the location of the mark and back out R. Not as genius, but isn't it still great? :)
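
For anyone tempted to try the rod, here is a minimal sketch of the encoding and of how fine the measurement would have to be. It uses exact fractions instead of floats (a 64-bit float only carries about 6 bytes of payload), assumes a 1 m rod by default, and all names are hypothetical. Note that the reader also has to know the byte count, otherwise leading zero bytes are lost.

```python
from fractions import Fraction

def encode(data: bytes) -> Fraction:
    """Map the bytes to a fraction R in [0, 1): the mark's position as a share of the rod."""
    return Fraction(int.from_bytes(data, "big"), 256 ** len(data))

def decode(r: Fraction, n_bytes: int) -> bytes:
    """Recover the bytes from R, given how many bytes were stored."""
    return int(r * 256 ** n_bytes).to_bytes(n_bytes, "big")

def required_resolution_m(n_bytes: int, rod_length_m: float = 1.0) -> float:
    """Spacing in meters between neighboring mark positions for n_bytes of data."""
    return float(Fraction(rod_length_m) / 256 ** n_bytes)

if __name__ == "__main__":
    r = encode(b"pi")
    print(r, decode(r, 2))
    print(required_resolution_m(16))
```

With these assumptions, 16 bytes on a 1 m rod need marks spaced about 3e-39 m apart, which is well below the Planck length (about 1.6e-35 m).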


The article is obviously a joke, but it's worth mentioning that the square root of any non-perfect-square number would do as well as Pi. All of them contain infinite sequences of apparently perfect randomness.

Also, if Pi is normal (which is still unproven), not only does every finite sequence appear in it, but each one appears there an infinite number of times.


If anyone's interested, I wrote up a tangentially similar project last year. It was a fun use of an afternoon ;)

http://e1ven.github.io/HaShrink/


Does this mean Pi contains copyrighted material? Good luck, DMCA!


I was wondering what this post could be about when I read the title and was pleasantly surprised when I clicked on it. It gave me a good laugh. Well done, OP.


"In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π."...love it!


You could go another level meta, and compress the location of the actual data in π by finding the location of the location in π.


Keep reading... he suggests just that :)


I'm willing to invest if you have patents.


The patent is already written, but I can't tell you the offset.


Similarly, we could just store files as pointers to books in the Library of Babel ( http://en.wikipedia.org/wiki/The_Library_of_Babel ), since there are books there that contain the base64-encoded versions of your files as well.


The bigger the file, the lower the probability of finding an equivalent number (in pi) in this lifetime.


When selecting a transcendental filesystem, I usually go with φfs. It just looks better.


It's great. You can store the offset in terms of pi too


Genius



