Hacker News new | past | comments | ask | show | jobs | submit login

Can someone explain why this is not possible? I understand why sending a decompressor beforehand is not possible for all inputs. I don't understand this formulation of the problem, where it only needs to work for one input that you get before you need to create the decompressor.



Some simple explanation

Compression exploits redundancy in a data stream (basically). You basically get "all symbols" (and how you define this varies according to your compression method: you could do all letters in the case of text, or even text snippets that repeat, etc) and reassemble them in a way that the ones that repeat the most take less space (and you also need to start from a basic dictionary known by all uncompressors or ship it with your compressed file)

One simple analogy is writing with abbreviations, but if you write e.g. the reader has to know what "e.g." means or you have to put in the beginning "e.g. = example" (and this also takes space)

Now, a randomly generated file ideally has all symbols repeating with the same frequency, (we say all symbols have the same entropy - I'm not sure about this exact wording), hence you can't take a symbol that repeats more or less and make it take less space in your compressed file


what if instead of using your own dictionary, you use an index into an existing dictionary? such as an index into a subsequence of pi. Couldn't you then find a sequence of bytes in the file in which the index into pi takes less bytes and then replace them all with the index? If you couldn't find any in pi use e or another such number? What am I missing


In this case your dictionary either doesn't have everything or to adequately point to it you take as much space as not using it.

While Pi has all pairs of 2 digits, your index would take more space than storing the pairs itself (because you might need to go beyond position 99)

For one situation you might "get lucky" and find a coincidence, but this won't scale generically


See also: https://github.com/philipl/pifs and https://github.com/philipl/pifs/issues/23

I admit this fooled me for a bit. Good news is, I won't be fooled again by something similar :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: