Lossless decompression and the generation of random samples

DanBC · on Aug 8, 2013

> Huffman coding tries to compress text one letter at a time on the assumption that each letter comes from some fixed and known probability distribution. If the algorithm is successful then we'd expect the compressed text to look like a uniformly distributed sequence of bits. If it didn't then there'd be patterns that could be used for further compression.

This can be gently confusing when you're using different compression systems, (bits vs bytes)

(https://groups.google.com/d/topic/lz4c/DcN5SgFywwk/discussio...)

Someone is compressing very large log files. They then compressed the output, and got further reductions in size.

> The fundamental reason is that these highly repetitive byte sequences, with very small and regular differences, produce repetitive compressed sequences, which can therefore be compressed further. - Yann Collet

hadronzoo · on Aug 8, 2013

The same is true of arithmetic coding, which separates the probability model from the encoding process. Feed an arithmetic coder a stream of random bits and it will efficiently sample from your model. See section 6.3: http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/105.124....

Chris2048 · on Aug 8, 2013

I simply don't understand this, I don't think I have the right background. Any good starting places for finding more about this topic?

susi22 · on Aug 8, 2013

I'd suggest diving into discrete probability for a few hours. That should be enough to understand this post. Huffman coding is a little advanced but you can understand the algorithm (not the theory) with just the probability.

DanBC · on Aug 8, 2013

I have an alpha version of a text about lossless compression here (http://danbcstupidideas.neocities.org/compression.html)

It's very rough and I'm not a good writer. It's probably horribly patronising. I welcome any corrections and suggestions and etc.

gtani · on Aug 8, 2013

this is 100 pages into MacKay's text, which is well paced

http://www.inference.phy.cam.ac.uk/itprnn/book.html