> Huffman coding tries to compress text one letter at a time on the assumption that each letter comes from some fixed and known probability distribution. If the algorithm is successful then we'd expect the compressed text to look like a uniformly distributed sequence of bits. If it didn't then there'd be patterns that could be used for further compression.
This can be gently confusing when you're using different compression systems, (bits vs bytes)
Someone is compressing very large log files. They then compressed the output, and got further reductions in size.
> The fundamental reason is that these highly repetitive byte sequences, with very small and regular differences, produce repetitive compressed sequences, which can therefore be compressed further. - Yann Collet
The same is true of arithmetic coding, which separates the probability model from the encoding process. Feed an arithmetic coder a stream of random bits and it will efficiently sample from your model. See section 6.3: http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/105.124....
I'd suggest diving into discrete probability for a few hours. That should be enough to understand this post. Huffman coding is a little advanced but you can understand the algorithm (not the theory) with just the probability.
This can be gently confusing when you're using different compression systems, (bits vs bytes)
(https://groups.google.com/d/topic/lz4c/DcN5SgFywwk/discussio...)
Someone is compressing very large log files. They then compressed the output, and got further reductions in size.
> The fundamental reason is that these highly repetitive byte sequences, with very small and regular differences, produce repetitive compressed sequences, which can therefore be compressed further. - Yann Collet