Finite State Entropy - A new breed of entropy coder (fastcompression.blogspot.fr)
78 points by ch on Jan 11, 2014 | 13 comments



This was a nice article.

In the comments Jarek Duda discusses some of the more technical details of the implementation and of his ANS paper that enabled this writeup. One cool thing I noticed is this:

    Another advantage of ANS is that we can slightly perturb
    the initialization procedure using a pseudo-random number
    generator initialized with a cryptographic key (e.g. also
    mixing in the block number): for example choosing between
    the lowest-weight symbol and the second best one.
    This way we simultaneously get a decent encryption for free.
Does anyone know what modern compression tools use for entropy coding? Do they all use Huffman, and would this help them increase speed? What about gzip for HTTP transactions?


There is a nice treatment of entropy coders in Managing Gigabytes (http://ww2.cs.mu.oz.au/mg/). I highly recommend the text, as it gives good background on entropy encodings and takes a nice engineering approach to implementation.


Arithmetic coding, one of the early victims of software patents.


Well, this is interesting.

I toyed around with entropy encoding at one point, based on a republished copy of an old Dr. Dobb's Journal article I found. I managed to cleanly reimplement the described algorithm in Python and reproduce the example from the article... but eventually I hit a problem I wasn't smart enough to debug. I'm eager to see whether this new algorithm is more approachable.


"In a nutshell, this coder provides the same level of performance as Arithmetic coder, but only requires additions, masks, and shifts."

Didn't IBM's Q-Coder also make the same claims?

Here's another one: http://www.gvu.gatech.edu/people/official/jarek/courses/7491...
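For a sense of why only additions, masks, and shifts are needed, here is a rough sketch of a table-driven (tANS) decode step. The struct fields and the readBits helper are made up for illustration; this is not the actual FSE source:

    /* Hypothetical tANS decode step: one table lookup, a bit read
       (shift + mask inside readBits), and one addition per symbol. */
    typedef struct {
        unsigned char  symbol;    /* symbol emitted for this state */
        unsigned char  nbBits;    /* bits to pull from the input stream */
        unsigned short newState;  /* base of the successor state */
    } DecodeEntry;

    static unsigned decodeStep(unsigned *state, const DecodeEntry *table,
                               unsigned (*readBits)(int nbBits))
    {
        DecodeEntry e = table[*state];
        *state = e.newState + readBits(e.nbBits);  /* no multiply, no divide */
        return e.symbol;
    }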


The Howard-Vitter coder only works for a binary alphabet (0/1).


Same for the Q-Coder:

"The Q-Coder is a new form of adaptive binary arithmetic coding"


Does anyone know the real history of arithmetic coders?

A lot of places seem to credit Rissanen, but I remember, from when this was actually my field, that Elias did most of the work in the early '60s (pursuing a tangent Shannon had considered in the '50s), and that Rissanen's contribution was getting the finite-word-length arithmetic considerations properly done.

In fact, when I studied this at university (early '90s), it was introduced as "Elias (aka Arithmetic) coding".


Glen Langdon, who authored early papers on arithmetic coding with Jorma Rissanen, was my advisor. I am not sure if I still have notes from what Glen taught in his data compression classes, but I will see what I have.

I've been meaning to get notes from his classes as well as some transform mathematics notes from one of David Huffman's classes online.


How does it compare to the algorithm used in FLAC or MPEG-4 lossless? Is there a reasonable way to benchmark the differences? https://en.wikipedia.org/wiki/Golomb_coding#Simple_algorithm


Golomb is way too static; it doesn't compress as well as Huffman, let alone FSE.
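For contrast, here is a minimal sketch of Rice coding (Golomb with M = 2^k), roughly the scheme FLAC uses: the parameter k is chosen per block and then fixed, so the code cannot adapt to the symbol distribution the way a Huffman or FSE table does. This is my own illustration, emitting characters instead of real bits:

    /* Rice coding sketch: quotient in unary, remainder in k bits. */
    #include <stdio.h>

    static void riceEncode(unsigned value, unsigned k)
    {
        unsigned q = value >> k;                 /* quotient */
        for (unsigned i = 0; i < q; i++) putchar('1');
        putchar('0');                            /* unary terminator */
        for (int b = (int)k - 1; b >= 0; b--)    /* remainder, k bits */
            putchar(((value >> b) & 1) ? '1' : '0');
    }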


So if I understand this correctly, how much it outperforms Huffman greatly depends on the data being compressed. Which makes me wonder: what kind of real-life datasets would benefit the most from this new approach?


Check out the FiniteStateEntropy/test/probaGenerator.c code.

It builds a table of characters based on a probability. If you feed it p=0.5, you get a sequence in which each symbol's count halves:

0000000011111223 (shortened example, the real total length is 4096 so you get 2048x0, 1024x1 ...)

This is then used as an alphabet probability to build random sequences in the test file.
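Here is a simplified sketch of that idea (my own reconstruction, not the actual probaGenerator.c): each successive symbol gets a fraction p of the remaining slots in a 4096-byte table, and the test data is produced by sampling that table.

    #include <stdlib.h>

    #define TABLE_SIZE 4096

    /* Fill the table: with p = 0.5 this yields 2048 '0's, 1024 '1's, ... */
    static void fillProbaTable(unsigned char *table, double p)
    {
        int pos = 0, symbol = 0;
        while (pos < TABLE_SIZE) {
            int count = (int)((TABLE_SIZE - pos) * p);
            if (count < 1) count = 1;            /* keep rare symbols present */
            for (int i = 0; i < count && pos < TABLE_SIZE; i++)
                table[pos++] = (unsigned char)('0' + symbol);
            symbol++;
        }
    }

    /* A test byte is then drawn by indexing the table uniformly at random. */
    static unsigned char drawSymbol(const unsigned char *table)
    {
        return table[rand() % TABLE_SIZE];
    }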

The algorithm scores better than Huffman for higher p, i.e., when the effective alphabet shrinks to a few dominant symbols. Statistically, that means a lot of the words in your dictionary resemble each other, and that's where the fractional (sub-bit) probabilities play their role.

To try to answer your question, I'd guess that this technique shows its muscle on data sets in which the dictionary is large and consists of similar sequences. Examples: a CSV file with a lot of numbers, GIS or KML datasets, Gerber files, ...
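To make the fractional-bit point concrete, a back-of-the-envelope comparison (my own illustration, not from the article): for a very skewed two-symbol source, Huffman still has to spend at least one whole bit per symbol, while the entropy is far lower.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double p = 0.95;  /* probability of the dominant symbol */
        double h = -(p * log2(p) + (1 - p) * log2(1 - p));
        printf("Shannon entropy : %.3f bits/symbol\n", h);  /* ~0.286 */
        printf("Huffman         : 1.000 bits/symbol (1-bit minimum code)\n");
        return 0;
    }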



