This looks very interesting. Before everybody starts making the obvious comparisons, note that this is from the same guy who made LZ4, and he clearly knows what he's doing. I've been following his work for some time.
It looks like this is the evolution of Zhuff, an experimental (closed-source) compressor [1]. It is basically LZ4 followed by a fast entropy coder, specifically FSE [2], which is a flavor of arithmetic coding particularly suited to lookup-table-based implementations.
From a quick look at the source code, it seems that the entropy stage uses 3 probability tables: one for literal bytes, one for match offsets, and one for match lengths. This is not dissimilar from gzip (which, however, uses Huffman coding).
EDIT: from a second look it seems that the LZ77 compression stage is basically LZ4: it uses a simple hash table with no collision resolution, which offers very high compression speed but a poor match search. I'm surprised he didn't implement (yet?) an HC variant, as he did for LZ4; it could even beat gzip's compression ratio with no overhead in decompression speed or memory requirements.
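For the curious, here is a minimal sketch of the kind of single-slot hash-table match search LZ4 uses; the constants and names are illustrative, not taken from the actual LZ4 or zstd source. Each bucket holds only the most recent position, so a collision simply loses the older candidate, which is why the search is fast but weak:

    #include <stdint.h>
    #include <string.h>

    #define HASH_LOG  16
    #define HASH_SIZE (1 << HASH_LOG)
    #define MIN_MATCH 4

    /* One slot per bucket: a newer position overwrites the older one,
       so collisions are never resolved and only the most recent
       candidate is ever tried. */
    static uint32_t hash_table[HASH_SIZE];

    static uint32_t hash4(const uint8_t *p)
    {
        uint32_t v;
        memcpy(&v, p, 4);
        return (v * 2654435761u) >> (32 - HASH_LOG);  /* multiplicative hash */
    }

    /* Caller guarantees pos + MIN_MATCH <= src_len.  Returns the match
       length found at *match_pos, or 0 if the single stored candidate
       does not match. */
    static size_t find_match(const uint8_t *src, size_t pos, size_t src_len,
                             size_t *match_pos)
    {
        uint32_t h = hash4(src + pos);
        size_t candidate = hash_table[h];
        hash_table[h] = (uint32_t)pos;       /* overwrite unconditionally */

        if (candidate >= pos || memcmp(src + candidate, src + pos, MIN_MATCH) != 0)
            return 0;                        /* empty slot, collision or no match */

        size_t len = MIN_MATCH;
        while (pos + len < src_len && src[candidate + len] == src[pos + len])
            len++;
        *match_pos = candidate;
        return len;
    }

An HC variant would, roughly speaking, replace the single slot with a chain of candidates and search it more thoroughly, trading compression speed for ratio while leaving decompression untouched.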
I wish some of these provided static, preset dictionaries. I'm working on a packet capture system for SIP, which has a similar layout to HTTP. Thus all the common header fields and values are perfect for a preset dictionary, and in fact the compressor doesn't even need to keep any more state than that dictionary. That is, packets share more with the preset dictionary than with each other.
LZ4 allows you to "prime the stream", as it were, but I'm not sure it is really made for this scenario. As far as I can tell, I'd essentially have to make separate compression/decompression calls for each packet, resetting the state to the dictionary between packets.
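For concreteness, here is roughly what that per-packet reset looks like with LZ4's streaming API. The helper function and its buffer handling are mine; only the LZ4_* calls (LZ4_createStream, LZ4_loadDict, LZ4_compress_fast_continue, LZ4_freeStream) are real lz4.h entry points, and LZ4_decompress_safe_usingDict is the matching call on the decode side. As far as I can tell, LZ4_loadDict rebuilds its internal hash table from the dictionary each time, so the priming cost is paid once per packet:

    #include "lz4.h"

    /* Compress one packet against a preset dictionary, then discard the
       streaming state so the next packet starts from the dictionary again
       rather than from the previous packet.  dst_capacity should be at
       least LZ4_compressBound(pkt_size). */
    static int compress_packet_with_dict(const char *dict, int dict_size,
                                         const char *pkt, int pkt_size,
                                         char *dst, int dst_capacity)
    {
        LZ4_stream_t *stream = LZ4_createStream();
        if (!stream)
            return -1;

        /* "Prime the stream": backreferences in this packet may now
           point into the dictionary. */
        LZ4_loadDict(stream, dict, dict_size);

        int written = LZ4_compress_fast_continue(stream, pkt, dst,
                                                 pkt_size, dst_capacity, 1);

        /* Throw the state away so nothing from this packet leaks into
           the next one. */
        LZ4_freeStream(stream);
        return written;   /* 0 on failure, compressed size otherwise */
    }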
What one needs is a true preset dictionary: store a hash of it during compression and then require a preset dictionary with the same hash on decompression, something like the sketch below. It should be a straightforward extension, but it is a format change. I think preset dictionaries would be useful in a lot of contexts.
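To make the idea concrete, here is a rough sketch of the hash check; the struct, field names, and helper functions are entirely hypothetical (this is not an existing LZ4 or zstd format), and XXH64 is just a convenient choice of hash:

    #include <stddef.h>
    #include <stdint.h>
    #include "xxhash.h"   /* XXH64(), from the same author's xxHash library */

    /* Hypothetical framing: the compressor records a hash of the preset
       dictionary; the decompressor recomputes it over the dictionary it
       was handed and refuses to proceed on a mismatch.  The layout is
       invented for illustration only. */
    struct frame_header {
        uint64_t dict_hash;        /* XXH64 of the preset dictionary */
        uint32_t compressed_size;
        uint32_t original_size;
    };

    static void write_header(struct frame_header *hdr,
                             const void *dict, size_t dict_size,
                             uint32_t csize, uint32_t osize)
    {
        hdr->dict_hash = XXH64(dict, dict_size, 0);
        hdr->compressed_size = csize;
        hdr->original_size = osize;
    }

    /* Returns 1 if the supplied dictionary matches the one used at
       compression time, 0 otherwise. */
    static int check_dict(const struct frame_header *hdr,
                          const void *dict, size_t dict_size)
    {
        return XXH64(dict, dict_size, 0) == hdr->dict_hash;
    }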
But the preset dictionary functionality I've seen, in say, LZ4, really is just the saved state of a normal compression run. So once you start compressing, the dictionary eventually evaporates as more data goes in and backreferences can no longer reach that far back into the dictionary. That's fine if all your content compresses well, but if the dictionary is a far superior state...
[1] http://fastcompression.blogspot.fr/p/zhuff.html
[2] https://github.com/Cyan4973/FiniteStateEntropy