This looks very interesting. Before everybody starts making the obvious comparisons, note that this is from the same guy who made LZ4, and he clearly knows what he's doing. I've been following his work for some time.
It looks like this is the evolution of Zhuff, an experimental (closed-source) compressor [1]. It is basically LZ4 followed by a fast entropy coder, specifically FSE [2], which is a flavor of arithmetic coding particularly suited to lookup-table-based implementations.
From a quick look at the source code, it seems that the entropy stage uses 3 probability tables: one for literal bytes, one for match offsets, and one for match lengths. This is not dissimilar from gzip (which, however, uses Huffman coding).
EDIT: from a second look it seems that the LZ77 compression stage is basically LZ4: it uses a simple hash table with no collision resolution, which offers very high compression speed but a poor match search. I'm surprised he didn't implement (yet?) an HC variant, as he did for LZ4; it could even beat gzip's compression ratio with no overhead in decompression speed or memory requirements.
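For the curious, here is a minimal sketch of the kind of single-slot hash-table match search LZ4 uses; the constants and names are illustrative, not taken from the actual LZ4 or zstd source. Each bucket holds only the most recent position, so a collision simply loses the older candidate, which is why the search is fast but weak:

    #include <stdint.h>
    #include <string.h>

    #define HASH_LOG  16
    #define HASH_SIZE (1 << HASH_LOG)
    #define MIN_MATCH 4

    /* One slot per bucket: a newer position overwrites the older one,
       so collisions are never resolved and only the most recent
       candidate is ever tried. */
    static uint32_t hash_table[HASH_SIZE];

    static uint32_t hash4(const uint8_t *p)
    {
        uint32_t v;
        memcpy(&v, p, 4);
        return (v * 2654435761u) >> (32 - HASH_LOG);  /* multiplicative hash */
    }

    /* Caller guarantees pos + MIN_MATCH <= src_len.  Returns the match
       length found at *match_pos, or 0 if the single stored candidate
       does not match. */
    static size_t find_match(const uint8_t *src, size_t pos, size_t src_len,
                             size_t *match_pos)
    {
        uint32_t h = hash4(src + pos);
        size_t candidate = hash_table[h];
        hash_table[h] = (uint32_t)pos;       /* overwrite unconditionally */

        if (candidate >= pos || memcmp(src + candidate, src + pos, MIN_MATCH) != 0)
            return 0;                        /* empty slot, collision or no match */

        size_t len = MIN_MATCH;
        while (pos + len < src_len && src[candidate + len] == src[pos + len])
            len++;
        *match_pos = candidate;
        return len;
    }

An HC variant would, roughly speaking, replace the single slot with a chain of candidates and search it more thoroughly, trading compression speed for ratio while leaving decompression untouched.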
I wish some of these provided static, preset dictionaries. I'm working on a packet capture system for SIP, which has a similar layout to HTTP. Thus all the common header fields and values are perfect for a preset dictionary, and in fact the compressor doesn't even need to keep any more state than that dictionary. That is, packets share more with the preset dictionary than with each other.
LZ4 allows you to "prime the stream", as it were, but I'm not sure it is really made for this scenario. As far as I can tell, I'd essentially have to make separate compression/decompression calls for each packet, resetting the state to the dictionary between packets.
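For concreteness, here is roughly what that per-packet reset looks like with LZ4's streaming API. The helper function and its buffer handling are mine; only the LZ4_* calls (LZ4_createStream, LZ4_loadDict, LZ4_compress_fast_continue, LZ4_freeStream) are real lz4.h entry points, and LZ4_decompress_safe_usingDict is the matching call on the decode side. As far as I can tell, LZ4_loadDict rebuilds its internal hash table from the dictionary each time, so the priming cost is paid once per packet:

    #include "lz4.h"

    /* Compress one packet against a preset dictionary, then discard the
       streaming state so the next packet starts from the dictionary again
       rather than from the previous packet.  dst_capacity should be at
       least LZ4_compressBound(pkt_size). */
    static int compress_packet_with_dict(const char *dict, int dict_size,
                                         const char *pkt, int pkt_size,
                                         char *dst, int dst_capacity)
    {
        LZ4_stream_t *stream = LZ4_createStream();
        if (!stream)
            return -1;

        /* "Prime the stream": backreferences in this packet may now
           point into the dictionary. */
        LZ4_loadDict(stream, dict, dict_size);

        int written = LZ4_compress_fast_continue(stream, pkt, dst,
                                                 pkt_size, dst_capacity, 1);

        /* Throw the state away so nothing from this packet leaks into
           the next one. */
        LZ4_freeStream(stream);
        return written;   /* 0 on failure, compressed size otherwise */
    }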
What one needs is a true preset dictionary: store a hash of it during compression and then require a preset dictionary with the same hash on decompression, something like the sketch below. It should be a straightforward extension, but it is a format change. I think preset dictionaries would be useful in a lot of contexts.
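To make the idea concrete, here is a rough sketch of the hash check; the struct, field names, and helper functions are entirely hypothetical (this is not an existing LZ4 or zstd format), and XXH64 is just a convenient choice of hash:

    #include <stddef.h>
    #include <stdint.h>
    #include "xxhash.h"   /* XXH64(), from the same author's xxHash library */

    /* Hypothetical framing: the compressor records a hash of the preset
       dictionary; the decompressor recomputes it over the dictionary it
       was handed and refuses to proceed on a mismatch.  The layout is
       invented for illustration only. */
    struct frame_header {
        uint64_t dict_hash;        /* XXH64 of the preset dictionary */
        uint32_t compressed_size;
        uint32_t original_size;
    };

    static void write_header(struct frame_header *hdr,
                             const void *dict, size_t dict_size,
                             uint32_t csize, uint32_t osize)
    {
        hdr->dict_hash = XXH64(dict, dict_size, 0);
        hdr->compressed_size = csize;
        hdr->original_size = osize;
    }

    /* Returns 1 if the supplied dictionary matches the one used at
       compression time, 0 otherwise. */
    static int check_dict(const struct frame_header *hdr,
                          const void *dict, size_t dict_size)
    {
        return XXH64(dict, dict_size, 0) == hdr->dict_hash;
    }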
But the preset dictionary functionality I've seen, in say, LZ4, really is just the saved state of a normal compression run. So once you start compressing, the dictionary eventually evaporates as more data goes in and backreferences can no longer reach that far back into the dictionary. That's fine if all your content compresses well, but if the dictionary is a far superior state...
[1] http://fastcompression.blogspot.fr/p/zhuff.html
[2] https://github.com/Cyan4973/FiniteStateEntropy