> However, if you have to wait for the previous block to compress the following block, I don't see how you can parallelize it completely.

That isn't what the man page says, though. The block size for a given session is fixed, so the boundaries of every block are known before compression begins.

Each block gets a copy of the last 32 KB of data from the end of the prior block.

The algorithm used by gzip compresses by finding repeated strings within the last 32 KB window of uncompressed data it has seen, so with that copy of the prior block's final 32 KB in place, there is no compression dependency between blocks.
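A minimal sketch in Python of that splitting step, assuming a 128 KiB per-block size (the names and block size here are illustrative, not pigz's actual internals):

    # Split the input into fixed-size blocks, each paired with the last
    # 32 KiB of the preceding block so a worker can seed its window.
    WINDOW = 32 * 1024          # deflate's maximum look-back distance
    BLOCK = 128 * 1024          # assumed per-block size

    def split_with_overlap(data: bytes):
        """Yield (dictionary, block) pairs. Boundaries are known up front,
        so every pair can be handed to a separate worker immediately."""
        for start in range(0, len(data), BLOCK):
            yield data[max(0, start - WINDOW):start], data[start:start + BLOCK]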

So the lossage there is that you don't get to compress and generate that dictionary at the same time. Interesting.
There's no "generating" of a dictionary. The gzip algorithm is based upon the "dictionary" being the last seen 32k bytes of uncompressed data based upon where in the file the compressor is presently working (technically in compression circles it is a 'windowed' algorithm, not a 'dictionary' algorithm). It compresses by finding a repetitive string in that 32k window that matches a string at the current location, and outputting an instruction that says (in effect): "seek backwards in the uncompressed data 200 bytes, then copy 100 bytes from there to here".

So as long as each parallel block has the final 32 KB of the prior block prepended, the output of the compression algorithm is essentially identical to that of a straight sequential compression: at byte 0 of the current block, the 32 KB window of uncompressed data from the prior block is available for matching, just as it would be if the compressor were running sequentially.
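Python's zlib exposes the same priming mechanism through its zdict parameter, so the idea can be sketched in a few lines (raw deflate streams, with an assumed 128 KiB block size; pigz's real job of stitching the pieces into a single gzip member is more involved than this):

    import zlib

    WINDOW = 32 * 1024           # deflate's maximum look-back window
    BLOCK = 128 * 1024           # assumed block size

    def _raw_deflate(block: bytes, zdict: bytes) -> bytes:
        """Compress one block as a raw deflate stream, primed with the last
        32 KiB that precedes it -- the same window a sequential compressor
        would have at this point in the file."""
        kwargs = {"zdict": zdict} if zdict else {}
        co = zlib.compressobj(level=9, wbits=-zlib.MAX_WBITS, **kwargs)
        return co.compress(block) + co.flush()

    def roundtrip(data: bytes) -> bytes:
        """Compress every block independently (parallelizable), then check
        that each one decompresses given the same per-block dictionary."""
        chunks = [_raw_deflate(data[s:s + BLOCK], data[max(0, s - WINDOW):s])
                  for s in range(0, len(data), BLOCK)]
        out = b""
        for chunk in chunks:
            kwargs = {"zdict": out[-WINDOW:]} if out else {}
            do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, **kwargs)
            out += do.decompress(chunk) + do.flush()
        assert out == data
        return out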

The only size growth from pigz comes from rounding each parallel compressed block up to a multiple of 8 bits (the Huffman codes it outputs are bit strings that do not align with 8-bit byte boundaries), and the worst case is 7 bits per block. Given the performance gains on multi-CPU systems, a net increase of a few hundred bytes is unlikely to matter. If those few hundred bytes did matter, one should use bzip2, lzip, or xz instead and get a much higher compression ratio (at the expense of much longer run time).
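A quick back-of-the-envelope check of that worst case, using 128 KiB blocks as an assumption:

    # Worst-case padding overhead for a 100 MiB input, assuming 128 KiB blocks.
    file_size = 100 * 1024 * 1024             # 100 MiB of input
    block_size = 128 * 1024                    # assumed pigz block size
    blocks = -(-file_size // block_size)       # ceiling division -> 800 blocks
    overhead_bytes = blocks * 7 / 8            # <= 7 padding bits per block
    print(overhead_bytes)                      # 700.0 -> "a few hundred bytes"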
