A quick summary of the difference between this and existing implementations of deflate such as gzip or zlib:
Deflate combines two compression techniques: Huffman coding to replace each value with a string of bits whose length depends on the frequency of that value, and backreferences of the form "go back N bytes and repeat M bytes from there". (Glossing over a pile of fun interactions between those two, such as Huffman-coding both literals and backreferences, and Huffman-coding the Huffman tables. Not to mention the ability to have M > N, repeating the earlier parts of the thing you backreferenced. See RFC1951, https://www.ietf.org/rfc/rfc1951 , for the full details.)
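To make the backreference half of that concrete, here is a toy decoder (my own illustration, not code from zlib or Zopfli); note how copying a byte at a time naturally handles the M > N case, where the copy overlaps bytes it has just written:

    def decode(tokens):
        # Each token is either a literal byte or a (distance N, length M)
        # backreference meaning "go back N bytes and repeat M bytes from there".
        out = bytearray()
        for tok in tokens:
            if isinstance(tok, int):          # literal byte
                out.append(tok)
            else:                             # backreference
                dist, length = tok
                for _ in range(length):       # byte-at-a-time copy allows M > N
                    out.append(out[-dist])
        return bytes(out)

    # "abc" as three literals, then "go back 3, copy 9" to repeat it:
    print(decode([97, 98, 99, (3, 9)]))      # b'abcabcabcabc'

In real deflate, both the literal bytes and the (length, distance) pairs are then Huffman-coded, so the more frequent symbols cost fewer bits.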
Existing deflate implementations use various heuristics to decide whether to encode the next bit of data as a literal symbol or as a backreference, such as a hash table of candidate strings to backreference; the compression level (-1 through -9) tunes those heuristics to take more or less time, most notably by changing the lengths of strings stored in and searched for in the hash table.
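As a rough sketch of that kind of heuristic (in the spirit of a zlib-style matcher, not its actual data structures), the compressor can hash the next few bytes, look up earlier positions that began with the same bytes, and greedily take the longest match it finds:

    MIN_MATCH = 3   # hash/compare at least 3 bytes, as deflate matchers typically do

    def greedy_parse(data, max_chain=8):
        table = {}       # 3-byte prefix -> earlier positions where it occurred
        i, tokens = 0, []
        while i < len(data):
            key = data[i:i + MIN_MATCH]
            best_len, best_dist = 0, 0
            # Only check the last few candidates; chain length is one of the
            # knobs a compression level would turn (longer = slower, better).
            for j in reversed(table.get(key, [])[-max_chain:]):
                length = 0
                while i + length < len(data) and data[j + length] == data[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
            table.setdefault(key, []).append(i)
            if best_len >= MIN_MATCH:
                tokens.append((best_dist, best_len))   # emit a backreference
                i += best_len
            else:
                tokens.append(data[i])                 # emit a literal byte
                i += 1
        return tokens

    print(greedy_parse(b"abcabcabcabc"))   # [97, 98, 99, (3, 9)]

The greedy choice is exactly the sort of guess that can be locally fine but globally suboptimal, which is the kind of opening Zopfli's search exploits.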
Zopfli says it uses "more exhaustive compression techniques" based on "iterating entropy modeling and a shortest path search algorithm to find a low bit cost path through the graph of all possible deflate representations." The entropy modeling allows Zopfli to estimate the effectiveness of a given approach to deflate. The path-search algorithm effectively treats the space of all possible deflate representations as a single-player game, with "moves" that change the representation in some incremental way, and then uses standard search algorithms to find a near-optimal representation.
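One way to picture that search (my toy sketch of the idea, not Zopfli's actual cost model or code): treat each input position as a node, each possible literal or backreference as an edge whose weight is its estimated bit cost under the current entropy model, and find the cheapest path from position 0 to the end of the input:

    import math

    def cheapest_parse(n, options, freq):
        # options(i) yields (next_pos, symbol) moves available at position i,
        # e.g. emit a literal (next_pos = i + 1) or a backreference (i + length).
        # freq[symbol] is the model's probability estimate for that symbol;
        # -log2(p) approximates its Huffman-coded cost in bits.
        INF = float("inf")
        cost = [INF] * (n + 1)
        choice = [None] * (n + 1)
        cost[0] = 0.0
        for i in range(n):
            if cost[i] == INF:
                continue
            for nxt, sym in options(i):
                bits = cost[i] - math.log2(freq[sym])
                if bits < cost[nxt]:
                    cost[nxt] = bits
                    choice[nxt] = (i, sym)
        # Walk the chosen edges back from the end to recover the parse.
        path, pos = [], n
        while pos > 0:
            prev, sym = choice[pos]
            path.append(sym)
            pos = prev
        return cost[n], path[::-1]

The "iterating" part then means: recount symbol frequencies from the parse you just found, rebuild the cost model from those counts, and search again until it stops improving.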
(Without reading the source in detail, I don't know whether the search space includes the choice of Huffman tables, the set of possible backreferences, or both. I can see how either one could map onto a search space, though.)
In case you're wondering how to try it, the site lacks a simple getting-started guide. Sure, it only takes a few minutes to figure out on your own, but for anyone who wants to try it with under 30 seconds of work, do this:
git clone https://code.google.com/p/zopfli/
cd zopfli
make
chmod +x zopfli
./zopfli <FILENAME>
Optional: Copy it to /usr/local/bin (or your favorite path location)
Yeah, but the point here is to produce a .gz file which is compatible with standard tools like gunzip, PNG decoders, your browser's handling of "Content-Encoding: gzip", etc.
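For example, the file Zopfli writes (FILENAME.gz by default) should open with any ordinary gzip decoder, since only the choice of deflate stream changes, not the container format. A quick sanity check with Python's standard library, assuming you ran ./zopfli example.txt:

    import gzip

    # example.txt.gz is assumed to be zopfli's output for example.txt
    with gzip.open("example.txt.gz", "rb") as f:
        data = f.read()

    with open("example.txt", "rb") as f:
        assert data == f.read()   # round-trips byte-for-byte
    print("decompressed OK:", len(data), "bytes")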
If you control the compressor and decompressor then applying new tricks to producing a Deflate stream is far less interesting, just use a better compressor.
Can we get browsers to support LZMA? :) (Hey, a bug report! https://bugzilla.mozilla.org/show_bug.cgi?id=366559) But yeah, zopfli does seem like an excellent fit when you can't force a particular decompressor.
I've not used xz; I'm sure it's good, but the point isn't really which compression algorithm is the best in the world, it's which one works best under (depressingly limited) real-world conditions.
If it is any indicator, current versions of FreeBSD and some Linux distros have "xz" as part of the base system, along with their gzip and bzip2 counterparts, including xzcat/xzless/etc., newsyslog/logrotate compression options for xz, xz-compressed tarballs for source distributions (tar.xz / txz), as well as binary packages for FreeBSD compressed with the format.
LZMA compression has gained a lot of traction over the past few years.
Thanks - I meant to update my comment after I went and looked up xz. I've used 7zip and LZMA before, but I've never directly come across a .xz file. Odd!
For my tests I used the default compression levels, so 6 (out of 1-9) for gzip, 6 (out of 1-9) for xz, and 15 (out of 5-1000) for zopfli. My input was the Linux kernel 3.8.1 source tarball. This was run on a MacBook Pro with OS X 10.8.2 and a 2.2 GHz i7 processor.
time       size     format
n/a        987640   uncompressed
18.44      210896   gzip
281.44     143128   xz
3604.21    200640   zopfli
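Doing the arithmetic on those numbers to make the trade-off easier to see:

    # Ratios relative to gzip, computed from the table above.
    rows = {"gzip": (18.44, 210896), "xz": (281.44, 143128), "zopfli": (3604.21, 200640)}
    g_time, g_size = rows["gzip"]
    for name, (t, size) in rows.items():
        print(name, round(t / g_time, 1), "x gzip's time,",
              round(100 * (g_size - size) / g_size, 1), "% smaller than gzip")
    # zopfli: ~195x the time for ~4.9% smaller output
    # xz:     ~15x the time for ~32% smaller output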
340 bytes is not worth a 20x time increase. A 0.4% decrease in file size IS worth the time. If you can measure the amount of time it takes to download those last 340 bytes, then you can start to count how many downloads it takes before it's worth it. (If download speed were constant, it would be about 224 downloads.)
PNG seems like a good candidate for this style of compression. A one-time up-front cost for recurring savings in bytes transmitted seems like an excellent use of CPU time to me.
As an aside, I ran the Zopfli program with 10000 iterations on jquery-1.9.1.js:
If that's your logic and proprietary compression engines are okay despite not being in any browsers, then 7zip ultra can get it to 69531 bytes? That's 5000+ bytes smaller than Zopfli.
bzip2 can get it to 65536 bytes, which is a whopping 10k smaller than Zopfli.
Oh wait, I see that zopfli is deflate compatible so that means browsers can still decode it. Fascinating.
There is a compression-time/decompression-time trade-off whenever you use compression. Xz (or LZMA) is interesting in that its decode time is quite small compared to its encode time, unlike formats such as bz2. For large, static content the trade-off is quite good. However, with smaller files, like CSS and JavaScript sources, you really need to do some benchmarking to be sure.
On-demand xz will probably never be worth it, though; the encode time just takes too long. You would have physically transferred the data by the time it has finished compressing.
I was just talking about this with some other Firefox developers today, actually. So...yes, there's a chance. Not a very big one, mind you. But a guy can dream!
The "PNG crunch" programs out there - eg OptiPNG - actually do a bit more work. They'll try palette-ising a truecolor image, applying all the PNG filters (http://www.w3.org/TR/PNG-Filters.html), etc.
They also tend to use forked versions of zlib, often to expose some more internal knobs to trial-and-error on, so a zlib-compatible wrapper around zopfli wouldn't necessarily just drop in.
But I'm sure support for zopfli will appear sooner rather than later :)
Ideally, you'd want a zlib-compatible zopfli library that you could just substitute for libz.so.1 when running an appropriate format-specific compressor that uses zlib.
FYI, bzip2 has been obsoleted by xz which is both faster and gives better compression. But as others have said, Google doesn't care because browsers do gzip.
In the future, browsers could also do xz; they'd just need to advertise it in their Accept-Encoding header. Google could put this in Chrome any time they like, and nothing would break.
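For illustration, the client side of that negotiation is just a request header plus a decoder for whatever comes back. A hypothetical sketch (xz is not a registered HTTP content coding today, so the header value and the server honoring it are both assumptions):

    import lzma
    import urllib.request

    # Hypothetically advertise xz alongside gzip; the server is free to ignore
    # it, so check the Content-Encoding that actually comes back.
    req = urllib.request.Request("http://example.com/big.js",
                                 headers={"Accept-Encoding": "gzip, xz"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        if resp.headers.get("Content-Encoding") == "xz":
            body = lzma.decompress(body)   # lzma handles the .xz container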