It really depends. I tend to use a specialized compression tool if I need to compress once and send/decompress often, but use zstd when I compress and decompress a lot. In my experience, if you have a fixed, small amount of time (single-digit minutes or less), zstd is the one that will compress to the smallest size. I even often pick `-3`, as it is typically a lot faster than `-4` and higher levels, without a huge difference in resulting size.

In my experience, if compression time is not a factor, lzip is the best for text (non-random letters and numbers). I recently had to redistribute the data from Python's nltk internally and tried compressing and decompressing it with different tools. These were my results (I picked lzip again):

    tool                compress     size  decompress
    gzip -9                 10 m  503 MiB  31 s
    zstd -19                29 m  360 MiB  29 s
    7za a -si               26 m  348 MiB     s
    lzip -9                 78 m  310 MiB  50 s
    lrzip -z -L 9 (ZPAQ)   125 m  253 MiB  95 m
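
For anyone wanting to repeat this kind of comparison on their own data, here is a minimal sketch of a timing harness. It is not what I ran for the table above: it only covers the codecs in the Python standard library (gzip, bz2, lzma), so zstd, lzip, 7za and lrzip are left out, and the `CODECS` table, the `benchmark` helper and the sample data are all made up for illustration.

```python
import bz2
import gzip
import lzma
import time

# Hypothetical codec table: label -> (compress function, decompress function).
# Only standard-library codecs are used; the levels are illustrative choices.
CODECS = {
    "gzip -9": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "xz -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def benchmark(data):
    """Return (label, compress seconds, compressed bytes, decompress seconds)."""
    rows = []
    for label, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        mid = time.perf_counter()
        restored = decompress(packed)
        end = time.perf_counter()
        # Make sure the round trip is lossless before trusting the numbers.
        assert restored == data
        rows.append((label, mid - start, len(packed), end - mid))
    return rows


if __name__ == "__main__":
    # Assumed sample: highly repetitive text, so ratios will look unrealistically good.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 20000
    for label, c_time, size, d_time in benchmark(sample):
        print(f"{label:10} {c_time:8.3f} s {size:10d} B {d_time:8.3f} s")
```

Wall-clock timings from a single run like this are noisy, so treat them as a rough ordering rather than precise figures.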



I did some tests myself on a 22 MB SQL file, and it turns out:

* 7za -m0=PPMd produced the smallest file while also being faster than bzip2

* bzip2 turned out to be way faster than both lzip (684%) and xz (644%), and produced a smaller file

* xz is marginally faster than lzip; compressed sizes are about the same, with the xz file being a tad smaller

* without any switches, 7za produces an archive a bit bigger than xz and lzip in about the same amount of time

* gzip and zstd produce about the same compressed size, but zstd is a lot faster (517%) than gzip

The 7z file was produced using the -m0=PPMd switch. For the other files no command line switches were supplied. Here are the file sizes:

  23668150 file.sql
   3899477 file.sql.7z
   4149962 file.sql.bz2
   5954982 file.sql.gz
   4540628 file.sql.lz
   4506720 file.sql.xz
   5961291 file.sql.zst
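
To make the sizes easier to compare, here is a small sketch that turns the byte counts listed above into compression ratios (compressed size divided by original size). The numbers are copied from the listing; the `SIZES` dictionary and the `ratios` helper are just names picked for this example.

```python
# Byte counts copied from the listing above.
ORIGINAL = 23668150
SIZES = {
    "file.sql.7z": 3899477,
    "file.sql.bz2": 4149962,
    "file.sql.gz": 5954982,
    "file.sql.lz": 4540628,
    "file.sql.xz": 4506720,
    "file.sql.zst": 5961291,
}


def ratios(sizes, original):
    """Map each file name to compressed size / original size."""
    return {name: size / original for name, size in sizes.items()}


if __name__ == "__main__":
    # Print smallest first, with the ratio as a percentage of the original.
    for name, ratio in sorted(ratios(SIZES, ORIGINAL).items(), key=lambda kv: kv[1]):
        print(f"{name:14} {SIZES[name]:9d} B  {ratio:6.2%}")
```

Sorted this way, the order is 7z (PPMd), bz2, xz, lz, gz, zst, which matches the bullet points above.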


When going for the smallest size, it'd be interesting to see your comparison using the command-line switches for best compression (they make a big difference, both in time and in size).

Was bzip2 slightly or considerably slower than zstd?


Yes: bzip2 is slower than gzip, so it's also considerably slower than zstd. Yet zstd -19 produced a bigger (4.3M) file in about the same amount of time.

If I remember correctly: zstd = 0.2s, gzip = 0.8s, 7zip (PPMd) = 2.1s, bzip2 = 2.7s, and lzip, xz, 7zip (lzma) = 15..16s. This is CPU time quoted from memory, so it might not be fully accurate.

I'd say zstd and gzip are better suited for general use, while bzip2 and 7zip (PPMd) are better suited for high compression of text files.
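
As a rough sanity check, the CPU times quoted above can be turned into throughput figures by dividing the original file size (23668150 bytes, from the listing earlier in the thread) by each time. This is only a sketch: the times are the from-memory values, the 15..16s group is left out because it is a range, and the `CPU_SECONDS` table and `throughput_mb_s` helper are names invented for the example.

```python
# Original file size in bytes, from the file listing earlier in the thread.
ORIGINAL_BYTES = 23668150

# CPU times in seconds as quoted from memory above (approximate).
CPU_SECONDS = {
    "zstd": 0.2,
    "gzip": 0.8,
    "7zip (PPMd)": 2.1,
    "bzip2": 2.7,
}


def throughput_mb_s(size_bytes, seconds):
    """Compression throughput in MB/s (1 MB = 10**6 bytes)."""
    return size_bytes / seconds / 1e6


if __name__ == "__main__":
    for tool, seconds in CPU_SECONDS.items():
        print(f"{tool:12} {throughput_mb_s(ORIGINAL_BYTES, seconds):7.1f} MB/s")
```

Since the inputs are approximate, the results are only good for order-of-magnitude comparisons.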



