TIP: 7-zip's XZ compression on a multiprocessor system is often faster and compresses better than gzip

Hey folks, thought I'd share this little bit of info I discovered. For the longest time I've been using pigz (a gzip implementation that takes advantage of multiple cores) to compress my database backups. Out of curiosity (and because my backups keep getting larger), I wanted to see whether switching to the xz format was practical. The xz program provided by xz-utils on my system isn't multithreaded. From the man page:

-T threads, --threads=threads
    Specify the number of worker threads to use. The actual number of threads can be less than threads if using more threads would exceed the memory usage limit. Multithreaded compression and decompression are not implemented yet, so this option has no effect for now. As of writing (2010-09-27), it hasn't been decided if threads will be used by default on multicore systems once support for threading has been implemented. Comments are welcome. The complicating factor is that using many threads will increase the memory usage dramatically. Note that if multithreading will be the default, it will probably be done so that single-threaded and multithreaded modes produce the same output, so compression ratio won't be significantly affected if threading will be enabled by default.

This stinks because the server running the backups has 8 physical cores and no shortage of memory. I found out that 7z and 7za from p7zip create xz archives using multiple threads by default.

I did some experiments on a dual-core system (Intel Core2 Duo E6420) with various compression programs and levels, using a 2908748 KB (2.7 GB) database dump from my Moodle server. The 7z command-line options are a little weird, but workable. Let me break down one of the commands I used:

7za a dummy -txz -mx3 -si -so <db.sql >7z3_db.sql.xz

| Command Segment | Explanation |
|---|---|
| 7za | 7za is the 7-zip binary. The difference between 7z and 7za is that 7za doesn't load any plugins (it only uses native algorithms). For our purposes there really is no difference. |
| a | Add files to archive. |
| dummy | Placeholder archive name. Necessary when compressing to stdout with the -so option. |
| -txz | Sets the type of archive to xz. |
| -mx3 | Sets the compression level to 3 (valid range is 0-9). |
| -si | Read the data to compress from stdin. |
| -so | Write the compressed data to stdout. |
| <db.sql | Read db.sql into stdin (I know 7z can read files directly, but when I'm dumping the SQL it's going into stdin and I wanted my test case to be from stdin too). |
| >7z3_db.sql.xz | Write stdout to 7z3_db.sql.xz. In my backup script, this will be piped over SSH. |
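Here is a rough sketch of how those pieces might fit together in a backup script. Only the 7za options come from the breakdown above. The use of `mysqldump`, the function names `xz_stream_cmd` and `backup_db`, the database name, the host, and the remote path are all assumptions made up for illustration, so adapt them to your own setup.

```shell
#!/bin/sh
# Print the 7za command line for streaming xz compression (stdin -> stdout).
# The level defaults to 3, as in the example command broken down above.
xz_stream_cmd() {
    level=${1:-3}
    echo "7za a dummy -txz -mx${level} -si -so"
}

# Hypothetical backup pipeline: dump a database, compress the stream with 7za,
# and write it to a file on a remote host over SSH.
#   $1 = database name, $2 = SSH host, $3 = remote destination path
# mysqldump and the SSH target are assumptions, not part of the original test.
backup_db() {
    db=$1
    remote=$2
    dest=$3
    # The command substitution is left unquoted on purpose so the string
    # is split into the command name and its arguments.
    mysqldump "$db" | $(xz_stream_cmd 3) | ssh "$remote" "cat > '$dest'"
}

# Example call (placeholder values, not run here):
# backup_db moodle backup.example.com /backups/moodle.sql.xz
```

Keep in mind that in plain sh the exit status of a pipeline is that of its last command, so a failed dump would not be noticed by this sketch without extra checks.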

Below are the results.

| Command | Run Time (s) | Size (KB) | Size (MB) | Ratio |
|---|---|---|---|---|
| pigz -1 <db.sql >pigz1_db.sql.gz | 51.83 | 543796 | 531.05 | 18.70% |
| lzop <db.sql >lzop_db.sql.lzop | 53.82 | 755204 | 737.50 | 25.96% |
| gzip -1 <db.sql >gzip1_db.sql.gz | 63.75 | 545084 | 532.31 | 18.74% |
| pigz <db.sql >pigz_db.sql.gz | 81.89 | 417484 | 407.70 | 14.35% |
| 7za a dummy -txz -mx1 -si -so <db.sql >7z1_db.sql.xz | 117.08 | 335172 | 327.32 | 11.52% |
| 7za a dummy -txz -mx2 -si -so <db.sql >7z2_db.sql.xz | 124.18 | 326076 | 318.43 | 11.21% |
| 7za a dummy -txz -mx3 -si -so <db.sql >7z3_db.sql.xz | 150.44 | 320780 | 313.26 | 11.03% |
| 7za a dummy -txz -mx4 -si -so <db.sql >7z4_db.sql.xz | 237.03 | 317316 | 309.88 | 10.91% |
| gzip <db.sql >gzip_db.sql.gz | 266.46 | 418176 | 408.38 | 14.38% |
| pbzip2 -1 <db.sql >pbzip21_db.sql.bz2 | 286.09 | 342164 | 334.14 | 11.76% |
| pigz --best <db.sql >pigzbest_db.sql.gz | 293.68 | 407296 | 397.75 | 14.00% |
| pbzip2 -9 <db.sql >pbzip29_db.sql.bz2 | 391.75 | 302644 | 295.55 | 10.40% |
| bzip2 -1 <db.sql >bzip1_db.sql.bz2 | 492.68 | 341880 | 333.87 | 11.75% |
| gzip -9 <db.sql >gzip9_db.sql.gz | 600.57 | 408304 | 398.73 | 14.04% |
| bzip2 -9 <db.sql >bzip9_db.sql.bz2 | 632.29 | 302208 | 295.13 | 10.39% |
| xz -3 <db.sql >xz_db.sql.xz | 645.39 | 302028 | 294.95 | 10.38% |
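For anyone checking the numbers: the MB column is the KB column divided by 1024, and the ratio is the compressed size as a percentage of the original 2908748 KB dump. A small sketch of that arithmetic follows. The function names `kb_to_mb` and `ratio_pct` are invented for this example, and it assumes an awk that accepts `-v` assignments.

```shell
#!/bin/sh
# Size of the uncompressed dump in KB, as stated in the post.
ORIG_KB=2908748

# Convert a size in KB to MB (1 MB = 1024 KB), two decimal places.
kb_to_mb() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1024 }'
}

# Compressed size as a percentage of the original size.
#   $1 = compressed size in KB, $2 = original size in KB (defaults to ORIG_KB)
ratio_pct() {
    awk -v c="$1" -v o="${2:-$ORIG_KB}" 'BEGIN { printf "%.2f%%\n", c / o * 100 }'
}

# The 7za -mx3 row from the table:
kb_to_mb 320780     # prints 313.26
ratio_pct 320780    # prints 11.03%
```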

We can see that the single-threaded programs (with the exception of lzop and gzip -1) have by far the worst run times, even on this dual-core system. lzop is about 1.5 times as fast as the default pigz compression level (-6) but results in a file almost twice as large. For 7za, the -mx1 compression level seems to be a fair candidate to replace pigz: it's only about 35 seconds slower (117 s vs. 82 s) and in my case cut the compressed size from 14.35% to 11.52% of the original, almost 3 percentage points. In my environment, I went with -mx3 because it saves almost 100 MB per backup compared to default pigz and delivers that in a time frame that's reasonable for me.
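If you want to run a similar comparison on your own data, a tiny timing harness is enough. The one below is only a sketch and not the method used for the table: the function name `bench` is made up, timing uses `date +%s` (whole seconds only, and not strictly guaranteed by POSIX), and it assumes `mktemp` is available. Any command that reads stdin and writes stdout can be passed in.

```shell
#!/bin/sh
# bench INPUT COMMAND [ARGS...]
# Feeds INPUT to COMMAND on stdin, captures its stdout in a temp file,
# and prints the elapsed wall-clock seconds and the output size in bytes.
bench() {
    input=$1
    shift
    out=$(mktemp) || return 1
    start=$(date +%s)
    "$@" <"$input" >"$out"
    end=$(date +%s)
    out_bytes=$(wc -c <"$out")
    rm -f "$out"
    # $((...)) also strips the padding some wc implementations add.
    echo "$((end - start)) s, $((out_bytes)) bytes"
}

# Example usage (assumes the tools are installed and db.sql exists):
# bench db.sql pigz -1
# bench db.sql 7za a dummy -txz -mx3 -si -so
```

Because the resolution is one second, this only makes sense for inputs large enough that each run takes many seconds, like the multi-gigabyte dump used here.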

TL;DR

Using 7za to create xz archives is a lot faster than xz and default gzip (but not faster than pigz) on multicore systems, and it compresses much better than gzip or pigz. Hope this helps someone. 😊

Update 2013-12-11T19:47+0000

  • Added results for pbzip2, bzip2, gzip, and pigz.
submitted by TyIzaeL