

8.1.1 Creating and Reading Compressed Archives

GNU tar can create and read compressed archives. It supports the gzip and bzip2 compression programs. For backward compatibility, it also supports the compress command, although we strongly recommend against using it, since there is a patent covering the algorithm it uses and you could be sued for patent infringement merely by running compress! Besides, it compresses less effectively than gzip and bzip2.

Creating a compressed archive is simple: you just specify a compression option along with the usual archive creation commands. The compression option is -z (--gzip) to create a gzip-compressed archive, -j (--bzip2) to create a bzip2-compressed archive, and -Z (--compress) to use the compress program. For example:

     $ tar cfz archive.tar.gz .

Reading a compressed archive is even simpler: you don't need to specify any additional options, as GNU tar recognizes its format automatically. Thus, the following commands will list and extract the archive created in the previous example:

     # List the compressed archive
     $ tar tf archive.tar.gz
     # Extract the compressed archive
     $ tar xf archive.tar.gz

The only case in which you have to specify a decompression option while reading the archive is when reading from a pipe or from a tape drive that does not support random access. However, in this case GNU tar will indicate which option you should use. For example:

     $ cat archive.tar.gz | tar tf -
     tar: Archive is compressed.  Use -z option
     tar: Error is not recoverable: exiting now

If you see such diagnostics, just add the suggested option to the invocation of GNU tar:

     $ cat archive.tar.gz | tar tfz -
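The pipe case can be tried end to end with a throwaway archive. The following is only a sketch: the directory and file names (demo, hello.txt) and the use of mktemp for a scratch directory are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: list a gzip-compressed archive that arrives on standard input.
# All names below (demo, hello.txt, archive.tar.gz) are illustrative.
set -e

workdir=$(mktemp -d)      # scratch directory, assumed to be disposable
cd "$workdir"

mkdir demo
echo "hello" > demo/hello.txt

# Create the compressed archive in the usual way.
tar cfz archive.tar.gz demo

# Read it back through a pipe, giving the decompression option explicitly.
listing=$(cat archive.tar.gz | tar tfz -)
echo "$listing"
```

The listing should include demo/hello.txt, the one file placed in the archive.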

Note also that there are several restrictions on operations on compressed archives. First of all, compressed archives cannot be modified, i.e., you cannot update them (--update, -u) or delete members from them (--delete). Likewise, you cannot append another tar archive to a compressed archive using --append (-r). Secondly, multi-volume archives cannot be compressed.
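A common way around the first restriction, offered here as a sketch rather than as a procedure this manual prescribes, is to decompress the archive, modify the plain tar file, and compress it again. The file names below are made up for illustration.

```shell
#!/bin/sh
# Sketch: add a member to a gzip-compressed archive by way of a plain archive.
# one.txt, two.txt and archive.tar.gz are illustrative names.
set -e

workdir=$(mktemp -d)      # scratch directory, assumed to be disposable
cd "$workdir"

echo "one" > one.txt
echo "two" > two.txt
tar cfz archive.tar.gz one.txt

# Step 1: decompress; gzip replaces archive.tar.gz with archive.tar.
gzip -d archive.tar.gz

# Step 2: append to the uncompressed archive, which tar can modify.
tar rf archive.tar two.txt

# Step 3: compress again; gzip replaces archive.tar with archive.tar.gz.
gzip archive.tar

members=$(tar tfz archive.tar.gz)
echo "$members"
```

This rewrites the whole archive, so it needs enough free disk space to hold the uncompressed copy.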

The following table summarizes the compression options used by GNU tar.

-z
--gzip
--gunzip
--ungzip
Filter the archive through gzip.

You can use --gzip and --gunzip on physical devices (tape drives, etc.) and remote files as well as on normal files; data to or from such devices or remote files is reblocked by another copy of the tar program to enforce the specified (or default) record size. The default compression parameters are used; if you need to override them, set the GZIP environment variable, e.g.:

          $ GZIP=--best tar cfz archive.tar.gz subdir
     

Another way would be to avoid the --gzip (--gunzip, --ungzip, -z) option and run gzip explicitly:

          $ tar cf - subdir | gzip --best -c - > archive.tar.gz
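The explicit pipeline also works in the other direction, with gzip decompressing to standard output and tar reading the plain archive from standard input. The sketch below assumes illustrative names (subdir, data.txt, restored) and a scratch directory created with mktemp.

```shell
#!/bin/sh
# Sketch: run gzip explicitly for both compression and decompression.
# subdir, data.txt and restored are illustrative names.
set -e

workdir=$(mktemp -d)      # scratch directory, assumed to be disposable
cd "$workdir"

mkdir subdir
echo "payload" > subdir/data.txt

# Compress explicitly, choosing the compression level on the gzip command line.
tar cf - subdir | gzip --best -c > archive.tar.gz

# Decompress explicitly and extract into a separate directory.
mkdir restored
gzip -d -c archive.tar.gz | (cd restored && tar xf -)

restored_text=$(cat restored/subdir/data.txt)
echo "$restored_text"
```

Since tar never sees the compressed data here, no compression option is passed to it in either direction.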
     

About corrupted compressed archives: gzip'ed files have no redundancy, for maximum compression. The adaptive nature of the compression scheme means that the compression tables are implicitly spread all over the archive. If you lose a few blocks, the dynamic construction of the compression tables becomes unsynchronized, and there is little chance that you could recover later in the archive.

There are pending suggestions for having per-volume or per-file compression in GNU tar. This would allow for viewing the contents without decompression, and for resynchronizing decompression at every volume or file, in case of corrupted archives. Doing so, we might lose some compressibility, but recovery would become easier. So, there are pros and cons. We'll see!


-j
--bzip2
Filter the archive through bzip2. Otherwise like --gzip.


-Z
--compress
--uncompress
Filter the archive through compress. Otherwise like --gzip.

The GNU Project recommends you not use compress, because there is a patent covering the algorithm it uses. You could be sued for patent infringement merely by running compress.


--use-compress-program=prog
Use the external compression program prog. Use this option if you have a compression program that GNU tar does not support. There are two requirements with which prog must comply:

First, when called without options, it should read data from standard input, compress it and output it on standard output.

Secondly, if called with the -d argument, it should do exactly the opposite, i.e., read the compressed data from standard input and produce uncompressed data on standard output.
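As a minimal sketch of a program meeting both requirements, the script below writes a small wrapper around gzip and uses it to create and list an archive. The wrapper name gzbest is hypothetical, as are the file names; the wrapper is given by full path here so that it need not be installed in PATH.

```shell
#!/bin/sh
# Sketch: a filter that satisfies the two requirements, plus a round trip.
# gzbest, subdir and file.txt are hypothetical names chosen for illustration.
set -e

workdir=$(mktemp -d)      # scratch directory, assumed to be disposable
cd "$workdir"

# The filter: no argument means compress stdin to stdout,
# -d means decompress stdin to stdout.
cat > gzbest <<'EOF'
#! /bin/sh
case $1 in
-d) exec gzip -d -c ;;
'') exec gzip --best -c ;;
*)  echo "Unknown option $1" >&2; exit 1 ;;
esac
EOF
chmod +x gzbest

mkdir subdir
echo "sample data" > subdir/file.txt

# Create and then list the archive through the filter.
tar -cf archive.tar.gz --use-compress-program="$workdir/gzbest" subdir
listing=$(tar -tf archive.tar.gz --use-compress-program="$workdir/gzbest")
echo "$listing"
```

Because the wrapper only changes the compression level, the result is an ordinary gzip file that can also be read with -z.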

The --use-compress-program option, in particular, lets you implement your own filters, not necessarily dealing with compression/decompression. For example, suppose you wish to add PGP signing on top of compression, using gpg (see gpg). The following script does that:

     #! /bin/sh
     case $1 in
     -d) gpg --decrypt - | gzip -d -c;;
     '') gzip -c | gpg -s ;;
     *)  echo "Unknown option $1">&2; exit 1;;
     esac

Suppose you name it gpgz and save it somewhere in your PATH. Then the following command will create a compressed archive signed with your private key:

     $ tar -cf foo.tar.gpgz --use-compress=gpgz .

Likewise, the following command will list its contents:

     $ tar -tf foo.tar.gpgz --use-compress=gpgz .