gzip (GNU zip, files with extension “.gz”) is one of the most widely used compression formats in Linux environments, because it offers a good tradeoff between the compression ratio obtained and the processing time required.

This post presents code samples written in Java to generate a compressed “.gz” file, and to read a file compressed using this format.

Writing a gzip file

The functionality required to compress data in gzip format is implemented in the Java standard library by the “GZIPOutputStream” class, in the java.util.zip package.

In our code samples, we will read a text file “file.txt”, opening a “standard” input stream by means of the FileInputStream class:
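A minimal sketch of this step follows. The class and method names (OpenInputSketch, openInput) are hypothetical, and the demo creates a small “file.txt” if none exists so that it can run on its own:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class OpenInputSketch {
    // Opens an ordinary, uncompressed byte stream on the source file.
    static InputStream openInput(String path) throws IOException {
        return new FileInputStream(path);
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("file.txt");
        // Assumption for the demo only: create a small sample file if none exists.
        if (Files.notExists(source)) {
            Files.write(source, "hello gzip\n".getBytes(StandardCharsets.UTF_8));
        }
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = openInput(source.toString())) {
            System.out.println("First byte of file.txt: " + in.read());
        }
    }
}
```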

Next, a “GZIPOutputStream” output stream is opened to write the compressed file:
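One way this could look is sketched below. The output name “file.gz” and the helper name openGzipOutput are assumptions made for illustration:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

class OpenGzipOutputSketch {
    // Wraps a file stream so that bytes written are gzip-compressed on their way to disk.
    static OutputStream openGzipOutput(String path) throws IOException {
        return new GZIPOutputStream(new FileOutputStream(path));
    }

    public static void main(String[] args) throws IOException {
        try (OutputStream out = openGzipOutput("file.gz")) {
            // Nothing is written here; closing the stream still completes the gzip file.
            System.out.println("Opened file.gz for compressed output");
        }
    }
}
```

The GZIPOutputStream does not open the file itself; it wraps another output stream, here a FileOutputStream.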

Then, the input file is read in a loop with calls to the “read” method of the input stream, and the compressed file is written with calls to the “write” method of the output stream. A buffer of 1024 bytes is used for reading and writing:
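The loop can be sketched as follows. To keep the example runnable without any files, it uses in-memory streams; the class and method names (CopyLoopSketch, copy) are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

class CopyLoopSketch {
    // Copies everything from in to out through a 1024-byte buffer.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        int len;
        // read() returns the number of bytes placed in the buffer, or -1 at end of stream.
        while ((len = in.read(buffer)) != -1) {
            // Write only the bytes actually read, not the whole buffer.
            out.write(buffer, 0, len);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5000]; // zero-filled sample data, highly compressible
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(data);
             OutputStream out = new GZIPOutputStream(compressed)) {
            copy(in, out);
        }
        System.out.println(data.length + " bytes in, " + compressed.size() + " bytes out");
    }
}
```

Passing the length returned by “read” to “write” matters: the last chunk is usually shorter than the buffer, and writing the full buffer would append stale bytes.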

The complete code sample is:
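The original listing is not reproduced here, so what follows is a reconstruction under assumptions rather than the exact code: it combines the three steps described above, with hypothetical names (GzipWriteExample, compress) and “file.gz” as the assumed output name. The demo creates a small “file.txt” if none exists.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

class GzipWriteExample {
    // Reads sourcePath and writes a gzip-compressed copy to targetPath.
    static void compress(String sourcePath, String targetPath) throws IOException {
        try (InputStream in = new FileInputStream(sourcePath);
             OutputStream out = new GZIPOutputStream(new FileOutputStream(targetPath))) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        } // closing the GZIPOutputStream finishes the compressed data and writes the trailer
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("file.txt");
        // Assumption for the demo only: create a small sample file if none exists.
        if (Files.notExists(source)) {
            Files.write(source, "hello gzip\n".getBytes(StandardCharsets.UTF_8));
        }
        compress("file.txt", "file.gz");
        System.out.println("Wrote file.gz (" + Files.size(Paths.get("file.gz")) + " bytes)");
    }
}
```

The try-with-resources statement closes both streams even when an exception is thrown. Closing the output stream is not optional: a gzip file whose stream was never closed or finished is truncated and cannot be read back reliably.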

Reading a gzip file

Reading from a file compressed with the gzip format is done using a stream of class GZIPInputStream. Other than that, reading the file is similar to reading from an uncompressed file.

For instance, to read a “file.gz” compressed file and print its content to screen, the following sample code can be used:
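A sketch of such a reader is shown below. The names GzipReadExample and decompress are hypothetical, and the demo writes a small “file.gz” first if none exists so that it runs on its own:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipReadExample {
    // Decompresses gzipPath and copies the uncompressed bytes to out.
    static void decompress(String gzipPath, OutputStream out) throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(gzipPath))) {
            byte[] buffer = new byte[1024];
            int len;
            // The stream returns uncompressed bytes; -1 marks the end of the data.
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        }
        out.flush(); // flush but do not close: the caller owns the output stream
    }

    public static void main(String[] args) throws IOException {
        // Assumption for the demo only: create a small compressed file if none exists.
        if (Files.notExists(Paths.get("file.gz"))) {
            try (OutputStream out = new GZIPOutputStream(new FileOutputStream("file.gz"))) {
                out.write("hello gzip\n".getBytes(StandardCharsets.UTF_8));
            }
        }
        decompress("file.gz", System.out);
    }
}
```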

If the compressed file is a text file and we want to read it line by line, we can wrap the GZIPInputStream instance in an InputStreamReader and create a BufferedReader from it:
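The sketch below shows one way to do this. It assumes the text is encoded in UTF-8; the names GzipLineReaderSketch and openReader are hypothetical, and the demo writes a small “file.gz” first if none exists:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLineReaderSketch {
    // Chain: file bytes -> gzip decompression -> character decoding -> line buffering.
    static BufferedReader openReader(String gzipPath) throws IOException {
        return new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(gzipPath)),
                StandardCharsets.UTF_8)); // assumption: the text is UTF-8
    }

    public static void main(String[] args) throws IOException {
        // Assumption for the demo only: create a small compressed file if none exists.
        if (Files.notExists(Paths.get("file.gz"))) {
            try (OutputStream out = new GZIPOutputStream(new FileOutputStream("file.gz"))) {
                out.write("first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
            }
        }
        try (BufferedReader reader = openReader("file.gz")) {
            String line;
            // readLine() returns null once the end of the stream is reached.
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Naming the charset explicitly avoids depending on the platform default, which can differ between machines.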

Differences between the gzip format and the zip format

gzip compresses a single file.

The zip format, on the other hand, which is also widely used, mainly in Windows environments, can hold several files arranged in a directory tree. When a zip file is expanded, the directory tree is recreated.

In Linux, the same “archive” functionality is usually achieved by first creating a “tar” archive and then compressing it with gzip. Typically, the compressed tar file is given the extension “.tar.gz” or “.tgz” (gzipped tar).

