How to create tar.gz file in Java

In this tutorial, we will use the Apache Commons Compress library to create, extract, and update .tar.gz files, which are TAR archives compressed with Gzip.

1. What is tar.gz?

  • The tar format is used to combine multiple files into a single archive, commonly referred to as a tarball, and typically has the .tar extension.
  • Gzip is a compression tool that reduces file size and generally uses the .gz extension.
  • A .tar.gz or .tgz file is created by first grouping multiple files into a tar archive and then compressing it using Gzip.
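
The two layers can be seen directly in a file's bytes. The following JDK-only sketch needs no Commons Compress, and the class name `TarGzSniffer` is hypothetical. It checks the Gzip magic bytes `0x1f 0x8b`, then decompresses the first 512-byte tar header block and looks for the `ustar` magic string at offset 257. This is a heuristic that assumes a ustar or GNU format archive; very old V7-style tar files carry no magic string and would not be recognized.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;

public class TarGzSniffer {

    private static final byte[] USTAR = "ustar".getBytes(StandardCharsets.US_ASCII);

    // Outer layer: every gzip stream starts with the magic bytes 0x1f 0x8b.
    public static boolean isGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    // Inner layer: decompress the first 512-byte tar header block and check for
    // the "ustar" magic at offset 257 (written by POSIX and GNU tar, not by V7 tar).
    public static boolean looksLikeTarGz(byte[] data) {
        if (!isGzip(data)) {
            return false;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] header = in.readNBytes(512);
            return header.length == 512
                    && Arrays.equals(Arrays.copyOfRange(header, 257, 262), USTAR);
        } catch (IOException e) {
            return false; // corrupt or truncated gzip stream
        }
    }
}
```

For example, `TarGzSniffer.looksLikeTarGz(Files.readAllBytes(path))` should return true for a small .tar.gz file and false for a plain .tar file, since the latter fails the Gzip check.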

2. Adding Dependencies

To work with .tar.gz files, we need to add the commons-compress library to our project.

pom.xml

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.27.0</version>
</dependency>

3. Creating a tar.gz File in Java

The following Java code demonstrates how to create a .tar.gz file and add files or directories to it.

TarGzCreator.java

package com.mkyong.zip;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.*;
import java.nio.file.*;

public class TarGzCreator {

    public static void createTarGz(Path source, Path output) throws IOException {
        try (OutputStream fos = Files.newOutputStream(output);
             BufferedOutputStream bos = new BufferedOutputStream(fos);
             GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bos);
             TarArchiveOutputStream taos = new TarArchiveOutputStream(gzos)) {
            addFilesToTarGz(source, "", taos);
        }
    }

    private static void addFilesToTarGz(Path path, 
                                        String parent, 
                                        TarArchiveOutputStream taos)
            throws IOException {
        String entryName = parent + path.getFileName().toString();
        TarArchiveEntry entry = new TarArchiveEntry(path.toFile(), entryName);
        taos.putArchiveEntry(entry);

        if (Files.isRegularFile(path)) {
            // add file
            try (InputStream fis = Files.newInputStream(path)) {
                byte[] buffer = new byte[1024];
                int len;
                while ((len = fis.read(buffer)) > 0) {
                    taos.write(buffer, 0, len);
                }
            }
            taos.closeArchiveEntry();
        } else {
            taos.closeArchiveEntry();
            // walk directory
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
                for (Path child : stream) {
                    addFilesToTarGz(child, entryName + "/", taos);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("/path/to/source");
        Path output = Paths.get("/path/to/output.tar.gz");
        createTarGz(source, output);
    }
}
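
The TarGzCreator above uses the default settings of TarArchiveOutputStream, under which an entry name that does not fit the 100-byte tar name field, or a file of 8 GiB or more, makes the stream throw an exception. The sketch below (the class `TarLongNames` and its methods are hypothetical names) opts in to the GNU extensions via `setLongFileMode` and `setBigNumberMode`. For these two cases, this avoids the PAX extended headers that the POSIX alternatives (`LONGFILE_POSIX`, `BIGNUMBER_POSIX`) would write, which may matter if a downstream consumer rejects PAX headers. The `roundTrip` method only demonstrates that a long name survives writing and reading.

```java
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class TarLongNames {

    // GNU extensions: long names are stored in a "././@LongLink" entry and
    // large sizes in star/GNU binary form, so no PAX headers are needed for them.
    public static void useGnuExtensions(TarArchiveOutputStream taos) {
        taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
        taos.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_STAR);
    }

    // Writes one small entry with the given name into an in-memory tar,
    // then reads the first entry's name back.
    public static String roundTrip(String entryName) throws IOException {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (TarArchiveOutputStream taos = new TarArchiveOutputStream(bytes)) {
            useGnuExtensions(taos);
            TarArchiveEntry entry = new TarArchiveEntry(entryName);
            entry.setSize(content.length);
            taos.putArchiveEntry(entry);
            taos.write(content);
            taos.closeArchiveEntry();
        }
        try (TarArchiveInputStream tais =
                     new TarArchiveInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return tais.getNextEntry().getName();
        }
    }
}
```

To apply this to TarGzCreator, one option is to call `TarLongNames.useGnuExtensions(taos)` right after the `TarArchiveOutputStream` is constructed in `createTarGz`.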

4. Extracting a tar.gz File in Java

To extract a .tar.gz archive, we first decompress the Gzip layer and then extract the TAR archive, while guarding against the Zip Slip vulnerability (archive entries such as ../../evil.sh that would be written outside the target directory).

TarGzExtractor.java

package com.mkyong.zip;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

import java.io.*;
import java.nio.file.*;

public class TarGzExtractor {

    public static void extractTarGz(Path inputFile, Path outputDir) throws IOException {
        // Normalize once, so the Zip Slip check below also works for relative
        // directories such as "./out" or paths containing ".."
        Path targetDir = outputDir.toAbsolutePath().normalize();

        try (InputStream fis = Files.newInputStream(inputFile);
             BufferedInputStream bis = new BufferedInputStream(fis);
             GzipCompressorInputStream gzis = new GzipCompressorInputStream(bis);
             TarArchiveInputStream tais = new TarArchiveInputStream(gzis)) {

            TarArchiveEntry entry;
            // getNextEntry() returns a TarArchiveEntry in recent Commons Compress
            // versions; getNextTarEntry() is deprecated
            while ((entry = tais.getNextEntry()) != null) {

                // Prevent Zip Slip vulnerability
                Path outputPath = targetDir.resolve(entry.getName()).normalize();
                if (!outputPath.startsWith(targetDir)) {
                    throw new IOException("Entry is outside of the target directory: "
                            + entry.getName());
                }

                if (entry.isDirectory()) {
                    Files.createDirectories(outputPath);
                } else {
                    Files.createDirectories(outputPath.getParent());
                    try (OutputStream fos = Files.newOutputStream(outputPath);
                         BufferedOutputStream bos = new BufferedOutputStream(fos)) {
                        byte[] buffer = new byte[1024];
                        int len;
                        while ((len = tais.read(buffer)) > 0) {
                            bos.write(buffer, 0, len);
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {

        Path source = Paths.get("/path/to/existing.tar.gz");
        Path outputDir = Paths.get("/path/to/outputDir");
        
        extractTarGz(source, outputDir);

    }
}
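
The Zip Slip check can be pulled out into a small helper that is easy to unit test. In this JDK-only sketch, the names `SafePaths` and `resolveSafely` are hypothetical. It normalizes the target directory before comparing: comparing against an un-normalized relative directory such as `./out` would make `startsWith` reject every entry, because the normalized `out/a.txt` does not start with the path elements `.` and `out`.

```java
import java.io.IOException;
import java.nio.file.Path;

public class SafePaths {

    // Resolves an archive entry name against the extraction directory and
    // rejects names such as "../../evil.sh" or "/etc/passwd" that would
    // land outside it after normalization.
    public static Path resolveSafely(Path targetDir, String entryName) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(entryName).normalize();
        if (!resolved.startsWith(base)) {
            throw new IOException("Entry is outside of the target directory: " + entryName);
        }
        return resolved;
    }
}
```

Inside the extraction loop, `Path outputPath = SafePaths.resolveSafely(outputDir, entry.getName());` would then replace the inline check.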

5. Adding Files or Directories to an Existing tar.gz File

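Appending files to an existing .tar.gz file is not straightforward: the Gzip layer has to be decompressed before any entry can be added (GNU tar's append option, -r, does not work on compressed archives either), and Commons Compress's TarArchiveOutputStream always writes a new archive from the start. The workaround is to extract the archive, add the new files, and then re-create the .tar.gz file.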

TarGzUpdater.java

package com.mkyong.zip;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class TarGzUpdater {

    public static void updateTarGz(Path tarGzPath,
                                   List<Path> newFiles,
                                   Path tempExtractDir,
                                   Path updatedTarGz) throws IOException {

        // Step 1: Extract the existing tar.gz file to a tempExtractDir
        TarGzExtractor.extractTarGz(tarGzPath, tempExtractDir);

        // Step 2: Copy new files to the tempExtractDir
        for (Path file : newFiles) {
            Path targetPath = tempExtractDir.resolve(file.getFileName());
            Files.copy(file, targetPath, StandardCopyOption.REPLACE_EXISTING);
        }

        // Step 3: Re-create the tar.gz file with the updated contents,
        // avoiding including tempExtractDir itself
        try (OutputStream fos = Files.newOutputStream(updatedTarGz);
             BufferedOutputStream bos = new BufferedOutputStream(fos);
             GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bos);
             TarArchiveOutputStream taos = new TarArchiveOutputStream(gzos);
             // Files.walk holds open directory handles, so the stream must be closed
             Stream<Path> paths = Files.walk(tempExtractDir)) {

            // A plain loop lets IOException propagate, instead of printing it
            // and silently producing an incomplete archive
            for (Path path : (Iterable<Path>) paths::iterator) {
                // Avoid adding tempExtractDir itself
                if (path.equals(tempExtractDir)) {
                    continue;
                }
                String entryName = tempExtractDir.relativize(path).toString();
                TarArchiveEntry entry = new TarArchiveEntry(path.toFile(), entryName);
                taos.putArchiveEntry(entry);
                if (Files.isRegularFile(path)) {
                    Files.copy(path, taos);
                }
                taos.closeArchiveEntry();
            }
        }

        // Step 4: Delete the temporary extraction directory after use;
        // reverse order deletes files before their parent directories
        try (Stream<Path> paths = Files.walk(tempExtractDir)
                .sorted(Comparator.reverseOrder())) {
            for (Path path : (Iterable<Path>) paths::iterator) {
                Files.delete(path);
            }
        }
    }

    public static void main(String[] args) throws IOException {

        Path existingTarGz = Paths.get("/path/to/existing.tar.gz");
        List<Path> newFiles = List.of(
                Paths.get("/path/to/newFile1.txt"),
                Paths.get("/path/to/newFile2.txt")
        );
        Path tempDir = Paths.get("/path/to/tempExtract");
        Path updatedTarGz = Paths.get("/path/to/updated.tar.gz");

        updateTarGz(existingTarGz, newFiles, tempDir, updatedTarGz);
    }
}
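
A different technique avoids the temporary directory entirely: copy each entry from the old archive into the new one as a stream, then append the new files. This is a sketch of that alternative, not the approach used above, and the class name `TarGzStreamingUpdater` is hypothetical. It assumes `updatedTarGz` is a different file from `tarGzPath` (both are open at the same time) and that the archive has no sparse entries. Old entries whose name matches a new file are skipped, mirroring `REPLACE_EXISTING` in TarGzUpdater.

```java
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TarGzStreamingUpdater {

    public static void updateTarGz(Path tarGzPath, List<Path> newFiles, Path updatedTarGz)
            throws IOException {
        // Top-level names of the new files; old entries with these names are replaced
        Set<String> newNames = new HashSet<>();
        for (Path file : newFiles) {
            newNames.add(file.getFileName().toString());
        }

        try (InputStream in = new BufferedInputStream(Files.newInputStream(tarGzPath));
             TarArchiveInputStream tais = new TarArchiveInputStream(new GzipCompressorInputStream(in));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(updatedTarGz));
             TarArchiveOutputStream taos = new TarArchiveOutputStream(new GzipCompressorOutputStream(out))) {

            // Old archives may contain long names or large files
            taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_POSIX);
            taos.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);

            TarArchiveEntry entry;
            while ((entry = tais.getNextEntry()) != null) {
                if (newNames.contains(entry.getName())) {
                    continue; // replaced by a new file below
                }
                taos.putArchiveEntry(entry); // reuse the header: name, size, mode, mtime
                tais.transferTo(taos);       // read() stops at the end of the current entry
                taos.closeArchiveEntry();
            }

            for (Path file : newFiles) {
                taos.putArchiveEntry(new TarArchiveEntry(file.toFile(), file.getFileName().toString()));
                Files.copy(file, taos);
                taos.closeArchiveEntry();
            }
        }
    }
}
```

Because each header is copied rather than re-derived from extracted files, permissions and modification times of the existing entries are kept, which the extract-and-rebuild approach does not do (TarGzExtractor writes files with default permissions).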

Explanation:

  1. Extract the existing .tar.gz file to a temporary directory.
  2. Copy new files into the extracted directory.
  3. Re-create the .tar.gz file with the updated contents, excluding tempExtractDir itself so that entry names stay relative to it.
  4. Delete the temporary directory after use.
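
Steps 1 and 4 assume the caller supplies a fixed temporary directory, and a failure in steps 1 to 3 leaves it behind. A JDK-only sketch of one alternative (the names `TempDirs`, `withTempDir`, and `IOAction` are hypothetical) creates a uniquely named directory with `Files.createTempDirectory` and removes it in a `finally` block, so cleanup also happens when extraction or compression fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class TempDirs {

    // A callback that may throw IOException (java.util.function types cannot).
    @FunctionalInterface
    public interface IOAction {
        void run(Path dir) throws IOException;
    }

    // Creates a fresh, uniquely named temporary directory, passes it to the
    // action, and deletes it afterwards, even if the action throws.
    public static void withTempDir(String prefix, IOAction action) throws IOException {
        Path dir = Files.createTempDirectory(prefix);
        try {
            action.run(dir);
        } finally {
            deleteRecursively(dir);
        }
    }

    // Deletes a directory tree; reverse order visits children before parents.
    // Does nothing if the directory was already removed by the action.
    public static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root).sorted(Comparator.reverseOrder())) {
            for (Path path : (Iterable<Path>) paths::iterator) {
                Files.delete(path);
            }
        }
    }
}
```

A possible call site is `TempDirs.withTempDir("targz-", tmp -> TarGzUpdater.updateTarGz(existingTarGz, newFiles, tmp, updatedTarGz));`. Since `updateTarGz` already deletes the directory on success, `deleteRecursively` simply returns early in that case.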

mkyong

Founder of Mkyong.com, passionate about Java and open-source technologies.

3 Comments
Kelly
5 years ago

Hi and thank you for this solution!

Just one note:
I used your example for “1. Add two files to tar.gz”
Even though it works and produces the intended file, when I tried to open the produced file, I kept getting the following error:

tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

What is needed is this:

after: tOut.finish();

this line must be added: tOut.close();

(TarArchiveOutputStream must be closed)

I don’t know if anybody else had this problem, but I’m mentioning it, just in case someone did..

Thanks again!!!

shashank
3 years ago

Hi Team

We have an issue where the tar file created by our Spring Boot code sometimes causes problems when we untar it: an additional PAX file gets added, and so the file is rejected by the agencies. How can we create a tar file without this PAX file being added automatically?

Anthony
5 years ago

How would one add a file to an existing tar.gz?