
How to create tar.gz in Java

The JDK provides ZipOutputStream to create a zip file and GZIPOutputStream to compress a file with Gzip, but it has no built-in API for creating a tar.gz file.

Instead, we can use Apache Commons Compress (still under active development) to create a .tar.gz file.

pom.xml

  <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-compress</artifactId>
      <version>1.20</version>
  </dependency>

Notes

  1. tar collects multiple files into a single archive file, also known as a tarball, which usually has the suffix .tar
  2. Gzip compresses a file to save space; the result usually has the suffix .gz
  3. A .tar.gz or .tgz file is a tar archive that has been compressed with Gzip.
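
To make note 2 concrete, here is a minimal sketch of a Gzip round trip using only the JDK classes from java.util.zip. The class name GzipRoundTrip and its two helper methods are hypothetical and exist only for illustration. Gzip works on a single stream of bytes and knows nothing about file names or folders, which is why tar is needed to group the files first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, JDK only (readAllBytes needs Java 9+).
class GzipRoundTrip {

    // Compress a byte array into Gzip format.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // the Gzip stream is closed here, so the trailer has been written
        return bytes.toByteArray();
    }

    // Decompress Gzip data back into the original bytes.
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "hello tar.gz".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println(new String(gunzip(packed), StandardCharsets.UTF_8));
    }
}
```

The compressed bytes start with the Gzip magic numbers 0x1f 0x8b, which is also true for the first two bytes of any .tar.gz file.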

The following snippet creates a tar.gz file.


import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

//...
// path is the file to add; fileName is its entry name inside the archive
try (OutputStream fOut = Files.newOutputStream(Paths.get("output.tar.gz"));
     BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
     GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
     TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

    TarArchiveEntry tarEntry = new TarArchiveEntry(path.toFile(), fileName);

    tOut.putArchiveEntry(tarEntry);

    // copy the file content to TarArchiveOutputStream
    Files.copy(path, tOut);

    tOut.closeArchiveEntry();

    // write the tar end-of-archive records
    tOut.finish();

}

The following snippet decompresses a .tar.gz file.


import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

//...
try (InputStream fi = Files.newInputStream(Paths.get("input.tar.gz"));
     BufferedInputStream bi = new BufferedInputStream(fi);
     GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
     TarArchiveInputStream ti = new TarArchiveInputStream(gzi)) {

    ArchiveEntry entry;
    while ((entry = ti.getNextEntry()) != null) {

        // build the target path and guard against the zip slip attack
        // (filename is a placeholder helper, see zipSlipProtect in section 4)
        Path newPath = filename(entry, targetDir);

        // handle directory entries and missing parent folders (see section 4)

        // copy TarArchiveInputStream to newPath
        Files.copy(ti, newPath);

    }
}

1. Add two files to tar.gz

This example shows how to add two files into a tar.gz file.

TarGzipExample1.java

package com.mkyong.io.howto.compress;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TarGzipExample1 {

    public static void main(String[] args) {

        try {

            Path path1 = Paths.get("/home/mkyong/test/sitemap.xml");
            Path path2 = Paths.get("/home/mkyong/test/file.txt");
            Path output = Paths.get("/home/mkyong/test/output.tar.gz");

            List<Path> paths = Arrays.asList(path1, path2);
            createTarGzipFiles(paths, output);

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Done");

    }

    // tar.gz a few files
    public static void createTarGzipFiles(List<Path> paths, Path output)
        throws IOException {

        try (OutputStream fOut = Files.newOutputStream(output);
             BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
             TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

            for (Path path : paths) {

                if (!Files.isRegularFile(path)) {
                    throw new IOException("Only regular files are supported: " + path);
                }

                TarArchiveEntry tarEntry = new TarArchiveEntry(
                                                  path.toFile(),
                                                  path.getFileName().toString());

                tOut.putArchiveEntry(tarEntry);

                // copy file to TarArchiveOutputStream
                Files.copy(path, tOut);

                tOut.closeArchiveEntry();

            }

            tOut.finish();

        }

    }

}

Output – It adds sitemap.xml and file.txt to a single tar archive and compresses it with Gzip; the result is output.tar.gz

Terminal

$ tar -tvf /home/mkyong/test/output.tar.gz
-rw-r--r-- 0/0          396719 2020-08-12 14:02 sitemap.xml
-rw-r--r-- 0/0              15 2020-08-11 17:56 file.txt
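
Example 1 uses only path.getFileName() as the entry name, so two input files from different folders that share a file name end up as two entries with the same name, and extraction typically leaves only the last one on disk. The sketch below is one possible pre-check, using only the JDK; the class EntryNameCheck and the method duplicateNames are hypothetical names for illustration and are not part of the example above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, JDK only.
class EntryNameCheck {

    // Return the file names that occur more than once in the list.
    static Set<String> duplicateNames(List<Path> paths) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (Path path : paths) {
            String name = path.getFileName().toString();
            // add() returns false when the name was already seen
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Path> paths = Arrays.asList(
                Paths.get("a/file.txt"),
                Paths.get("b/file.txt"),
                Paths.get("a/sitemap.xml"));
        System.out.println(duplicateNames(paths)); // prints [file.txt]
    }
}
```

If the returned set is not empty, one option is to reject the input before calling createTarGzipFiles; another is to use a relative path as the entry name, as example 2 does.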

2. Add a directory to tar.gz

This example adds a directory, including its files and sub-directories, to one archive file and compresses it with Gzip into a .tar.gz file.

The idea is to use Files.walkFileTree to walk the file tree and add the files one by one to the TarArchiveOutputStream.

TarGzipExample2.java

package com.mkyong.io.howto.compress;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class TarGzipExample2 {

    public static void main(String[] args) {

        try {

            // tar.gz a folder
            Path source = Paths.get("/home/mkyong/test");
            createTarGzipFolder(source);

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Done");

    }

    // generate .tar.gz file at the current working directory
    // tar.gz a folder
    public static void createTarGzipFolder(Path source) throws IOException {

        if (!Files.isDirectory(source)) {
            throw new IOException("Please provide a directory.");
        }

        // use the folder name as the tar.gz file name
        String tarFileName = source.getFileName().toString() + ".tar.gz";

        try (OutputStream fOut = Files.newOutputStream(Paths.get(tarFileName));
             BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
             TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

            Files.walkFileTree(source, new SimpleFileVisitor<Path>() {

                @Override
                public FileVisitResult visitFile(Path file,
                                            BasicFileAttributes attributes) {

                    // only copy files, no symbolic links
                    if (attributes.isSymbolicLink()) {
                        return FileVisitResult.CONTINUE;
                    }

                    // entry name = file path relative to the source folder
                    Path targetFile = source.relativize(file);

                    try {
                        TarArchiveEntry tarEntry = new TarArchiveEntry(
                                file.toFile(), targetFile.toString());

                        tOut.putArchiveEntry(tarEntry);

                        Files.copy(file, tOut);

                        tOut.closeArchiveEntry();

                        System.out.printf("file : %s%n", file);

                    } catch (IOException e) {
                        System.err.printf("Unable to tar.gz : %s%n%s%n", file, e);
                    }

                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    System.err.printf("Unable to tar.gz : %s%n%s%n", file, exc);
                    return FileVisitResult.CONTINUE;
                }

            });

            tOut.finish();
        }

    }

}
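
To preview which entry names the visitor in example 2 would produce, without writing an archive, the same walk can collect the relativized paths into a list. This is a rough sketch using only the JDK; the class EntryNameSketch and the method entryNames are hypothetical names, and the '/' replacement is an assumption made so that the printed names look the same on every platform.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, JDK only.
class EntryNameSketch {

    // Walk the folder and return the relative name of every regular file.
    static List<String> entryNames(Path source) throws IOException {
        List<String> names = new ArrayList<>();
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    // same relativize call as example 2, shown with '/' separators
                    String name = source.relativize(file).toString();
                    names.add(name.replace(File.separatorChar, '/'));
                }
                return FileVisitResult.CONTINUE;
            }
        });
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // build a small throwaway folder to walk
        Path root = Files.createTempDirectory("tar-demo");
        Files.createDirectories(root.resolve("sub"));
        Files.write(root.resolve("a.txt"), new byte[0]);
        Files.write(root.resolve("sub").resolve("b.txt"), new byte[0]);
        System.out.println(entryNames(root)); // prints [a.txt, sub/b.txt]
    }
}
```

Because the names are relative to the source folder, the archive does not contain the absolute path of the machine that created it.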

3. Add String to tar.gz

This example writes a String into the TarArchiveOutputStream through a ByteArrayInputStream. In other words, it creates an archive entry directly from in-memory data, without first saving a file to the local disk.


    public static void createTarGzipFilesOnDemand() throws IOException {

        String data1 = "Test data 1";
        String fileName1 = "111.txt";

        String data2 = "Test data 2 3 4";
        String fileName2 = "folder/222.txt";

        String outputTarGzip = "/home/mkyong/output.tar.gz";

        try (OutputStream fOut = Files.newOutputStream(Paths.get(outputTarGzip));
             BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
             TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {

            createTarArchiveEntry(fileName1, data1, tOut);
            createTarArchiveEntry(fileName2, data2, tOut);

            tOut.finish();
        }

    }

    private static void createTarArchiveEntry(String fileName,
                                              String data,
                                              TarArchiveOutputStream tOut)
                                              throws IOException {

        // encode with an explicit charset (java.nio.charset.StandardCharsets)
        byte[] dataInBytes = data.getBytes(StandardCharsets.UTF_8);

        // wrap the bytes in an input stream (java.io.ByteArrayInputStream)
        ByteArrayInputStream baIn = new ByteArrayInputStream(dataInBytes);

        TarArchiveEntry tarEntry = new TarArchiveEntry(fileName);

        // the entry size must be set before writing, otherwise an error occurs
        tarEntry.setSize(dataInBytes.length);
        // tarEntry.setSize(baIn.available()); alternative

        tOut.putArchiveEntry(tarEntry);

        // copy ByteArrayInputStream to TarArchiveOutputStream
        byte[] buffer = new byte[1024];
        int len;
        while ((len = baIn.read(buffer)) != -1) {
            tOut.write(buffer, 0, len);
        }

        tOut.closeArchiveEntry();

    }
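
The size passed to tarEntry.setSize is a number of bytes, not a number of characters, so it should come from the encoded byte array and never from String.length(). The short sketch below shows the difference with plain JDK code; the class EntrySizeSketch and the method entrySize are hypothetical names, and UTF-8 is assumed as the encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, JDK only.
class EntrySizeSketch {

    // Size in bytes of the data once encoded as UTF-8.
    static long entrySize(String data) {
        return data.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "Test data 1";
        String accented = "caf\u00e9"; // "café", the last letter needs 2 bytes in UTF-8

        // prints "11 chars, 11 bytes"
        System.out.println(ascii.length() + " chars, " + entrySize(ascii) + " bytes");
        // prints "4 chars, 5 bytes"
        System.out.println(accented.length() + " chars, " + entrySize(accented) + " bytes");
    }
}
```

For ASCII-only text the two numbers match, which can hide the problem until the first non-ASCII character shows up.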

4. Decompress file – tar.gz

This example shows how to decompress and extract a tar.gz file, and it also guards against the zip slip vulnerability.

TarGzipExample4.java

package com.mkyong.io.howto.compress;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class TarGzipExample4 {

    public static void main(String[] args) {

        try {

            // decompress .tar.gz
            Path source = Paths.get("/home/mkyong/test/output.tar.gz");
            Path target = Paths.get("/home/mkyong/test2");
            decompressTarGzipFile(source, target);

        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println("Done");

    }

    public static void decompressTarGzipFile(Path source, Path target)
        throws IOException {

        if (Files.notExists(source)) {
            throw new IOException("File doesn't exist!");
        }

        try (InputStream fi = Files.newInputStream(source);
             BufferedInputStream bi = new BufferedInputStream(fi);
             GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
             TarArchiveInputStream ti = new TarArchiveInputStream(gzi)) {

            ArchiveEntry entry;
            while ((entry = ti.getNextEntry()) != null) {

                // resolve the target path and validate it against zip slip
                Path newPath = zipSlipProtect(entry, target);

                if (entry.isDirectory()) {
                    Files.createDirectories(newPath);
                } else {

                    // create the parent folder if it does not exist yet
                    Path parent = newPath.getParent();
                    if (parent != null) {
                        if (Files.notExists(parent)) {
                            Files.createDirectories(parent);
                        }
                    }

                    // copy TarArchiveInputStream to Path newPath
                    Files.copy(ti, newPath, StandardCopyOption.REPLACE_EXISTING);

                }
            }
        }
    }

    private static Path zipSlipProtect(ArchiveEntry entry, Path targetDir)
        throws IOException {

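        // normalize the target directory first, so the prefix check also
        // works for targets such as "./out"
        Path normalizedTarget = targetDir.normalize();
        Path targetDirResolved = normalizedTarget.resolve(entry.getName());

        // make sure the normalized path still has the target directory
        // as its prefix, else throw an exception
        Path normalizePath = targetDirResolved.normalize();

        if (!normalizePath.startsWith(normalizedTarget)) {
            throw new IOException("Bad entry: " + entry.getName());
        }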

        return normalizePath;
    }

}
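
To see what the zip slip check accepts and rejects, the path logic can be tried on a few entry names without opening any archive. This is a minimal sketch using only java.nio.file; the class ZipSlipSketch, the method staysInside, and the sample entry names are hypothetical. Unlike zipSlipProtect, it returns a boolean instead of throwing, and it makes the target directory absolute before comparing.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, JDK only.
class ZipSlipSketch {

    // True if the entry, once resolved and normalized, is still inside targetDir.
    static boolean staysInside(Path targetDir, String entryName) {
        Path base = targetDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments, exposing any escape
        Path resolved = base.resolve(entryName).normalize();
        return resolved.startsWith(base);
    }

    public static void main(String[] args) {
        Path target = Paths.get("extract-here");

        System.out.println(staysInside(target, "docs/readme.txt"));    // prints true
        System.out.println(staysInside(target, "../../etc/passwd"));   // prints false
        System.out.println(staysInside(target, "docs/../../evil.sh")); // prints false
    }
}
```

Path.startsWith compares whole path elements, so a sibling folder such as extract-here-2 is not mistaken for a child of extract-here, which a plain String prefix comparison would get wrong.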

Further Reading
Please check the official Apache Commons Compress examples.

Download Source Code

$ git clone https://github.com/mkyong/core-java

$ cd core-java/java-io


Comments

Kelly
3 years ago

Hi and thank you for this solution!

Just one note:
I used your example for “1. Add two files to tar.gz”
Even though it works and produces the intended file, when I tried to open the produced file, I kept getting the following error:

tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

What is needed is this:

after: tOut.finish();

this line must be added: tOut.close();

(TarArchiveOutputStream must be closed)

I don’t know if anybody else had this problem, but I’m mentioning it, just in case someone did..

Thanks again!!!

shashank
1 year ago

Hi Team

We have an issue where sometime our tar file created from spring boot code is showing issue when we untar and an additional PAX file is getting added and hence that file is getting rejected by the Agencies. How can we create a tar file without this PAX file getting inside it automatically.

Anthony
3 years ago

How would one add a file to an existing tar.gz?