In this tutorial, we use the Apache Commons Compress library to create, extract, and update .tar.gz files (TAR archives compressed with Gzip) in Java.
Table of Contents:
- 1. What is tar.gz?
- 2. Adding Dependencies
- 3. Creating a tar.gz File in Java
- 4. Extracting a tar.gz File in Java
- 5. Adding Files or Directories to an Existing tar.gz File
- 6. References
1. What is tar.gz?
- The tar format combines multiple files into a single archive, commonly referred to as a tarball, and typically has the .tar extension.
- Gzip is a compression tool that reduces file size and generally uses the .gz extension.
- A .tar.gz or .tgz file is created by first grouping multiple files into a tar archive and then compressing it using Gzip.
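Because the Gzip layer is the outermost one, a .tar.gz file starts with the Gzip magic bytes 0x1f 0x8b. The following is a minimal sketch, using only the JDK, of how one might check this before handing a file to Commons Compress. The class name GzipMagic and its methods are hypothetical helpers introduced for illustration; they are not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of Commons Compress.
final class GzipMagic {

    // True if the data starts with the gzip magic number 0x1f 0x8b.
    static boolean looksLikeGzip(byte[] header) {
        return header.length >= 2
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b;
    }

    // Reads at most the first two bytes of the file and checks them.
    static boolean looksLikeGzip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[2];
            int read = in.readNBytes(header, 0, header.length);
            return read == header.length && looksLikeGzip(header);
        }
    }

    // Gzip-compresses a byte array in memory; used here only for the demo.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeGzip(plain));       // false
        System.out.println(looksLikeGzip(gzip(plain))); // true
    }
}
```

The check only confirms the Gzip wrapper. Whether the compressed payload is actually a tar archive is only known once the stream is decompressed and read.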
2. Adding Dependencies
To work with .tar.gz files, we need the commons-compress dependency in our project.
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.27.0</version>
</dependency>
3. Creating a tar.gz File in Java
The following Java code demonstrates how to create a .tar.gz file and add files or directories to it.
package com.mkyong.zip;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.*;
import java.nio.file.*;

public class TarGzCreator {

    public static void createTarGz(Path source, Path output) throws IOException {
        // Streams are layered file -> buffer -> gzip -> tar.
        // try-with-resources closes them in reverse order, so the tar stream
        // is finished and closed before the gzip stream.
        try (OutputStream fos = Files.newOutputStream(output);
             BufferedOutputStream bos = new BufferedOutputStream(fos);
             GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bos);
             TarArchiveOutputStream taos = new TarArchiveOutputStream(gzos)) {
            addFilesToTarGz(source, "", taos);
        }
    }

    private static void addFilesToTarGz(Path path,
                                        String parent,
                                        TarArchiveOutputStream taos)
            throws IOException {
        // Entry names are built with '/' so they do not depend on the platform separator
        String entryName = parent + path.getFileName().toString();
        TarArchiveEntry entry = new TarArchiveEntry(path.toFile(), entryName);
        taos.putArchiveEntry(entry);
        if (Files.isRegularFile(path)) {
            // add file
            try (InputStream fis = Files.newInputStream(path)) {
                byte[] buffer = new byte[1024];
                int len;
                // read() returns -1 at end of stream
                while ((len = fis.read(buffer)) != -1) {
                    taos.write(buffer, 0, len);
                }
            }
            taos.closeArchiveEntry();
        } else {
            // a directory entry has no content, so close it before adding children
            taos.closeArchiveEntry();
            // walk directory
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
                for (Path child : stream) {
                    addFilesToTarGz(child, entryName + "/", taos);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("/path/to/source");
        Path output = Paths.get("/path/to/output.tar.gz");
        createTarGz(source, output);
    }
}
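The code above builds entry names by hand, joining each path segment with "/". An alternative is to derive the name from the file's path relative to a root directory. The sketch below uses only java.nio.file and assumes a hypothetical helper class EntryNames (not part of Commons Compress). It joins the relative path's segments with forward slashes, so the result does not depend on the platform's separator.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Commons Compress.
final class EntryNames {

    // Returns the path of 'file' relative to 'root', with segments joined by '/'.
    // Assumes 'file' is located under 'root'; otherwise the result contains "..".
    static String toEntryName(Path root, Path file) {
        Path relative = root.toAbsolutePath().normalize()
                .relativize(file.toAbsolutePath().normalize());
        StringBuilder name = new StringBuilder();
        for (Path segment : relative) {
            if (name.length() > 0) {
                name.append('/');
            }
            name.append(segment.toString());
        }
        return name.toString();
    }

    public static void main(String[] args) {
        Path root = Paths.get("source");
        Path file = Paths.get("source", "docs", "readme.txt");
        System.out.println(toEntryName(root, file)); // docs/readme.txt
    }
}
```

Passing the parent of the source directory as root keeps the top-level folder name in the entry name, which matches what createTarGz produces.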
4. Extracting a tar.gz File in Java
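To extract a .tar.gz archive, we first decompress the Gzip stream and then read the TAR entries from it. Each entry's path is validated before writing to prevent the Zip Slip vulnerability, where an entry name such as ../../evil.sh escapes the target directory.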
package com.mkyong.zip;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

import java.io.*;
import java.nio.file.*;

public class TarGzExtractor {

    public static void extractTarGz(Path inputFile, Path outputDir) throws IOException {
        // Normalize the target once so the startsWith check below
        // compares two absolute, normalized paths
        Path targetDir = outputDir.toAbsolutePath().normalize();
        try (InputStream fis = Files.newInputStream(inputFile);
             BufferedInputStream bis = new BufferedInputStream(fis);
             GzipCompressorInputStream gzis = new GzipCompressorInputStream(bis);
             TarArchiveInputStream tais = new TarArchiveInputStream(gzis)) {
            TarArchiveEntry entry;
            while ((entry = tais.getNextTarEntry()) != null) {
                // Prevent Zip Slip vulnerability
                Path outputPath = targetDir.resolve(entry.getName()).normalize();
                if (!outputPath.startsWith(targetDir)) {
                    throw new IOException("Entry is outside of the target directory: "
                            + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(outputPath);
                } else {
                    Files.createDirectories(outputPath.getParent());
                    try (OutputStream fos = Files.newOutputStream(outputPath);
                         BufferedOutputStream bos = new BufferedOutputStream(fos)) {
                        byte[] buffer = new byte[1024];
                        int len;
                        // read() returns -1 at the end of the current entry
                        while ((len = tais.read(buffer)) != -1) {
                            bos.write(buffer, 0, len);
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("/path/to/existing.tar.gz");
        Path outputDir = Paths.get("/path/to/outputDir");
        extractTarGz(source, outputDir);
    }
}
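The Zip Slip check in the extractor can be looked at in isolation. The following sketch uses only java.nio.file and shows which entry names pass the resolve, normalize, and startsWith test. The class SafeResolve and the method staysInside are hypothetical names chosen for this example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; mirrors the check used in the extractor.
final class SafeResolve {

    // True if entryName, resolved against targetDir, stays inside targetDir.
    static boolean staysInside(Path targetDir, String entryName) {
        Path base = targetDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(entryName).normalize();
        // Path.startsWith compares whole path segments, not string prefixes
        return resolved.startsWith(base);
    }

    public static void main(String[] args) {
        Path out = Paths.get("outputDir");
        System.out.println(staysInside(out, "docs/readme.txt"));    // true
        System.out.println(staysInside(out, "../evil.sh"));         // false
        System.out.println(staysInside(out, "docs/../../evil.sh")); // false
    }
}
```

Normalizing before the comparison matters: without it, a name like docs/../../evil.sh would still begin with the target directory's segments and pass the check.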
5. Adding Files or Directories to an Existing tar.gz File
Appending files to an existing .tar.gz file is not straightforward because the compressed archive cannot be modified in place. The workaround is to extract the archive to a temporary directory, add the new files, and then re-create the .tar.gz file.
package com.mkyong.zip;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class TarGzUpdater {

    public static void updateTarGz(Path tarGzPath,
                                   List<Path> newFiles,
                                   Path tempExtractDir,
                                   Path updatedTarGz) throws IOException {

        // Step 1: Extract the existing tar.gz file to a tempExtractDir
        // (create it first so the later steps also work for an empty archive)
        Files.createDirectories(tempExtractDir);
        TarGzExtractor.extractTarGz(tarGzPath, tempExtractDir);

        // Step 2: Copy new files to the tempExtractDir
        for (Path file : newFiles) {
            Path targetPath = tempExtractDir.resolve(file.getFileName());
            Files.copy(file, targetPath, StandardCopyOption.REPLACE_EXISTING);
        }

        // Step 3: Re-create the tar.gz file with the updated contents,
        // avoiding including tempDir itself
        try (OutputStream fos = Files.newOutputStream(updatedTarGz);
             BufferedOutputStream bos = new BufferedOutputStream(fos);
             GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bos);
             TarArchiveOutputStream taos = new TarArchiveOutputStream(gzos);
             // Files.walk holds open directory handles, so close the stream too
             Stream<Path> paths = Files.walk(tempExtractDir)) {
            paths
                    // Avoid adding tempDir itself
                    .filter(path -> !path.equals(tempExtractDir))
                    .forEach(path -> {
                        try {
                            // Tar entry names use '/' regardless of the platform separator
                            String entryName = tempExtractDir.relativize(path).toString()
                                    .replace(path.getFileSystem().getSeparator(), "/");
                            TarArchiveEntry entry = new TarArchiveEntry(path.toFile(), entryName);
                            taos.putArchiveEntry(entry);
                            if (Files.isRegularFile(path)) {
                                Files.copy(path, taos);
                            }
                            taos.closeArchiveEntry();
                        } catch (IOException e) {
                            // Fail instead of silently producing an incomplete archive
                            throw new UncheckedIOException(e);
                        }
                    });
        }

        // Step 4: Delete the temporary extraction directory after use
        try (Stream<Path> paths = Files.walk(tempExtractDir)) {
            paths
                    // Sort in reverse order to delete directories last
                    .sorted(Comparator.reverseOrder())
                    .forEach(path -> {
                        try {
                            Files.delete(path);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
        }
    }

    public static void main(String[] args) throws IOException {
        Path existingTarGz = Paths.get("/path/to/existing.tar.gz");
        List<Path> newFiles = List.of(
                Paths.get("/path/to/newFile1.txt"),
                Paths.get("/path/to/newFile2.txt")
        );
        Path tempDir = Paths.get("/path/to/tempExtract");
        Path updatedTarGz = Paths.get("/path/to/updated.tar.gz");
        updateTarGz(existingTarGz, newFiles, tempDir, updatedTarGz);
    }
}
Explanation:
- Extract the existing .tar.gz file to a temporary directory.
- Copy new files into the extracted directory.
- Re-create the .tar.gz file with the updated contents while ensuring the tempDir itself is not included in the archive.
- Delete the temporary directory after use.
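The temporary directory handling in the steps above can be sketched on its own with the JDK. The example below assumes a hypothetical helper class TempDirs. It creates a uniquely named work directory with Files.createTempDirectory instead of a fixed path, and deletes it recursively while closing the Files.walk stream. The prefix "targz-update-" is an arbitrary choice for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Commons Compress.
final class TempDirs {

    // Creates a fresh, uniquely named directory under the system temp location.
    static Path createWorkDir() throws IOException {
        return Files.createTempDirectory("targz-update-");
    }

    // Deletes dir and everything below it; does nothing if dir does not exist.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        // try-with-resources releases the directory handles held by Files.walk
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directories
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path workDir = createWorkDir();
        try {
            Files.createDirectories(workDir.resolve("a/b"));
            Files.write(workDir.resolve("a/b/file.txt"), new byte[] {1, 2, 3});
        } finally {
            // Clean up even if the work in the try block fails
            deleteRecursively(workDir);
        }
        System.out.println(Files.exists(workDir)); // false
    }
}
```

Wrapping the work in try/finally means the temporary directory is removed even when extraction or re-packing throws.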
Hi and thank you for this solution!
Just one note:
I used your example for “1. Add two files to tar.gz”
Even though it works and produces the intended file, when I tried to open the produced file, I kept getting the following error:
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
What is needed is this:
after: tOut.finish();
this line must be added: tOut.close();
(TarArchiveOutputStream must be closed)
I don’t know if anybody else had this problem, but I’m mentioning it just in case someone did.
Thanks again!!!
Hi Team
We have an issue where a tar file created from our Spring Boot code sometimes contains an additional PAX file when we untar it, and the file is therefore rejected by the agencies. How can we create a tar file without this PAX file being added automatically?
How would one add a file to an existing tar.gz?