Java – How to add and remove BOM from a UTF-8 file

This article shows you how to add, check for, and remove the byte order mark (BOM) in a UTF-8 file. In UTF-8, the BOM is the byte sequence 0xEF, 0xBB, 0xBF (hexadecimal) at the very beginning of the file.

1. Add BOM to a UTF-8 file
2. Check if a file contains UTF-8 BOM
3. …
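
The excerpt stops before any code. As a minimal sketch (not the article's own implementation), all three operations can be done on the raw bytes with java.nio.file.Files (Java 7+). The class Utf8Bom and its methods hasBom, addBom and removeBom are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class Utf8Bom {

    // UTF-8 byte order mark: 0xEF, 0xBB, 0xBF
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the data starts with the UTF-8 BOM
    public static boolean hasBom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOf(data, UTF8_BOM.length), UTF8_BOM);
    }

    // Prepends the BOM unless it is already present
    public static byte[] addBom(byte[] data) {
        if (hasBom(data)) {
            return data;
        }
        byte[] result = new byte[UTF8_BOM.length + data.length];
        System.arraycopy(UTF8_BOM, 0, result, 0, UTF8_BOM.length);
        System.arraycopy(data, 0, result, UTF8_BOM.length, data.length);
        return result;
    }

    // Strips the leading BOM if present, otherwise returns the data unchanged
    public static byte[] removeBom(byte[] data) {
        return hasBom(data) ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bom-demo", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));

        // Add the BOM by rewriting the whole file
        Files.write(file, addBom(Files.readAllBytes(file)));
        System.out.println(hasBom(Files.readAllBytes(file))); // true

        // Remove the BOM again
        Files.write(file, removeBom(Files.readAllBytes(file)));
        System.out.println(hasBom(Files.readAllBytes(file))); // false

        Files.delete(file);
    }
}
```

Reading the whole file with Files.readAllBytes keeps the sketch short but loads everything into memory; for very large files, a streaming approach (for example, a PushbackInputStream that peeks at the first three bytes) would likely be a better fit.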

How to read a UTF-8 file in Java

In Java, InputStreamReader accepts a charset for decoding byte streams into character streams. We can pass StandardCharsets.UTF_8 to the InputStreamReader constructor to read data from a UTF-8 file.

```java
import java.nio.charset.StandardCharsets;
//…
try (FileInputStream fis = new FileInputStream(file);
     InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
     BufferedReader reader = new BufferedReader(isr)) {
    String str;
    …
```
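
Because the excerpt is cut off inside the try block, here is one plausible way to complete it as a self-contained sketch: read every line into a list with BufferedReader.readLine(). The class ReadUtf8 and the readLines helper are hypothetical names, and the temp file in main only exists for the demo.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class ReadUtf8 {

    // Reads every line of a UTF-8 file into a list
    public static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (FileInputStream fis = new FileInputStream(file);
             InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
             BufferedReader reader = new BufferedReader(isr)) {
            String str;
            while ((str = reader.readLine()) != null) {
                lines.add(str);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Demo only: write a small UTF-8 file, then read it back
        File file = File.createTempFile("read-utf8", ".txt");
        Files.write(file.toPath(), "h\u00e9llo\n\u4e16\u754c\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(file)) {
            System.out.println(line);
        }
        file.delete();
    }
}
```

One caveat worth knowing: Java's UTF-8 decoder does not skip a byte order mark, so if the file starts with a BOM, the first line will begin with the character U+FEFF and may need to be stripped by hand.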

How to write a UTF-8 file in Java

In Java, OutputStreamWriter accepts a charset for encoding character streams into byte streams. We can pass StandardCharsets.UTF_8 to the OutputStreamWriter constructor to write data to a UTF-8 file.

```java
try (FileOutputStream fos = new FileOutputStream(file);
     OutputStreamWriter osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
     BufferedWriter writer = new BufferedWriter(osw)) {
    writer.append(line);
}
```

In Java 7+, many …
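
A minimal self-contained sketch of the same writer chain, assuming the goal is to write a list of strings one per line. The class WriteUtf8 and the writeLines helper are hypothetical; note that OutputStreamWriter with UTF-8 does not emit a BOM.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

public class WriteUtf8 {

    // Writes each string as one line, encoded as UTF-8 (no BOM is written)
    public static void writeLines(File file, List<String> lines) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file);
             OutputStreamWriter osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
             BufferedWriter writer = new BufferedWriter(osw)) {
            for (String line : lines) {
                writer.append(line);
                writer.newLine(); // uses the platform line separator
            }
        } // try-with-resources flushes and closes the whole writer chain
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("write-utf8", ".txt");
        writeLines(file, Arrays.asList("h\u00e9llo", "\u4e16\u754c"));
        // Read the raw bytes back and decode them to check the round trip
        byte[] bytes = Files.readAllBytes(file.toPath());
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
        file.delete();
    }
}
```

The BufferedWriter reduces the number of small writes to the underlying stream; closing it also closes the OutputStreamWriter and FileOutputStream beneath it.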

How to convert String to byte[] in Java?

In Java, we can use str.getBytes(StandardCharsets.UTF_8) to convert a String into a byte[].

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

String str = "This is a String";

// platform default charset, a bit dangerous
byte[] output1 = str.getBytes();

// in the old days, before Java 1.7
byte[] output2 = str.getBytes(Charset.forName("UTF-8"));

// the best, Java 1.7+, new class StandardCharsets
byte[] output3 = str.getBytes(StandardCharsets.UTF_8);
```

…
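
To show why the explicit charset matters, here is a small sketch: the same String yields a different number of bytes under UTF-8 and ISO-8859-1, and new String(bytes, charset) reverses the conversion. The class StringBytes and its encode/decode helpers are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringBytes {

    // Encode a String to UTF-8 bytes with an explicit charset
    public static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String with the same charset
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String str = "h\u00e9llo"; // "héllo", written with a Unicode escape

        byte[] utf8 = encode(str);
        byte[] latin1 = str.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(utf8.length);              // 6: 'é' takes two bytes in UTF-8
        System.out.println(latin1.length);            // 5: one byte per character
        System.out.println(Arrays.toString(utf8));    // [104, -61, -87, 108, 108, 111]
        System.out.println(decode(utf8).equals(str)); // true
    }
}
```

The negative numbers appear because Java's byte type is signed: 0xC3 and 0xA9, the UTF-8 encoding of 'é', print as -61 and -87.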