How to read a UTF-8 file in Java

In Java, InputStreamReader accepts a charset and uses it to decode a byte stream into characters. We can pass StandardCharsets.UTF_8 to the InputStreamReader constructor to read data from a UTF-8 file.


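import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

  //...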
  try (FileInputStream fis = new FileInputStream(file);
       InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
       BufferedReader reader = new BufferedReader(isr)
  ) {

      String str;
      while ((str = reader.readLine()) != null) {
          System.out.println(str);
      }

  } catch (IOException e) {
      e.printStackTrace();
  }

Since Java 7, many file-reading APIs accept a charset as an argument, which makes reading a UTF-8 file very easy.


  // Java 7
  BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);

  // Java 7
  List<String> list = Files.readAllLines(path, StandardCharsets.UTF_8);

  // Java 8 (close the stream after use, e.g. with try-with-resources)
  Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8);

  // Java 11
  String s = Files.readString(path, StandardCharsets.UTF_8);
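One practical difference between these Files methods and the classic InputStreamReader approach is how they treat bytes that are not valid UTF-8. The sketch below is an illustration under stated assumptions, not part of the original example: the class name MalformedUtf8Demo and the helper names readLenient and readStrict are hypothetical, and the test file is a temporary file containing one deliberately invalid byte. In current JDKs, InputStreamReader created with a Charset replaces invalid input with the replacement character U+FFFD, while Files.newBufferedReader reports it as a MalformedInputException.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the article's source code
public class MalformedUtf8Demo {

    // Classic approach: invalid bytes are replaced with U+FFFD
    static String readLenient(Path path) throws IOException {
        try (FileInputStream fis = new FileInputStream(path.toFile());
             InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
             BufferedReader reader = new BufferedReader(isr)) {
            return reader.readLine();
        }
    }

    // NIO approach: invalid bytes cause a MalformedInputException
    static String readStrict(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("bad-utf8", ".txt");
        try {
            // 'a', then 0xFF (never valid in UTF-8), then 'b'
            Files.write(path, new byte[] {'a', (byte) 0xFF, 'b'});

            String lenient = readLenient(path);
            System.out.println("lenient length: " + lenient.length());

            try {
                readStrict(path);
            } catch (MalformedInputException e) {
                System.out.println("strict read rejected the file: " + e.getMessage());
            }
        } finally {
            Files.delete(path);
        }
    }
}
```

If a file might not be clean UTF-8, the strict behaviour is usually the safer default because corrupted input is reported instead of being silently altered.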

1. UTF-8 File

The sample is a UTF-8 encoded file, c:\temp\test.txt, that contains Chinese characters.
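If you do not have such a file at hand, a minimal sketch like the following can create one. It assumes the file contents match the output shown in this article (three English lines and one Chinese line). The class name CreateUtf8Sample and the method writeSample are hypothetical, and the main method writes to a temporary file rather than the article's path so that it runs on any system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the article's source code
public class CreateUtf8Sample {

    // Writes the assumed sample lines to the given path, encoded as UTF-8
    static Path writeSample(Path path) throws IOException {
        List<String> lines = Arrays.asList(
                "line 1",
                "line 2",
                "line 3",
                // The Chinese sample line, written with Unicode escapes so this
                // source file compiles regardless of its own encoding
                "\u4F60\u597D\uFF0C\u4E16\u754C");
        return Files.write(path, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary location (assumption); pass your own path to writeSample if needed
        Path path = Files.createTempFile("test", ".txt");
        writeSample(path);
        System.out.println("Wrote sample file: " + path);
    }
}
```

Passing the charset explicitly to Files.write mirrors the reading side: the bytes on disk are UTF-8 no matter what the platform default is.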

2. Read UTF-8 file

This example shows a few ways to read a UTF-8 file.


package com.mkyong.io.howto;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

public class UnicodeRead {

    public static void main(String[] args) {

        String fileName = "c:\\temp\\test.txt";

        //readUnicodeJava11(fileName);
        readUnicodeBufferedReader(fileName);
        //readUnicodeFiles(fileName);
        //readUnicodeClassic(fileName);

    }

    // Java 7 - Files.newBufferedReader(path, StandardCharsets.UTF_8)
    // Java 8 - Files.newBufferedReader(path) // default UTF-8
    public static void readUnicodeBufferedReader(String fileName) {

        Path path = Paths.get(fileName);

        // Java 8, default UTF-8
        try (BufferedReader reader = Files.newBufferedReader(path)) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static void readUnicodeFiles(String fileName) {

        Path path = Paths.get(fileName);
        try {

            // Java 11
            String s = Files.readString(path, StandardCharsets.UTF_8);
            System.out.println(s);

            // Java 7
            List<String> list = Files.readAllLines(path, StandardCharsets.UTF_8);
            list.forEach(System.out::println);

            // Java 8, Files.lines keeps the file open, so close the stream after use
            try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
                lines.forEach(System.out::println);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    // Java 11, adds charset to FileReader
    public static void readUnicodeJava11(String fileName) {

        try (FileReader fr = new FileReader(fileName, StandardCharsets.UTF_8);
             BufferedReader reader = new BufferedReader(fr)) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static void readUnicodeClassic(String fileName) {

        File file = new File(fileName);

        try (FileInputStream fis = new FileInputStream(file);
             InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
             BufferedReader reader = new BufferedReader(isr)
        ) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }
}

Output


line 1
line 2
line 3
你好，世界
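If the first line is printed with an unexpected leading character (often shown as ?), the file probably starts with a UTF-8 byte order mark (bytes EF BB BF). Java's UTF-8 decoder does not strip it, so it arrives as the character U+FEFF at the start of the first line. The following is a minimal sketch of one way to drop it. The class name BomAwareRead and the method readLinesSkippingBom are hypothetical, and the demo file is a temporary file created only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the article's source code
public class BomAwareRead {

    // The byte order mark as it appears after UTF-8 decoding (U+FEFF)
    private static final char BOM = '\uFEFF';

    // Reads all lines as UTF-8 and removes a leading BOM from the first line, if present
    static List<String> readLinesSkippingBom(Path path) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (first && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1);
                }
                first = false;
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("bom", ".txt");
        try {
            // Encoding U+FEFF as UTF-8 produces the three BOM bytes EF BB BF
            String content = BOM + "line 1\nline 2\n";
            Files.write(path, content.getBytes(StandardCharsets.UTF_8));
            readLinesSkippingBom(path).forEach(System.out::println);
        } finally {
            Files.delete(path);
        }
    }
}
```

Only the very first character of the file is checked, so a U+FEFF that appears later in the text is left untouched.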

Further Reading
How to write to a UTF-8 file in Java

Download Source Code

$ git clone https://github.com/mkyong/core-java

$ cd core-java/java-io

16 Comments


Dipak

9 years ago

Thanks Man.. this fixed my issue. I am grateful.

zik

12 years ago

According to http://docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html

maybe you should specify the encoding name to be “UTF-8” rather than “UTF8”?

mkyong

5 years ago

Reply to zik

Thanks, we updated the article to use StandardCharsets.UTF_8.

jessie

10 months ago

Thanks a lot, you really saved me a load of time.
I had a problem with a simple config text file which didn’t have any BOM. It contained ASCII text and a 16 character password with non-ASCII characters (>127), a string like “Ahsy5&ç@2AjS3”.
This file is read perfectly under Windows (and the code works) but not under Linux, where the string came out longer by 1 character.
So I thought I could use .readUTF() instead of .readLine() but it didn’t work [ and WTH is “modified UTF”? That’s what it said in the source code for .readUTF() ]
Your code worked first go. Keep it up man!

Fabricio

7 years ago

thank you

Julio

8 years ago

thanks!!, this tutorial helped with my problem.

Jagat

10 years ago

Hey,
I am getting only symbols from the extracted gzip file.
Can anyone help me?

Anand Kadam

Reply to Jagat

go to the project properties and set “Text encoding file” as UTF8.

sameer j

The “copyright symbol” gets converted to a “question mark inside a black diamond”.

Omer

11 years ago

You are amazing thank you mr mkyong

Raghavendra

13 years ago

Hi Mkyong, How to get the encoding characterset of a file in java? Please provide the source code for this. And is UTF-8-> ANSI ?

varun bhatia

Does not work for me! I am getting a ? prefixed to the first line.

Launcher Go

14 years ago

I just want to ask why you put a lot of catch statements wherein you already put a generalized catch statement at the bottom?

sorry, I am still a newbie~

Anthony

Reply to Launcher Go

To catch any other exception that could be thrown without being caught by the other catch statements.

sridhar

15 years ago

this is a real stupidity that i can’t post UTF-8 string to explain the problem in code

reddy

I didn't get the UTF abbreviation.
