How to read a UTF-8 file in Java

In Java, InputStreamReader accepts a charset, which it uses to decode a byte stream into a character stream. To read a UTF-8 file, pass StandardCharsets.UTF_8 to the InputStreamReader constructor.


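import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;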

  //...
  try (FileInputStream fis = new FileInputStream(file);
       InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
       BufferedReader reader = new BufferedReader(isr)
  ) {

      String str;
      while ((str = reader.readLine()) != null) {
          System.out.println(str);
      }

  } catch (IOException e) {
      e.printStackTrace();
  }
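
To see why the charset argument matters, here is a minimal sketch that decodes the same bytes twice, once as UTF-8 and once as ISO-8859-1. The class name CharsetMismatchDemo and the helper decode are made up for illustration, and the bytes come from an in-memory string rather than a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original example
class CharsetMismatchDemo {

    // Decodes the given bytes into a String using the given charset
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4F60 U+597D), written as Unicode escapes;
        // they take 6 bytes in UTF-8
        byte[] utf8Bytes = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 2, decoded correctly
        System.out.println(wrong.length()); // 6, one garbled char per byte
    }
}
```

With the wrong charset nothing fails and no exception is thrown; the text is simply garbled, which is why it helps to state the charset explicitly.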

Since Java 7, many file-reading APIs accept a charset as an argument, which makes reading a UTF-8 file straightforward.


  // Java 7
  BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);

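  // Java 7 (the overload without a charset, which defaults to UTF-8, arrived in Java 8)
  List<String> list = Files.readAllLines(path, StandardCharsets.UTF_8);

  // Java 8; close the stream (e.g. with try-with-resources) to release the file
  Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8);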

  // Java 11
  String s = Files.readString(path, StandardCharsets.UTF_8);

1. UTF-8 File

A UTF-8 encoded file, c:\temp\test.txt, that contains Chinese characters.

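
To reproduce the sample file, a short sketch like the following could write it. The class name CreateUtf8File and the helper writeSample are illustrative only; the content mirrors the output shown in section 2, and the target path is taken from the first argument (falling back to the system temp directory) so that it does not have to be c:\temp\test.txt.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for creating the test file
class CreateUtf8File {

    // Writes four sample lines to the given path as UTF-8 (Files.write is Java 7+)
    static Path writeSample(Path path) {
        List<String> lines = Arrays.asList(
                "line 1",
                "line 2",
                "line 3",
                // the Chinese line from the sample output, as Unicode escapes
                "\u4f60\u597d,\u4e16\u754c");
        try {
            return Files.write(path, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the target path is the first argument,
        // otherwise test.txt in the system temp directory is used
        Path target = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("java.io.tmpdir"), "test.txt");
        System.out.println("Wrote " + writeSample(target));
    }
}
```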

2. Read UTF-8 file

This example shows a few ways to read a UTF-8 file.


package com.mkyong.io.howto;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

public class UnicodeRead {

    public static void main(String[] args) {

        String fileName = "c:\\temp\\test.txt";

        //readUnicodeJava11(fileName);
        readUnicodeBufferedReader(fileName);
        //readUnicodeFiles(fileName);
        //readUnicodeClassic(fileName);

    }

    // Java 7 - Files.newBufferedReader(path, StandardCharsets.UTF_8)
    // Java 8 - Files.newBufferedReader(path) // default UTF-8
    public static void readUnicodeBufferedReader(String fileName) {

        Path path = Paths.get(fileName);

        // Java 8, default UTF-8
        try (BufferedReader reader = Files.newBufferedReader(path)) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static void readUnicodeFiles(String fileName) {

        Path path = Paths.get(fileName);
        try {

            // Java 11
            String s = Files.readString(path, StandardCharsets.UTF_8);
            System.out.println(s);

            // Java 7
            List<String> list = Files.readAllLines(path, StandardCharsets.UTF_8);
            list.forEach(System.out::println);

            // Java 8, the stream keeps the file open, so close it with try-with-resources
            try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
                lines.forEach(System.out::println);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    // Java 11, adds charset to FileReader
    public static void readUnicodeJava11(String fileName) {

        try (FileReader fr = new FileReader(fileName, StandardCharsets.UTF_8);
             BufferedReader reader = new BufferedReader(fr)) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static void readUnicodeClassic(String fileName) {

        File file = new File(fileName);

        try (FileInputStream fis = new FileInputStream(file);
             InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
             BufferedReader reader = new BufferedReader(isr)
        ) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }
}

Output

line 1
line 2
line 3
你好,世界
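
One caveat: if the first line comes back with an unexpected leading character, the file may begin with a UTF-8 byte order mark (the bytes EF BB BF), which some editors add. Java's UTF-8 decoder passes it through as the character U+FEFF rather than dropping it. The sketch below strips it by hand; the class BomAwareRead and the helpers stripBom and firstLine are hypothetical names, and the input is built in memory for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original example
class BomAwareRead {

    // Removes a leading U+FEFF (the decoded byte order mark), if present
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    // Reads the first line from UTF-8 bytes and drops a BOM if there is one
    static String firstLine(byte[] utf8Bytes) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8))) {
            return stripBom(reader.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 BOM, followed by the ASCII text "line 1"
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'l', 'i', 'n', 'e', ' ', '1'};
        System.out.println(firstLine(withBom)); // line 1
    }
}
```

Only the first line of a file needs this check, so the same stripBom call can be applied to the first readLine() result in any of the readers shown above.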

Download Source Code

$ git clone https://github.com/mkyong/core-java

$ cd core-java/java-io

mkyong

16 Comments
Dipak
9 years ago

Thanks Man.. this fixed my issue. I am grateful.

zik
12 years ago

According to http://docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html

maybe you should specify the encoding name to be “UTF-8” rather than “UTF8”?

jessie
10 months ago

Thanks a lot, you really saved me a load of time.
I had a problem with a simple config text file which didn’t have any BOM. It contained ASCII text and a 16 character password with non-ASCII characters (>127), a string like “Ahsy5&ç@2AjS3”.
This file is read perfectly under Windows (and the code works) but not under Linux, where the string came out 1 character longer.
So I thought I could use .readUTF() instead of .readLine() but it didn’t work [ and WTH is “modified UTF”? That’s what it said in the source code for .readUTF() ]
Your code worked first go. Keep it up man!

Fabricio
7 years ago

thank you

Julio
8 years ago

thanks!!, this tutorial helped with my problem.

Jagat
10 years ago

Hey,
I am getting only symbols from the extracted gzip file.
Can anyone help me?

Anand Kadam
10 years ago
Reply to  Jagat

go to the project properties and set “Text encoding file” as UTF8.

sameer j
10 years ago

The “copyright symbol” gets converted to a “question mark inside a black diamond”.

Omer
11 years ago

You are amazing thank you mr mkyong

Raghavendra
13 years ago

Hi Mkyong, How to get the encoding characterset of a file in java? Please provide the source code for this. And is UTF-8-> ANSI ?

varun bhatia
13 years ago

Does not work for me! I am getting a ? prefixed to the first line.

Launcher Go
14 years ago

I just want to ask why you put a lot of catch statements wherein you already put a generalized catch statement at the bottom?

sorry, I am still a newbie~

Anthony
11 years ago
Reply to  Launcher Go

To catch any other exception that could be thrown without being caught by the other catch statements.

sridhar
15 years ago

this is a real stupidity that i can’t post UTF-8 string to explain the problem in code

reddy
8 years ago

I didn’t get the UTF abbreviation.