How to read and parse a CSV file in Java


A Comma-Separated Values (CSV) file is a plain-text file that stores data in columns, with the values on each line split by a separator (normally a comma “,”).

For example :

1,US,United States
2,MY,Malaysia
3,AU,Australia

or

"1","US","United States"
"2","MY","Malaysia"
"3","AU","Australia"
Note
Read this RFC4180 document for Comma-Separated Values (CSV) format.

In a CSV file, normally there are two issues:

1. A field contains the separator. For example, the separator is a comma and the field itself contains a comma:

"aaa","b,bb","ccc"

2. Double quotes are used to enclose fields, and a field itself contains double quotes. To solve this, a double quote appearing inside a field must be escaped by preceding it with another double quote (RFC4180):

"aaa","b""bb","ccc"

In this tutorial, we show you three examples to read, parse, and print out the values from a CSV file.

  1. Simple solution to parse a simply formatted CSV file.
  2. Advanced solution to parse an irregularly formatted CSV file (fields containing the separator or double quotes).
  3. Third-party solution, OpenCSV example.
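Before the examples, here is a minimal sketch of the RFC4180 quoting rule described above, seen from the writing side. The class and method names (CsvEscapeSketch, escape) are made up for illustration and are not part of any library: a field is enclosed in double quotes only when it contains the separator, a double quote, or a line break, and every double quote inside it is doubled.

```java
class CsvEscapeSketch {

    // Hypothetical helper: returns the field as it should appear in a CSV line.
    static String escape(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Double every embedded quote, then enclose the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("aaa", ','));    // aaa
        System.out.println(escape("b,bb", ','));   // "b,bb"
        System.out.println(escape("b\"bb", ','));  // "b""bb"
    }
}
```

A parser has to undo exactly these two steps: remove the enclosing quotes and turn each doubled quote back into a single one.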

1. Simple Solution

If you are sure the CSV file doesn’t contain the separator or double quotes inside a field, just use the standard split() to parse the CSV file.
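To see why that condition matters, the small sketch below (class name NaiveSplitSketch is hypothetical) runs a plain split() on the two problem lines from the introduction. It only demonstrates the failure; it is not a fix.

```java
class NaiveSplitSketch {

    // The simple approach: split on every comma, with no knowledge of quotes.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Separator inside a quoted field: three fields expected, four returned.
        String withComma = "\"aaa\",\"b,bb\",\"ccc\"";
        System.out.println(naiveSplit(withComma).length);   // 4

        // Escaped quotes inside a field: the count is right,
        // but the field still carries the raw quoting.
        String withQuotes = "\"aaa\",\"b\"\"bb\",\"ccc\"";
        System.out.println(naiveSplit(withQuotes)[1]);      // "b""bb"
    }
}
```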

1.1 Review a simple CSV file

/Users/mkyong/csv/country.csv
"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"
"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"
"1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia"
"1.0.8.0","1.0.15.255","16779264","16781311","CN","China"
"1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan"
"1.0.32.0","1.0.63.255","16785408","16793599","CN","China"
"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan"
"1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"

1.2 No magic: just read the text file above line by line and split each line by the comma separator.

CSVReader.java

package com.mkyong.csv;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class CSVReader {

    public static void main(String[] args) {

        String csvFile = "/Users/mkyong/csv/country.csv";
        BufferedReader br = null;
        String line = "";
        String cvsSplitBy = ",";

        try {

            br = new BufferedReader(new FileReader(csvFile));
            while ((line = br.readLine()) != null) {

                // use comma as separator
                String[] country = line.split(cvsSplitBy);

                System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");

            }

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

    }

}

1.3 For JDK 7 and above, use try-with-resources.

CSVReader.java

package com.mkyong.csv;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CSVReader {

    public static void main(String[] args) {

        String csvFile = "/Users/mkyong/csv/country.csv";
        String line = "";
        String cvsSplitBy = ",";

        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {

            while ((line = br.readLine()) != null) {

                // use comma as separator
                String[] country = line.split(cvsSplitBy);

                System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");

            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

Output

Country [code= "AU" , name="Australia"]
Country [code= "CN" , name="China"]
Country [code= "AU" , name="Australia"]
Country [code= "CN" , name="China"]
Country [code= "JP" , name="Japan"]
Country [code= "CN" , name="China"]
Country [code= "JP" , name="Japan"]
Country [code= "TH" , name="Thailand"]
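Note that the output above still carries the enclosing double quotes, because split() does not remove them. Also, as a reader points out in the comments, split() without a limit drops trailing empty fields. The following is a minimal sketch of how both could be handled while staying with the simple approach; the class and method names (SimpleSplitSketch, splitSimple) are hypothetical, and it still assumes no field contains the separator or an escaped quote.

```java
class SimpleSplitSketch {

    // Split on commas, keep trailing empty fields (limit -1),
    // and drop one pair of enclosing double quotes per field.
    static String[] splitSimple(String line) {
        String[] fields = line.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i];
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                fields[i] = f.substring(1, f.length() - 1);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] country = splitSimple(
                "\"1.0.16.0\",\"1.0.31.255\",\"16781312\",\"16785407\",\"JP\",\"Japan\"");
        System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");
        // Country [code= JP , name=Japan]

        System.out.println("a,,".split(",").length);    // 1, trailing empties are dropped
        System.out.println(splitSimple("a,,").length);  // 3, trailing empties are kept
    }
}
```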

2. Advanced Solution

This solution handles fields containing the separator or double quotes, and also supports a custom separator and a custom enclosing (quote) character. Review the following CSV parsing example and the JUnit test cases to understand how it works.

Note
Again, a double quote appearing inside a field must be escaped by preceding it with another double quote, for example:

"aaa","b""bb","ccc"

2.1 Review another CSV file

/Users/mkyong/csv/country2.csv
10,AU,Australia
11,AU,Aus""tralia
"12","AU","Australia"
"13","AU","Aus""tralia"
"14","AU","Aus,tralia"

2.2 The example below is inspired by the article “Only Class You Need for CSV Files” (with some fixes to support other features; see the “Fixed” comments in the code) and by the third-party OpenCSV library.

CSVUtils.java

package com.mkyong.utils;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CSVUtils {

    private static final char DEFAULT_SEPARATOR = ',';
    private static final char DEFAULT_QUOTE = '"';

    public static void main(String[] args) throws Exception {

        String csvFile = "/Users/mkyong/csv/country2.csv";

        Scanner scanner = new Scanner(new File(csvFile));
        while (scanner.hasNext()) {
            List<String> line = parseLine(scanner.nextLine());
            System.out.println("Country [id= " + line.get(0) + ", code= " + line.get(1) + " , name=" + line.get(2) + "]");
        }
        scanner.close();

    }

    public static List<String> parseLine(String cvsLine) {
        return parseLine(cvsLine, DEFAULT_SEPARATOR, DEFAULT_QUOTE);
    }

    public static List<String> parseLine(String cvsLine, char separators) {
        return parseLine(cvsLine, separators, DEFAULT_QUOTE);
    }

    public static List<String> parseLine(String cvsLine, char separators, char customQuote) {

        List<String> result = new ArrayList<>();

        //if null or empty, return an empty result
        if (cvsLine == null || cvsLine.isEmpty()) {
            return result;
        }

        if (customQuote == ' ') {
            customQuote = DEFAULT_QUOTE;
        }

        if (separators == ' ') {
            separators = DEFAULT_SEPARATOR;
        }

        StringBuffer curVal = new StringBuffer();
        boolean inQuotes = false;
        boolean startCollectChar = false;
        boolean doubleQuotesInColumn = false;

        char[] chars = cvsLine.toCharArray();

        for (char ch : chars) {

            if (inQuotes) {
                startCollectChar = true;
                if (ch == customQuote) {
                    inQuotes = false;
                    doubleQuotesInColumn = false;
                } else {

                    //Fixed : allow "" in custom quote enclosed
                    if (ch == '\"') {
                        if (!doubleQuotesInColumn) {
                            curVal.append(ch);
                            doubleQuotesInColumn = true;
                        }
                    } else {
                        curVal.append(ch);
                    }

                }
            } else {
                if (ch == customQuote) {

                    inQuotes = true;

                    //Fixed : allow "" in empty quote enclosed
                    if (chars[0] != '"' && customQuote == '\"') {
                        curVal.append('"');
                    }

                    //double quotes in column will hit this!
                    if (startCollectChar) {
                        curVal.append('"');
                    }

                } else if (ch == separators) {

                    result.add(curVal.toString());

                    curVal = new StringBuffer();
                    startCollectChar = false;

                } else if (ch == '\r') {
                    //ignore CR characters
                    continue;
                } else if (ch == '\n') {
                    //the end, break!
                    break;
                } else {
                    curVal.append(ch);
                }
            }

        }

        result.add(curVal.toString());

        return result;
    }

}

Output

Country [id= 10, code= AU , name=Australia]
Country [id= 11, code= AU , name=Aus"tralia]
Country [id= 12, code= AU , name=Australia]
Country [id= 13, code= AU , name=Aus"tralia]
Country [id= 14, code= AU , name=Aus,tralia]
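For comparison, here is a more compact sketch of a single-line parser that follows the RFC4180 rule directly: inside a quoted field, a quote followed by another quote is one literal quote, and a quote followed by anything else closes the field. This is not the CSVUtils class above; the names (Rfc4180LineSketch, parse) are hypothetical. It assumes one record per line and treats a quote as an opening quote only at the very start of a field, so an unquoted field such as Aus""tralia is kept exactly as written, which differs from the output shown above.

```java
import java.util.ArrayList;
import java.util.List;

class Rfc4180LineSketch {

    static List<String> parse(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;

        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (inQuotes) {
                if (ch != quote) {
                    current.append(ch);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    current.append(quote);  // doubled quote: one literal quote
                    i++;                    // skip the second quote of the pair
                } else {
                    inQuotes = false;       // closing quote
                }
            } else if (ch == separator) {
                fields.add(current.toString());
                current.setLength(0);
                atFieldStart = true;
                continue;
            } else if (ch == quote && atFieldStart) {
                inQuotes = true;            // opening quote of an enclosed field
            } else {
                current.append(ch);
            }
            atFieldStart = false;
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"13\",\"AU\",\"Aus\"\"tralia\"", ',', '"')); // [13, AU, Aus"tralia]
        System.out.println(parse("\"14\",\"AU\",\"Aus,tralia\"", ',', '"'));    // [14, AU, Aus,tralia]
        System.out.println(parse("10,\"AU\",\"Aus\"\"tralia\"", ',', '"'));     // [10, AU, Aus"tralia]
    }
}
```

The third call mixes unquoted and quoted fields on one line, a case raised in the comments below.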

2.3 Review the following unit test; it tests the “commas and double quotes” issue.

CSVUtilsTest.java

package com.mkyong.csv;

import com.mkyong.utils.CSVUtils;
import org.hamcrest.core.IsNull;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.core.Is.is;

public class CSVUtilsTest {

    @Test
    public void test_no_quote() {

        String line = "10,AU,Australia";
        List<String> result = CSVUtils.parseLine(line);

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));

    }

    @Test
    public void test_no_quote_but_double_quotes_in_column() throws Exception {

        String line = "10,AU,Aus\"\"tralia";

        List<String> result = CSVUtils.parseLine(line);
        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus\"tralia"));

    }

    @Test
    public void test_double_quotes() {

        String line = "\"10\",\"AU\",\"Australia\"";
        List<String> result = CSVUtils.parseLine(line);

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));

    }

    @Test
    public void test_double_quotes_but_double_quotes_in_column() {

        String line = "\"10\",\"AU\",\"Aus\"\"tralia\"";
        List<String> result = CSVUtils.parseLine(line);

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus\"tralia"));

    }

    @Test
    public void test_double_quotes_but_comma_in_column() {

        String line = "\"10\",\"AU\",\"Aus,tralia\"";
        List<String> result = CSVUtils.parseLine(line);

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus,tralia"));

    }

}

2.4 Review another unit test; it tests the custom separator and custom enclosing character.

CSVUtilsTestCustom.java

package com.mkyong.csv;

import com.mkyong.utils.CSVUtils;
import org.hamcrest.core.IsNull;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.core.Is.is;

public class CSVUtilsTestCustom {

    @Test
    public void test_custom_separator() {

        String line = "10|AU|Australia";
        List<String> result = CSVUtils.parseLine(line, '|');

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));

    }

    @Test
    public void test_custom_separator_and_quote() {

        String line = "'10'|'AU'|'Australia'";
        List<String> result = CSVUtils.parseLine(line, '|', '\'');

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));

    }

    @Test
    public void test_custom_separator_and_quote_but_custom_quote_in_column() {

        String line = "'10'|'AU'|'Aus|tralia'";
        List<String> result = CSVUtils.parseLine(line, '|', '\'');

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus|tralia"));

    }

    @Test
    public void test_custom_separator_and_quote_but_double_quotes_in_column() {

        String line = "'10'|'AU'|'Aus\"\"tralia'";
        List<String> result = CSVUtils.parseLine(line, '|', '\'');

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus\"tralia"));

    }

}

3. OpenCSV Example

If you are not comfortable with the simple and advanced solutions above, try the third-party CSV library OpenCSV.

3.1 Maven.

pom.xml

	<dependency>
		<groupId>com.opencsv</groupId>
		<artifactId>opencsv</artifactId>
		<version>3.8</version>
	</dependency>

3.2 Review a CSV file.

/Users/mkyong/csv/country3.csv
10,AU,Australia
11,AU,Aus""tralia
"12","AU","Australia"
"13","AU","Aus""tralia"
"14","AU","Aus,tralia"

3.3 OpenCSV example to parse the CSV file above.

CSVReaderExample.java

package com.mkyong.csv;

import com.opencsv.CSVReader;

import java.io.FileReader;
import java.io.IOException;

public class CSVReaderExample {

    public static void main(String[] args) {

        String csvFile = "/Users/mkyong/csv/country3.csv";

        // try-with-resources closes the reader even if parsing fails
        try (CSVReader reader = new CSVReader(new FileReader(csvFile))) {
            String[] line;
            while ((line = reader.readNext()) != null) {
                System.out.println("Country [id= " + line[0] + ", code= " + line[1] + " , name=" + line[2] + "]");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }


    }

}

Output

Country [id= 10, code= AU , name=Australia]
Country [id= 11, code= AU , name=Aus"tralia]
Country [id= 12, code= AU , name=Australia]
Country [id= 13, code= AU , name=Aus"tralia]
Country [id= 14, code= AU , name=Aus,tralia]
Note
Please refer to this OpenCSV official documentation for more examples.

Done.

References

  1. The Only Class You Need for CSV Files
  2. CSVHelper Example
  3. Ostermiller Java Utilities – Comma Separated Values (CSV)
  4. RFC4180 – Format for Comma-Separated Values (CSV)
  5. OpenCSV website
  6. Java – How to export data to CSV file

About the Author

mkyong
Founder of Mkyong.com, loves Java and open source stuff.

Comments

ziggyfish

Hi,

You have an error in your code:

//if empty, return!
if (cvsLine == null && cvsLine.isEmpty()) {
    return result;
}

should be:

//if empty, return!
if (cvsLine == null || cvsLine.isEmpty()) {
    return result;
}

Ömer Hakan Bilici

Thanks. Note that OpenCSV Example part is enough for most cases. I think you should move this part to start of this post.

Apercova

Hi! I have a test case that is not covered by your advanced solution. I have the following CSV line:

String line = "10,\"AU\",\"Aus\"\"tralia\"";

Your solution outputs the following:

10, "AU, "Aus""tralia

You can see that a quote is appended to the value AU and to Aus""tralia. This behavior is caused by this part:

//Fixed : allow "" in empty quote enclosed
if (chars[0] != '"' && customQuote == '"') {
    curVal.append('"');
}

I wonder what scenario this fix is for, because if I comment out this code the CSV line is parsed fine.

JacksonImplementer

Hallo Mkyong,

Your articles are really interesting to read every time. Will the CSV parser work if my object has object as member variable?.

for example
public class {
int modelNumber;
String modelName;
CarDefination carDefin;
}

what happens if there are Object as member variable?

code date

package take;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class DateUtil {

    // List of all date formats that we want to parse.
    // Add your own format here.
    private static List dateFormats = new ArrayList() {
        private static final long serialVersionUID = 1L;
        {
            add(new SimpleDateFormat("M/dd/yyyy"));
            add(new SimpleDateFormat("dd.M.yyyy"));
            add(new SimpleDateFormat("M/dd/yyyy hh:mm:ss a"));
            add(new SimpleDateFormat("dd.M.yyyy hh:mm:ss a"));
            add(new SimpleDateFormat("dd.MMM.yyyy"));
            add(new SimpleDateFormat("dd-MMM-yyyy"));
        }
    };

    /**
     * Convert String with various formats into java.util.Date
     *
     * @param input
     *            Date as a string
     * @return java.util.Date object if input string is parsed
     *         successfully else returns null
     */
    …

nickelen

Hi, mkyong,

I think it’s better to use

String[] country = line.split(cvsSplitBy, -1);

because in many cases a CSV file contains lines with empty values at the end, like:

"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan",,,,,

line.split(cvsSplitBy) will return an array of length 6, but most of the time we expect an array that also contains the empty values at the end.

What do you think ?

mkyong

Thanks for your input. Yes, your way is better.

GiladD

Great article. Thanks! However, there is an error in the following line in your CSVUtils:
if (cvsLine == null && cvsLine.isEmpty()) {
The two conditions can’t co-exist. If csvLine is null then isEmpty will throw a NullPointerException.
It should be:
if (cvsLine == null || cvsLine.isEmpty()) {

Works great beside that!

Hsiwei

Thanks.
It’s useful.

mahfuz

I have used OpenCSV. OpenCSV can read the whole CSV into memory for faster operation. Writing CSV is also pretty easy with it; just google opencsv.
Happy Coding.

milczarekIT

I also recommend the opencsv library. It allows mapping CSV columns to class properties; (de)serialization with this simple lib is pretty easy.

zynick

Note that it might not work on csv file that contains comma in the data, for example:

"Nicholas","Programmer, Gamer, Dancer"
"Victor","Actor, Director, Producer"

I believe you are aware of this, perhaps just wanna keep things simple, since it wouldn’t be too appropriate for data with comma to store in a csv file.

Alex

Indeed, but also numbers in EU format (with commas instead of dots) can cause problems, that’s why I usually take the ; as separator, you probably never see it in your data itself so it won’t mess your file up.

Burhan KILINC

Why not split by "," (three chars)?

Evan James Knowles

Annoyingly, having the " characters isn’t required.

Viswa

What is the solution for this issue...
"Nicholas","Programmer, Gamer, Dancer"
to treat as 3 values instead of 4 values.

mkyong

Article is updated to support field containing separator (comma), try #2 advance solution.

Oliver

If you use a comma as the separator, I think this will fail if the string contains a comma.
E.g. "US","United States of America, NY"

hxl

Extracts by use regex

ilen

it’s very helpfull, thanks a lot.

Aldo

Thanks for all the “how-to”‘s, i’ve use them many times and they have been really helpful. Regards from Chile.

Scrat

I think you can also use awk to do it…

Girish Sharma

Good One You really Helps Me alot Keep sharing these kind of sample.

emeric

“Get a country CVS file”…
…”name from above CVS file”…
“ReadCVS.java”

The file type is “CSV” and not “CVS”. 🙂

And thanks for this helpful article!

Rahman Habibi

CSV can have other values like this:

value1,"value,with,commas",other,value

and

value,""value with double quotes"",other,value

I think this code must handle lines that have values with commas and values with double quotes, because every value with double quotes or commas is wrapped in double quotes too.

neeru

very nice

Patrick McSweeney

This doesn’t parse CSV at all. It parses some files which look like CSV, but it makes a series of really obvious mistakes, so this won’t work. It doesn’t take into account commas in the data. It assumes 1 line per record, which according to http://tools.ietf.org/html/rfc4180 is not necessarily the case. Don’t parse CSV like this. Use a proper library.

johnny

use a proper library such as ?
Is there any link or example of how we can use such libraries?

thanks, regards

Arun

You could use csv_ml (http://siara.cc/csv_ml). is RFC4180 compliant, allows empty lines and comments, can convert to JSON and XML, supports stream parsing for huge files and many more features.

Download Swing demo: http://siara.cc/csv_ml/csv_ml_swing_demo-1.0.0.jar

Online Javascript demo: http://siara.cc/csv_ml/csv_ml_js.html

Download source: https://github.com/siara-cc/csv_ml

mkyong

For a simple CSV file, the #1 example simple solution is enough. For other, try #2 advance solution or third library like OpenCSV

Anon

Be careful with split

System.out.println(Arrays.asList("a,,".split(",")))

will return {"a"} instead of {"a", "", ""} !!!

izeye

What if there’s a comma between double quotes (") like "123,000" in CSV?

Simply splitting by comma (,) doesn’t work in this case.

paragflume

Hi mkyong,

Suppose I have 2 fields, (1) Integer and (2) String-Date, e.g.

23,"September 30, 2012"
24,"September 31, 2012"
24,"September 32, 2012"
25,"September 33, 2012"

Now I want to parse by comma ",". The expected result is two fields, but the actual result will be 3 fields. Can you help me solve the problem?

Arundev Puthenpurayil

Try out this………..

public static void main(String[] args) {
    String s = "Sachin,,M,\"Maths,Science,English\",Need to improve in these subjects.";
    String[] splitted = s.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
    System.out.println(Arrays.toString(splitted));
}

mkyong

Article is updated to support this feature, try example 2, advance solution.

kalpeshsoni

check out opencsv and super csv

Pradip

Nice.. Helpful code

neha agrawal

I think we should use Scanner class instead of split() method.

Jason Flaherty

What if one of the CSV columns was empty? How would you handle that? You would get an error, but can you fill it with a “N/A” for example?

Anand Kadekar

HI mkyong,
This works pretty well with small files. But if file size is beyond 100MB this will probably not work.
I came across the situation where I need to compare two very large csv files. First I tried with java but then settled with Hadoop.

hemal

//code for reading file
private static void readLineByLineExample() throws IOException {
    BufferedReader CSVFile = new BufferedReader(new FileReader("C:\\Users\\NTANWAR\\Desktop\\std_db.csv"));
    String dataRow = CSVFile.readLine(); // Read first line.
    // The while checks to see if the data is null. If
    // it is, we've hit the end of the file. If not,
    // process the data.
    while (dataRow != null) {
        String[] dataArray = dataRow.split(",");
        for (String item : dataArray) {
            System.out.print(item + "\t");
        }
        System.out.println(); // Print the data line.
        dataRow = CSVFile.readLine(); // Read next line of data.
    }
    // Close the file once all data has been read.
    CSVFile.close();
    // End the…