SAX Error – MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.

Problem

When an XML file contains special UTF-8 characters and your SAX parser is not configured to read the input as UTF-8, the following exception is thrown.


com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: 
Invalid byte 1 of 1-byte UTF-8 sequence.
...
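One way to see a failure of this kind without a file on disk is sketched below. It assumes the JDK's built-in SAX parser and uses a made-up document containing the single byte 0x92, which cannot start a UTF-8 sequence; the class name, method names, and sample content are all hypothetical. The exact exception type and message may differ between parser versions, so the sketch only reports that parsing failed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
final class MalformedUtf8Demo {

    // Builds a small XML document that contains one byte (0x92) which is
    // not valid UTF-8 on its own. The sample content is an assumption.
    static byte[] brokenDocument() {
        byte[] head = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><note>it"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "s</note>".getBytes(StandardCharsets.US_ASCII);
        byte[] doc = new byte[head.length + 1 + tail.length];
        System.arraycopy(head, 0, doc, 0, head.length);
        doc[head.length] = (byte) 0x92; // stray byte, invalid as UTF-8
        System.arraycopy(tail, 0, doc, head.length + 1, tail.length);
        return doc;
    }

    // Returns true when parsing the raw byte stream fails.
    static boolean failsOnRawBytes(byte[] xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return false;
        } catch (IOException | SAXException | ParserConfigurationException e) {
            // Print whatever the parser reported; the type varies by version.
            System.out.println(e.getClass().getName() + ": " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("failed: " + failsOnRawBytes(brokenDocument()));
    }
}
```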

Solution

The solution is quite simple: read the content through a UTF-8 Reader, and pass that Reader to the parser as the SAX input source.


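// Open the file as raw bytes.
File file = new File("c:\\file-utf.xml");
InputStream inputStream = new FileInputStream(file);

// Decode the bytes as UTF-8 explicitly.
Reader reader = new InputStreamReader(inputStream, "UTF-8");

// Hand the decoded character stream to SAX.
InputSource is = new InputSource(reader);
is.setEncoding("UTF-8");

// saxParser and handler are created elsewhere.
saxParser.parse(is, handler);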

You can read the full example here: How to read UTF-8 XML file in Java (SAX Parser)
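For reference, the steps above can be put together into a self-contained sketch. It assumes the JDK's built-in SAX parser and reads from an in-memory byte array instead of a file so it can run anywhere; the class name, the readText helper, and the sample XML are hypothetical and not taken from the full example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch class, not part of the original article.
final class Utf8SaxSketch {

    // Parses the stream as UTF-8 and returns all character data it contains.
    static String readText(InputStream in) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // Decode the bytes ourselves, then give the Reader to SAX.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            InputSource is = new InputSource(reader);
            is.setEncoding("UTF-8");
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            saxParser.parse(is, handler);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Sample document (an assumption) with non-ASCII characters.
        byte[] xml = "<?xml version=\"1.0\"?><name>caf\u00e9 \u65e5\u672c\u8a9e</name>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(readText(new ByteArrayInputStream(xml)));
    }
}
```

To read from a file instead, pass a FileInputStream to readText. The try-with-resources block closes the Reader, which also closes the underlying stream.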

About the Author

mkyong
Founder of Mkyong.com, loves Java and open source stuff.

Comments


F3ca

Thanks, great and quick help!

TiredEngineer

Thanks! This helped me resolve my issue.

Varsha

Thanks !! Helped a lot !!

TalHA

Thanks, helped me

beans9

hi~
If I use a URL, not a file,
how do I change the charset to UTF-8?
lol

Alan A. D. de Oliveira

Thanks. It helped me a lot!!!

Yuchen

I have a question. How can I test whether a string is valid UTF-8 in Java? I know how to detect whether a byte[] is UTF-8, but I have no idea how to detect whether a Java string is valid UTF-8.

Thanks

vishal solankee

Thanks a lot for providing such hands-on!!

Shaik Allabakash

Awesome, buddy. It worked like magic. Many results came up while searching for this, but most of them suggested modifying the source file. Yours is the only one that provided the right solution.

Sanduckhan

Thanks, you saved my day !

Oladeji Oluwasayo

I use your articles a lot in my not-so-JavaEE development. You’ve been an extremely valuable resource!

Murodhon

Thanks, this helped me resolve my issue too. mkyong.com is great and very useful.

MADHUSUDHANA A B

Thank you very much. Your code is working nicely.

????? ???????

Excellent solution!