How to read/write a file in UTF-8 in Java?

I am getting this error:

io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence

The solution is to read and write a file in UTF-8.

My code is:

InputStream input = null;
OutputStream output = null;
OutputStreamWriter bufferedWriter = new OutputStreamWriter( output, "UTF8");
input = new URL(url).openStream();
output = new FileOutputStream("DirectionResponse.xml");
byte[] buffer = new byte[1024];
for (int length = 0; (length = input.read(buffer)) > 0;) {
   output.write(buffer, 0, length);
}
BufferedReader br = new BufferedReader(new FileReader("DirectionResponse.xml" ));
FileWriter fstream = new FileWriter("ppre_DirectionResponse.xml");
BufferedWriter out = new BufferedWriter(fstream);

I'm reading a URL and writing its content to a file, DirectionResponse.xml. Then I read DirectionResponse.xml and write the same content to ppre_DirectionResponse.xml for processing.

How do I change this so that reading and writing is done in UTF-8?

Haruspex answered 12/11, 2012 at 20:2 Comment(0)

First, you need to call output.close() (or at least call output.flush()) before you reopen the file for input. That's probably the main cause of your problems.
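
To illustrate closing the output before reopening the file, here is a minimal sketch. It assumes Java 7+ (try-with-resources); the class name SaveThenRead and the method saveAndReadFirstLine are hypothetical, and a ByteArrayInputStream stands in for new URL(url).openStream() so the sketch runs without a network.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class SaveThenRead {
    // Copies the raw bytes of `input` to `fileName`, then reads the first line back as UTF-8.
    static String saveAndReadFirstLine(InputStream input, String fileName) throws IOException {
        try (InputStream in = input;
             OutputStream output = new FileOutputStream(fileName)) {
            byte[] buffer = new byte[1024];
            for (int length; (length = in.read(buffer)) > 0;) {
                output.write(buffer, 0, length);
            }
        } // both streams are closed here, before the file is reopened

        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes that would come from the URL (assumption for this sketch).
        byte[] xml = "<name>Z\u00fcrich</name>".getBytes(StandardCharsets.UTF_8);
        File tmp = File.createTempFile("DirectionResponse", ".xml");
        tmp.deleteOnExit();
        System.out.println(saveAndReadFirstLine(new ByteArrayInputStream(xml), tmp.getPath()));
    }
}
```

The download step copies bytes unchanged, so the saved file keeps whatever encoding the server sent; the charset only matters once the bytes are decoded into characters by a Reader.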

Then, you shouldn't use FileReader or FileWriter for this because they always use the platform-default encoding (which is often not UTF-8). From the docs for FileReader:

The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate.

You have the same problem when using a FileWriter. Replace this:

BufferedReader br = new BufferedReader(new FileReader("DirectionResponse.xml" ));

with something like this:

BufferedReader br = new BufferedReader(new InputStreamReader(
    new FileInputStream("DirectionResponse.xml"), "UTF-8"));

and similarly for fstream.
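
For the writer side, a minimal sketch of the same replacement might look like the following. The class name Utf8FileWriter and the method writeUtf8 are hypothetical, and StandardCharsets.UTF_8 (Java 7+) is used in place of the "UTF-8" string so that no UnsupportedEncodingException needs handling.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class Utf8FileWriter {
    // Writes `text` to `fileName`, encoding characters as UTF-8
    // regardless of the platform-default charset.
    static void writeUtf8(String fileName, String text) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(fileName), StandardCharsets.UTF_8))) {
            out.write(text);
        } // closing the BufferedWriter flushes it and closes the underlying stream
    }

    public static void main(String[] args) throws IOException {
        writeUtf8("ppre_DirectionResponse.xml", "<name>Z\u00fcrich</name>");
    }
}
```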

Slue answered 12/11, 2012 at 20:9 Comment(5)
@Aubin - Sure, at least if you're talking about the input stream: URLConnection conn = url.openConnection(); InputStream is = conn.getInputStream(); then use is as the input stream. - Slue
@user905911 - I noted another problem with your code. See the first paragraph of my revised answer. - Slue
@TedHopp: Sir, it worked. But I don't know how to change FileWriter. Should I change to OutputStreamWriter? - Haruspex
OK, so I changed that to FileOutputStream fos = new FileOutputStream("ppre_DirectionResponse.xml"); Writer out = new OutputStreamWriter(fos, "UTF8"); - Haruspex
Since Java 7 it can be done in a shorter way: BufferedReader br = Files.newBufferedReader(Paths.get("DirectionResponse.xml"), StandardCharsets.UTF_8); - Mvd

Read and Write UTF-8 File in Java

I see you are writing in UTF-8 but not explicitly reading in UTF-8. Follow the example I've provided in the link.

try {
   Reader reader =
      new InputStreamReader(
         new FileInputStream(args[0]),"UTF-8");
   BufferedReader fin = new BufferedReader(reader);
   Writer writer =
      new OutputStreamWriter(
         new FileOutputStream(args[1]), "UTF-8");
   BufferedWriter fout = new BufferedWriter(writer);
   String s;
   while ((s=fin.readLine())!=null) {
      fout.write(s);
      fout.newLine();
   }

   // Remember to call close.
   // Calling close on a BufferedReader/BufferedWriter
   // will automatically call close on its underlying stream.
   fin.close();
   fout.close();
} catch (IOException e) {
   e.printStackTrace();
}
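
The same line-by-line copy can also be sketched with try-with-resources, which closes both files even if an exception is thrown partway through. This is only an illustration assuming Java 7+; the class name Utf8LineCopy and the method copyLines are hypothetical, while Files.newBufferedReader and Files.newBufferedWriter are the standard java.nio.file helpers.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Utf8LineCopy {
    // Copies `source` to `target` line by line, decoding and encoding as UTF-8.
    static void copyLines(Path source, Path target) throws IOException {
        try (BufferedReader fin = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter fout = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String s;
            while ((s = fin.readLine()) != null) {
                fout.write(s);
                fout.newLine(); // writes the platform line separator
            }
        } // fin and fout are closed here, in reverse order of declaration
    }

    public static void main(String[] args) throws IOException {
        copyLines(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Because readLine strips line terminators and newLine writes the platform separator, the copy may normalize line endings rather than preserve them byte for byte.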
Karyogamy answered 12/11, 2012 at 20:6 Comment(3)
I read that, but the problem is that I need to read a URL; those functions don't read URLs. - Haruspex
@user905911 You didn't specify that in your question. - Karyogamy
fin.close(); and fout.close(); must be called in a finally block. And since Java 7 it's better to use the try-with-resources approach. - Mvd
