How to make InputStreamReader fail on invalid data for an encoding?

I have some bytes that should be UTF-8 encoded, but that may contain text in ISO-8859-1 encoding if the user somehow didn't manage to use their text editor the right way.

I read the file with an InputStreamReader:

InputStreamReader reader = new InputStreamReader( 
    new FileInputStream(file), Charset.forName("UTF-8"));

But whenever the user types umlauts like "ä", whose ISO-8859-1 bytes are invalid UTF-8, the InputStreamReader does not complain. It silently substitutes placeholder characters (U+FFFD, the Unicode replacement character).
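A minimal self-contained sketch reproducing this behavior, assuming an in-memory byte array instead of a file. The class name `ReplacementDemo` and the helper `readWithCharset` are hypothetical, and `StandardCharsets.UTF_8` (Java 7+) stands in for `Charset.forName("UTF-8")`. The word "Mädchen" encoded as ISO-8859-1 contains the lone byte 0xE4, which the UTF-8 decoder treats as malformed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Decodes bytes as UTF-8 via InputStreamReader(InputStream, Charset),
    // the same constructor shape as in the question
    static String readWithCharset(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the signature simple
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E4 encoded as ISO-8859-1 is the single byte 0xE4, which is malformed UTF-8 here
        byte[] latin1 = "M\u00e4dchen".getBytes(StandardCharsets.ISO_8859_1);
        String text = readWithCharset(latin1);
        // No exception was thrown; the bad byte came back as U+FFFD
        System.out.println(text.equals("M\uFFFDdchen")); // prints true
    }
}
```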

Is there a simple way to make this throw an exception on invalid input?

Cumulonimbus asked 5/2, 2013 at 7:26
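// Charset.newDecoder() gives a decoder whose error actions can be configured
CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
// REPORT makes the reader throw a CharacterCodingException instead of substituting U+FFFD
decoder.onMalformedInput(CodingErrorAction.REPORT);
decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
InputStreamReader reader = new InputStreamReader(
    new FileInputStream(file), decoder);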
Passacaglia answered 5/2, 2013 at 7:31
Cumulonimbus: Thanks! Didn't know there was an API to do this.
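A runnable sketch of the decoder-based approach above, assuming an in-memory byte array in place of the file. `StrictReaderDemo`, `strictUtf8` and `readStrict` are hypothetical names. Valid UTF-8 decodes normally, while the ISO-8859-1 bytes make `read()` throw a `MalformedInputException` (a subclass of `CharacterCodingException`, itself an `IOException`).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictReaderDemo {
    // A new UTF-8 decoder that reports errors instead of substituting U+FFFD
    static CharsetDecoder strictUtf8() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    // Reads everything through a strict reader; invalid input surfaces as
    // a CharacterCodingException (for these bytes, a MalformedInputException)
    static String readStrict(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        // One decoder per reader: CharsetDecoder keeps internal state and is not thread-safe
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), strictUtf8())) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readStrict("M\u00e4dchen".getBytes(StandardCharsets.UTF_8)));
        try {
            readStrict("M\u00e4dchen".getBytes(StandardCharsets.ISO_8859_1));
        } catch (CharacterCodingException e) {
            System.out.println("Rejected: " + e);
        }
    }
}
```

Creating a fresh decoder for every reader is deliberate: the reader drives the decoder's internal state, so sharing one instance between readers or threads is unsafe.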

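Simply add .newDecoder(). A decoder created this way reports malformed input and unmappable characters by default (CodingErrorAction.REPORT), so the reader throws instead of substituting placeholders: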

InputStreamReader reader = new InputStreamReader( 
    new FileInputStream(file), Charset.forName("UTF-8").newDecoder());
Shew answered 5/2, 2013 at 10:0
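A small sketch checking the claim that a fresh decoder already uses REPORT, plus an alternative whole-buffer validation via `CharsetDecoder.decode(ByteBuffer)`, which throws a `CharacterCodingException` under REPORT. `DefaultActionDemo`, `defaultsToReport` and `isValidUtf8` are hypothetical names; this alternative loads the whole input into memory, so it suits validating a file before processing rather than streaming.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DefaultActionDemo {
    // True if a fresh UTF-8 decoder starts with REPORT for both error kinds
    static boolean defaultsToReport() {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        return decoder.malformedInputAction() == CodingErrorAction.REPORT
                && decoder.unmappableCharacterAction() == CodingErrorAction.REPORT;
    }

    // Whole-buffer check: decode(ByteBuffer) throws when the error action is REPORT
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultsToReport()); // prints true
        System.out.println(isValidUtf8("M\u00e4dchen".getBytes(StandardCharsets.UTF_8)));      // prints true
        System.out.println(isValidUtf8("M\u00e4dchen".getBytes(StandardCharsets.ISO_8859_1))); // prints false
    }
}
```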
