How to enable non-IANA encodings when using javax.xml.stream.XMLStreamReader


I'm using javax.xml.stream.XMLStreamReader to parse XML documents. Unfortunately, some of the documents I'm parsing use non-IANA encoding names, like "macroman" and "ms-ansi". For example:

<?xml version="1.0" encoding="macroman"?>
<foo />

This causes the parse to blow up with an exception:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,42]
Message: Invalid encoding name "macroman".

Is there any way to provide a custom encoding handler to my XMLStreamReader so that I can augment it with support for the encodings I need?

Ossetic answered 7/6, 2018 at 22:2 Comment(3)
I'm assuming you don't have the ability to alter the stream so that it doesn't contain the encoding line? XMLStreamReader has its limitations, and this is one of them. Brodsky
It's unfortunate, but you may be better served by choosing a different XML library. Brodsky
@Brodsky I'm not producing these documents, just consuming them, so unfortunately I have no control over the encoding line. Are there other XML libraries that are more flexible? Ossetic

You could wrap the input stream with a transformer that replaces the non-standard charset with the equivalent charset that XMLStreamReader does understand.

See: "Filter (search and replace) array of bytes in an InputStream"
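As a rough illustration of that idea, here is a minimal sketch that rewrites only the encoding name in the XML declaration and passes the rest of the stream through untouched. The class name `EncodingAliasStream`, the method `rewriteDeclaredEncoding`, and the 256-byte limit are all made up for this example. It also assumes the document is in an ASCII-compatible encoding with no byte order mark, so the declaration itself can be read as plain bytes. Which replacement names your parser accepts is something you would need to check yourself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
public class EncodingAliasStream {
    // How many leading bytes to inspect (an assumption; a declaration is normally far shorter).
    private static final int HEAD_LIMIT = 256;

    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the input.
    private static final Pattern ENCODING = Pattern.compile(
            "^(<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"'])([^\"']+)([\"'])");

    /**
     * Returns a stream with the same content as {@code in}, except that a declared
     * encoding name found in {@code aliases} (keys in lower case) is replaced by its mapped value.
     */
    public static InputStream rewriteDeclaredEncoding(InputStream in, Map<String, String> aliases)
            throws IOException {
        byte[] head = in.readNBytes(HEAD_LIMIT);
        // ISO-8859-1 maps every byte to exactly one char and back, so no bytes are altered.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(text);
        if (m.find()) {
            String replacement = aliases.get(m.group(2).toLowerCase(Locale.ROOT));
            if (replacement != null) {
                text = m.group(1) + replacement + m.group(3) + text.substring(m.end());
                head = text.getBytes(StandardCharsets.ISO_8859_1);
            }
        }
        // Re-attach the (possibly rewritten) head in front of the unread remainder.
        return new SequenceInputStream(new ByteArrayInputStream(head), in);
    }
}
```

You would then hand the returned stream to `XMLInputFactory.newInstance().createXMLStreamReader(...)` instead of the original one, with a map along the lines of `"macroman"` to `"x-MacRoman"` and `"ms-ansi"` to `"windows-1252"`. Those target names are my guess at the equivalents, so verify them against the charsets your runtime and parser actually support.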

Dort answered 17/3, 2019 at 20:45 Comment(0)
