StAX - Setting the version and encoding using XMLStreamWriter

I am using StAX to create XML files and then validating each file against an XSD.

I am getting an error while creating the XML file:

javax.xml.stream.XMLStreamException: Underlying stream encoding 'Cp1252' and input paramter for writeStartDocument() method 'UTF-8' do not match.
        at com.sun.xml.internal.stream.writers.XMLStreamWriterImpl.writeStartDocument(XMLStreamWriterImpl.java:1182)

Here is the code snippet:

XMLOutputFactory xof =  XMLOutputFactory.newInstance();

try {
  XMLStreamWriter xtw = xof.createXMLStreamWriter(new FileWriter(fileName));
  xtw.writeStartDocument("UTF-8", "1.0");
} catch (XMLStreamException e) {
  e.printStackTrace();
} catch (IOException ie) {
  ie.printStackTrace();
}

I am running this code on Unix. Does anybody know how to set the version and encoding style?

Dennadennard answered 31/5, 2010 at 12:45 Comment(0)
I
14

I would try the createXMLStreamWriter() overload that takes an OutputStream together with an encoding parameter.

[EDIT] Tried, it works by changing the createXMLStreamWriter line:

XMLStreamWriter xtw = xof.createXMLStreamWriter(new FileOutputStream(fileName), "UTF-8");

[EDIT 2] Made a little more complex test, for the record:

String fileName = "Test.xml";
XMLOutputFactory xof =  XMLOutputFactory.newInstance();
XMLStreamWriter xtw = null;
try
{
  xtw = xof.createXMLStreamWriter(new FileOutputStream(fileName), "UTF-8");
  xtw.writeStartDocument("UTF-8", "1.0");
  xtw.writeStartElement("root");
  xtw.writeComment("This is an attempt to create an XML file with StAX");

  xtw.writeStartElement("foo");
  xtw.writeAttribute("order", "1");
    xtw.writeStartElement("meuh");
    xtw.writeAttribute("active", "true");
      xtw.writeCharacters("The cows are flying high this Spring");
    xtw.writeEndElement();
  xtw.writeEndElement();

  xtw.writeStartElement("bar");
  xtw.writeAttribute("order", "2");
    xtw.writeStartElement("tcho");
    xtw.writeAttribute("kola", "K");
      xtw.writeCharacters("Content of tcho tag");
    xtw.writeEndElement();
  xtw.writeEndElement();

  xtw.writeEndElement();
  xtw.writeEndDocument();
}
catch (XMLStreamException e)
{
  e.printStackTrace();
}
catch (IOException ie)
{
  ie.printStackTrace();
}
finally
{
  if (xtw != null)
  {
    try
    {
      xtw.close();
    }
    catch (XMLStreamException e)
    {
      e.printStackTrace();
    }
  }
}
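
One thing the test above leaves open is the FileOutputStream itself: XMLStreamWriter.close() is documented as not closing the underlying output stream. A minimal sketch of the same approach with the stream closed via try-with-resources follows. The class name StaxWriteSketch and the helper writeSample are made up for illustration, and it needs Java 7 or later.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxWriteSketch {
    // Hypothetical helper: writes a tiny document to fileName as UTF-8.
    static void writeSample(String fileName) throws IOException, XMLStreamException {
        String encoding = StandardCharsets.UTF_8.name(); // "UTF-8"
        // try-with-resources closes the file stream even if writing fails
        try (OutputStream out = new FileOutputStream(fileName)) {
            XMLStreamWriter xtw =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, encoding);
            try {
                // Same encoding for the stream and for the XML declaration
                xtw.writeStartDocument(encoding, "1.0");
                xtw.writeStartElement("root");
                xtw.writeCharacters("hello");
                xtw.writeEndElement();
                xtw.writeEndDocument();
                xtw.flush();
            } finally {
                xtw.close(); // releases the writer; 'out' is closed by the outer try
            }
        }
    }

    public static void main(String[] args) throws Exception {
        writeSample("Test.xml");
    }
}
```

Keeping the encoding in a single variable means the stream and the declaration cannot drift apart.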
Interrupt answered 31/5, 2010 at 14:11 Comment(7)
@Anurag: I think you should not put a space between @ and the user name: I was not notified of your question. Anyway, being curious, I tried my advice and found a working solution; see my edit. - Interrupt
@PhiLho: Sorry about that. I am now getting another error, 'Prefix can not be null'. My schema does not use any prefixes. Is there any way to ignore this error? Because of it I am getting a blank file. - Dennadennard
@PhiLho: I am doing it the same way, but it still gives me the error 'Prefix cannot be null'. - Dennadennard
@Anurag: I don't get any error with the little sample I show, wrapped in a simple class. I don't use a schema or anything other than what is shown. - Interrupt
@PhiLho: This is resolved by using a blank prefix: xtw.setPrefix("", "w3.org/2001/XMLSchema-instance"); - Dennadennard
Thanks for all the effort you put in to help me out here. Thanks a lot! - Dennadennard
Ditto, this was a big help! - Glia
I
7

This should work:

// ...
Writer writer = new OutputStreamWriter(new FileOutputStream(fileName), "UTF-8");
XMLStreamWriter xtw = xof.createXMLStreamWriter(writer);
xtw.writeStartDocument("UTF-8", "1.0");
// ...
Irby answered 1/6, 2010 at 19:46 Comment(1)
It works, but you should use StandardCharsets.UTF_8 in the first line and StandardCharsets.UTF_8.name() in the last line instead of hardcoding "UTF-8". Note that StandardCharsets requires at least Java 1.7 (on older versions, use Charset.forName("UTF-8")). Thanks. - Pisciform
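
For reference, the StandardCharsets suggestion in the comment above could look roughly like this. It is only a sketch: the class Utf8WriterSketch and its write method are invented names, and it assumes Java 7 or later.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class Utf8WriterSketch {
    // Hypothetical helper: same idea as the answer, without the "UTF-8" string literal.
    static void write(String fileName) throws IOException, XMLStreamException {
        // The Charset overload of OutputStreamWriter cannot fail on a misspelled name
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(fileName), StandardCharsets.UTF_8)) {
            XMLStreamWriter xtw = XMLOutputFactory.newInstance().createXMLStreamWriter(writer);
            // writeStartDocument takes Strings, hence name() here
            xtw.writeStartDocument(StandardCharsets.UTF_8.name(), "1.0");
            xtw.writeStartElement("root");
            xtw.writeCharacters("x");
            xtw.writeEndElement();
            xtw.writeEndDocument();
            xtw.flush();
            xtw.close();
        }
    }

    public static void main(String[] args) throws Exception {
        write("Test.xml");
    }
}
```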
C
3

From the code it is hard to know for sure, but if you are relying on the default StAX implementation that JDK 1.6 provides (Sun's Sjsxp), I would recommend switching to Woodstox. It is known to be less buggy than Sjsxp, supports the whole Stax2 API, and has been actively developed and supported (whereas the Sun version was written once and has since received only a limited number of bug fixes).

But the bug in your code is this:

XMLStreamWriter xtw = xof.createXMLStreamWriter(new FileWriter(fileName));

You are relying on the default platform encoding (which must be CP-1252; Windows?). You should always explicitly specify the encoding you are using. The stream writer is just verifying that you are not doing something dangerous, and it spotted an inconsistency that could produce a corrupt document. Pretty smart, which actually suggests that this is not the default StAX processor. :-)

(The other answer points out a correct workaround, too: just pass the OutputStream and the encoding and let XMLStreamWriter do the right thing.)
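
To see which encoding a plain FileWriter would pick on a given machine, something like the following sketch can help. DefaultEncodingSketch and both helper methods are invented for illustration. Note that on JDK 18 and later the default charset is UTF-8 unless overridden, so the mismatch mostly shows up on older JDKs or with a non-default file.encoding setting.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultEncodingSketch {
    // Encoding name a FileWriter uses when none is given (the platform default).
    static String fileWriterEncoding(File f) throws IOException {
        try (FileWriter fw = new FileWriter(f)) {
            return fw.getEncoding(); // read before the writer is closed
        }
    }

    // Encoding name reported when the charset is specified explicitly.
    static String explicitEncoding(File f) throws IOException {
        try (OutputStreamWriter w = new OutputStreamWriter(
                new FileOutputStream(f), StandardCharsets.UTF_8)) {
            return w.getEncoding();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("encoding", ".tmp");
        System.out.println("Default charset:     " + Charset.defaultCharset());
        System.out.println("FileWriter encoding: " + fileWriterEncoding(f));
        System.out.println("Explicit encoding:   " + explicitEncoding(f));
        f.delete();
    }
}
```

If the second line prints something like Cp1252 while the document declares UTF-8, that is the mismatch the exception is complaining about.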

Chui answered 13/1, 2011 at 19:10 Comment(0)
T
0

If you are using the default XMLStreamWriter bundled with the Oracle JRE/JDK, you should always

  • create the XMLStreamWriter with an explicit character encoding: xmlOutputFactory.createXMLStreamWriter(out, encoding)
  • start the document with the same encoding stated explicitly: xmlStreamWriter.writeStartDocument(encoding, version). The writer is not smart enough to remember the encoding that was set when it was created. However, it does check that the two encodings are the same. See the code below.

This way, your file encoding and XML declaration are always in sync. Although specifying the encoding in the XML declaration is optional, XML best practice is to always specify it.

This is the code from the Oracle (Sun) implementation (Sjsxp):

String streamEncoding = null;
if (fWriter instanceof OutputStreamWriter) {
    streamEncoding = ((OutputStreamWriter) fWriter).getEncoding();
}
else if (fWriter instanceof UTF8OutputStreamWriter) {
    streamEncoding = ((UTF8OutputStreamWriter) fWriter).getEncoding();
}
else if (fWriter instanceof XMLWriter) {
    streamEncoding = ((OutputStreamWriter) ((XMLWriter)fWriter).getWriter()).getEncoding();
}

if (streamEncoding != null && !streamEncoding.equalsIgnoreCase(encoding)) {
    // If the equality check failed, check for charset encoding aliases
    boolean foundAlias = false;
    Set aliases = Charset.forName(encoding).aliases();
    for (Iterator it = aliases.iterator(); !foundAlias && it.hasNext(); ) {
        if (streamEncoding.equalsIgnoreCase((String) it.next())) {
            foundAlias = true;
        }
    }
    // If no alias matches the encoding name, then report error
    if (!foundAlias) {
        throw new XMLStreamException("Underlying stream encoding '"
                + streamEncoding
                + "' and input paramter for writeStartDocument() method '"
                + encoding + "' do not match.");
    }
}
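
The alias loop matters because OutputStreamWriter.getEncoding() reports the historical java.io name of a charset, which can differ from the name used in the XML declaration. Below is a simplified restatement of the check that can be run on its own. EncodingAliasSketch and matches are made-up names, not part of the JDK, and the real implementation has more branches than this.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingAliasSketch {
    // Hypothetical helper mirroring the quoted logic: exact match first, then aliases.
    static boolean matches(String streamEncoding, String declared) {
        if (streamEncoding.equalsIgnoreCase(declared)) {
            return true;
        }
        for (String alias : Charset.forName(declared).aliases()) {
            if (streamEncoding.equalsIgnoreCase(alias)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        OutputStreamWriter w = new OutputStreamWriter(
                new ByteArrayOutputStream(), StandardCharsets.UTF_8);
        String streamName = w.getEncoding(); // historical name, may not be spelled "UTF-8"
        System.out.println(streamName + " vs UTF-8: " + matches(streamName, "UTF-8"));
        // The combination from the question: platform default Cp1252, declared UTF-8
        System.out.println("Cp1252 vs UTF-8: " + matches("Cp1252", "UTF-8"));
    }
}
```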
Tessellation answered 3/7, 2015 at 18:54 Comment(0)
