Why does this code keep triggering the SAXParseException "PI must not start with xml"?

This code is used to generate an XML document from its String representation. It works fine in my small unit tests, but fails on my actual XML data. The exception is thrown at the line Document doc = db.parse(is);

Any ideas?

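import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XML
{
    // Parses an XML string into a DOM Document; returns null on failure.
    public static Document FromString(String xml)
    {
        // from http://www.rgagnon.com/javadetails/java-0573.html
        try
        {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource();
            is.setCharacterStream(new StringReader(xml));

            // This is the call that throws the exception.
            Document doc = db.parse(is);
            doc.normalize();

            return doc;
        }
        catch (Exception e)
        {
            // Log is the application's own logging helper (not shown).
            Log.WriteError("Failed to parse XML", e, "XML.FromString(String)");
            return null;
        }
    }
}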
Brezin asked 14/2, 2011 at 1:33. Comments:
Islington: It is likely not your code, but the "XML" string that you are loading and attempting to parse. If it isn't well-formed XML, the parser will throw exceptions when it encounters things like unclosed elements, invalid characters, etc.
Gaylene: What does the exception say? Have you tested your XML against an outside source to make sure it is valid?
Brezin: I found this message in the exception: "PI must not start with xml (position:unknown xm@1:5 in java.io.StringReader@4625d540)". I'm not sure what this means, since I'm fairly sure the first characters of the string are <?xml version="1.0" encoding="utf-8"?>.
Extravert: Typically you get this if you have extra whitespace before the XML declaration, which is not allowed: if the document has an XML declaration, it MUST come first, with no leading whitespace. If anything precedes it, the parser treats <?xml ...?> as an ordinary processing instruction (PI), and PIs are not allowed to have the target name "xml", hence the error message.
Geostatics: You can also get this if you read the string from a stream using the wrong encoding.
Mikes: I think there's something in the string before the <?xml. Perhaps a byte order mark?
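To see the rule from these comments in action, here is a minimal sketch using the JDK's built-in DOM parser. That may not be the parser the asker used, and its error message is worded differently, but the restriction on what may precede the declaration is the same. The class name PrologCheck and the sample strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class PrologCheck
{
    // Returns true if the string parses as a well-formed XML document.
    static boolean parses(String xml)
    {
        try
        {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        }
        catch (Exception e)
        {
            // SAXParseException (malformed input) and other failures end up here.
            return false;
        }
    }

    public static void main(String[] args)
    {
        String good = "<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>";
        String leadingSpace = " \n" + good;   // whitespace before the declaration
        String leadingBom = "\uFEFF" + good;  // BOM left over from decoding

        System.out.println(parses(good));                          // true
        System.out.println(parses(leadingSpace));                  // false
        System.out.println(parses(leadingSpace.trim()));           // true
        // trim() only removes characters up to U+0020, so a BOM survives it.
        System.out.println(leadingBom.trim().startsWith("<?xml")); // false
    }
}
```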

Thanks for your help everyone.

I discarded the <?xml version="1.0" encoding="utf-8"?> declaration, which cleared this error. I still don't understand the reason for this, but it worked nonetheless.

I then found that one of my buffered writers (used when extracting from a zip file into memory) wasn't being flushed, which left the XML string incomplete.
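The flushing bug is easy to reproduce. The zip-extraction code isn't shown, so this minimal sketch assumes a BufferedWriter wrapping an in-memory StringWriter; FlushDemo and writeFully are hypothetical names. The point is that the wrapped writer only receives the data once the BufferedWriter is flushed or closed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class FlushDemo
{
    // Writes text through a BufferedWriter into an in-memory StringWriter.
    // try-with-resources closes the BufferedWriter, and close() flushes it,
    // so the StringWriter holds the complete text when it is read.
    static String writeFully(String text)
    {
        StringWriter target = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(target))
        {
            out.write(text);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return target.toString();
    }

    public static void main(String[] args) throws IOException
    {
        // Without a flush, short output is still sitting in the writer's buffer.
        StringWriter target = new StringWriter();
        BufferedWriter out = new BufferedWriter(target);
        out.write("<root/>");
        System.out.println(target.toString().isEmpty()); // true
        out.flush();
        System.out.println(target);                       // <root/>

        System.out.println(writeFully("<root/>"));        // <root/>
    }
}
```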

Thanks everyone for your help!

Brezin answered 14/2, 2011 at 12:32. Comments:
Lysimeter: Hi Kurru, how did you discard the declaration? Did you use a transformer, a substring, or something else? I've been struggling with this for a while...
Brezin: There are a few ways you could do this. Since I could rely on the formatting, I just threw away the first line of the XML. You could also take a substring starting just after the first > symbol.
Lysimeter: Thanks for the reply, I ended up using the substring approach.
Brezin: @CrimsonChin Feel free to upvote my answer as a reward :P Upvoted comments don't earn rep, I'm afraid!
Buddhi: After reading the string into a variable, I just used xml=xml.replace("<?xml version=\"1.0\" encoding=\"utf-8\"?>", ""); and the error was gone.
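Both approaches above rely on the declaration looking exactly as expected. A slightly more tolerant sketch follows, assuming the declaration (if present) comes first apart from a BOM or whitespace; DeclarationStripper is a hypothetical name. Keep in mind that if the string was decoded with the wrong charset, removing the declaration only hides the symptom.

```java
public class DeclarationStripper
{
    // Removes a leading XML declaration (e.g. <?xml version="1.0" encoding="utf-8"?>),
    // skipping any BOM or whitespace in front of it. Unlike an exact replace(), this
    // does not care about quote style, attribute values, or "utf-8" vs "UTF-8".
    static String stripDeclaration(String xml)
    {
        int start = 0;
        while (start < xml.length()
                && (xml.charAt(start) == '\uFEFF' || Character.isWhitespace(xml.charAt(start))))
        {
            start++;
        }
        // Require whitespace after "<?xml" so PIs such as <?xml-stylesheet ...?> are left alone.
        boolean isDeclaration = xml.startsWith("<?xml", start)
                && start + 5 < xml.length()
                && Character.isWhitespace(xml.charAt(start + 5));
        if (isDeclaration)
        {
            int end = xml.indexOf("?>", start);
            if (end >= 0)
            {
                return xml.substring(end + 2);
            }
        }
        return xml;
    }

    public static void main(String[] args)
    {
        System.out.println(stripDeclaration("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>")); // <root/>
    }
}
```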

You may want to check whether your XML file starts with a BOM (byte order mark) header.
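A minimal way to run that check in Java, assuming the path of the XML file is passed as the first argument (BomCheck and its methods are hypothetical names). It only looks for the UTF-8 BOM (EF BB BF); UTF-16 files start with FE FF or FF FE instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomCheck
{
    // True if the bytes begin with the UTF-8 byte order mark EF BB BF.
    static boolean startsWithUtf8Bom(byte[] head)
    {
        return head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
    }

    // Reads only the first three bytes of the file and checks them.
    static boolean fileHasUtf8Bom(Path file) throws IOException
    {
        try (InputStream in = Files.newInputStream(file))
        {
            byte[] head = new byte[3];
            int n = in.readNBytes(head, 0, 3); // Java 9+
            return n == 3 && startsWithUtf8Bom(head);
        }
    }

    public static void main(String[] args) throws IOException
    {
        Path file = Paths.get(args[0]); // path of the XML file to check
        System.out.println(fileHasUtf8Bom(file) ? "UTF-8 BOM found" : "no UTF-8 BOM");
    }
}
```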

Misogyny answered 14/6, 2011 at 15:49

I had the same problem while parsing XML generated by PHP. After I added the Content-Type header "text/xml", it worked like a charm.

Beaming answered 13/9, 2011 at 22:9

As @StaxMan said, remove any unexpected characters before the first <:

responseBody = responseBody.substring(responseBody.indexOf("<"));
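One caveat with that line: if the response contains no < at all, indexOf returns -1 and substring throws StringIndexOutOfBoundsException. A guarded version, as a sketch with hypothetical names, could look like this. It assumes the response is meant to be pure XML, so anything before the first < really is junk.

```java
public class LeadingJunk
{
    // Drops everything before the first '<' (BOM, whitespace, or other stray characters).
    // If there is no '<' at all, the input is returned unchanged; plain
    // substring(indexOf("<")) would throw StringIndexOutOfBoundsException there,
    // because indexOf returns -1.
    static String fromFirstTag(String body)
    {
        int lt = body.indexOf('<');
        return lt >= 0 ? body.substring(lt) : body;
    }

    public static void main(String[] args)
    {
        System.out.println(fromFirstTag("\uFEFF \r\n<?xml version=\"1.0\"?><root/>")); // <?xml version="1.0"?><root/>
    }
}
```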

Dorella answered 7/7, 2012 at 2:45

This issue can also be caused by having the <?xml version="1.0" encoding="UTF-8"?> declaration on the same line as the XML data...

<?xml version="1.0" encoding="UTF-8"?><secciones><seccion><id>0</id><nombre>Portada<feedURL>http://iphone.elnorte.com/libre/online07/a ....

Lethargic answered 13/7, 2012 at 17:56

You should have checked the encoding of the file instead of discarding the XML declaration line.

I have found that my Eclipse (on Windows) had the same problem with a resource encoded as Unix-U8. After converting it to DOS-U8, the error went away.
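In code, "checking the encoding" can often be left to the parser: if you hand it the raw bytes instead of a String, it detects the encoding from a BOM or the encoding="..." declaration. This matters because java.io.InputStreamReader does not strip a UTF-8 BOM, so decoding the bytes yourself can leave U+FEFF at the start of the string even with the right charset. A minimal sketch under those assumptions, with hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseFromBytes
{
    // Prepends the UTF-8 byte order mark (EF BB BF), as some editors save it.
    static byte[] withUtf8Bom(byte[] body)
    {
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    // Parses raw bytes and returns the root element's text, or null on failure.
    // Handing the parser an InputStream (not a String or Reader) lets it work out
    // the encoding itself from the BOM or the encoding="..." declaration.
    static String rootText(byte[] xmlBytes)
    {
        try
        {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        }
        catch (Exception e)
        {
            return null;
        }
    }

    public static void main(String[] args)
    {
        byte[] bytes = withUtf8Bom("<?xml version=\"1.0\" encoding=\"utf-8\"?><root>caf\u00e9</root>"
                .getBytes(StandardCharsets.UTF_8));

        // Parsing the bytes directly: the BOM is recognised and skipped.
        System.out.println("caf\u00e9".equals(rootText(bytes))); // true

        // Decoding with the wrong charset first leaves junk in front of "<?xml".
        String misdecoded = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.startsWith("<?xml")); // false
    }
}
```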

Trygve answered 3/6, 2012 at 13:9