The markup must be well-formed

First off, let me say that I am new to SAX and Java.

I am trying to read information from an XML file that is not well formed.

When I try to use the SAX or DOM Parser I get the following error in response:

The markup in the document following the root element must be well-formed.

This is how I set up my XML file:

<format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
<format type="driver" t="123412">001;023</format>
   ...

Can I force the SAX or DOM parser to parse XML files even if they are not well-formed?

Thank you for your help. Much appreciated. Haythem

Wayfarer answered 23/3, 2010 at 11:19 Comment(1)
FYI: By definition, if it's not well-formed, it's not XML. en.wikipedia.org/wiki/XML#Well-formedness_and_error-handling – Ashjian

Your best bet is to make the XML well-formed, probably by pre-processing it a bit. In this case, you can achieve that simply by adding an XML declaration (which is optional) and providing a root element (which is not optional), like this:

<?xml version="1.0"?>
<wrapper>
    <format type="filename" t="13241">0;W650;004;AG-Erzgeb</format>
    <format type="driver" t="123412">001;023</format>
</wrapper>

There I've arbitrarily picked the name "wrapper" for the root element; it can be whatever you like.
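
For what it's worth, here is a minimal Java sketch of that pre-processing step, done in memory rather than on disk. The class name `WrappedDomExample` and the helper `parseFragment` are made up for illustration, and the sketch assumes the fragment does not already carry its own XML declaration (a second declaration inside the wrapper would itself be an error).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WrappedDomExample {

    // Hypothetical helper: wraps a rootless fragment in an arbitrary
    // <wrapper> root element and parses the result with the DOM parser.
    // Assumes the fragment has no XML declaration of its own.
    static Document parseFragment(String fragment) throws Exception {
        String wrapped = "<?xml version=\"1.0\"?><wrapper>" + fragment + "</wrapper>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(wrapped)));
    }

    public static void main(String[] args) throws Exception {
        // The two sample lines from the question, without a root element.
        String fragment =
              "<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
            + "<format type=\"driver\" t=\"123412\">001;023</format>\n";

        Document doc = parseFragment(fragment);
        NodeList formats = doc.getElementsByTagName("format");
        for (int i = 0; i < formats.getLength(); i++) {
            Element format = (Element) formats.item(i);
            System.out.println(format.getAttribute("type") + " = " + format.getTextContent());
        }
    }
}
```

Building the wrapped text as a String is only reasonable for small inputs; for large files a streaming wrapper is probably the better choice.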

Enterostomy answered 23/3, 2010 at 11:22 Comment(3)
I'd just like to add that you don't necessarily need to make that modification on disk; you could do it on the fly by providing a filtering InputStream/Reader. This can be very useful, especially for big files or when reading XML from a URL. A SequenceInputStream could be useful here: java.sun.com/javase/6/docs/api/java/io/SequenceInputStream.html – Delanos
Good possibility. Wouldn't it be easier to turn off the check in the parser? Can I override the parse() method so that it ignores the non-well-formed status? – Wayfarer
Haythem: probably not, because the parsing logic is deep within the library and the behavior of such a parser would be undefined (the XML libraries don't know how to handle XML with more than one root element). Doing it this way instantly makes your XML well-formed, and all XML-aware tools can suddenly handle it just fine (provided you have no other incorrect parts in there). – Delanos
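
The on-the-fly idea from the comments could be sketched roughly as follows, using SequenceInputStream to put an opening and a closing tag around the original stream before handing it to SAX. The names `WrappedStreamExample`, `wrap` and `readTypes` are invented for this sketch, and it assumes the input is UTF-8 without a byte order mark and without its own XML declaration. In real use the ByteArrayInputStream in main would be replaced by a FileInputStream or a URL stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class WrappedStreamExample {

    // Hypothetical helper: concatenates "<wrapper>", the original stream and
    // "</wrapper>" without loading the original content into memory.
    static InputStream wrap(InputStream fragment) {
        InputStream open = new ByteArrayInputStream("<wrapper>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</wrapper>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, fragment, close)));
    }

    // Collects the "type" attribute of every <format> element via SAX.
    static List<String> readTypes(InputStream fragment) throws Exception {
        List<String> types = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("format".equals(qName)) {
                    types.add(attrs.getValue("type"));
                }
            }
        };
        try (InputStream in = wrap(fragment)) {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real file; assumed to be UTF-8 without a BOM.
        String sample =
              "<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
            + "<format type=\"driver\" t=\"123412\">001;023</format>\n";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(readTypes(in));
    }
}
```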

Hint: using SAX or StAX, you can successfully parse a document that is not well-formed up to the FIRST well-formedness error.

(I know that this is not of too much help...)
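
To illustrate the hint, here is a small sketch that feeds the unwrapped input from the question to SAX and records what the handler sees before the parser gives up. The class name `PartialSaxExample` and the method `parseUntilError` are hypothetical, and the exact error text depends on the parser implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class PartialSaxExample {

    // Hypothetical helper: returns the "type" attributes of the <format>
    // elements that were reported before the first well-formedness error.
    static List<String> parseUntilError(String text) throws Exception {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("format".equals(qName)) {
                    seen.add(attrs.getValue("type"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(text)), handler);
        } catch (SAXParseException e) {
            // Parsing stops here; everything collected so far is kept.
            System.err.println("Stopped at line " + e.getLineNumber() + ": " + e.getMessage());
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // No root element: the second <format> is what triggers the error.
        String notWellFormed =
              "<format type=\"filename\" t=\"13241\">0;W650;004;AG-Erzgeb</format>\n"
            + "<format type=\"driver\" t=\"123412\">001;023</format>\n";
        System.out.println(parseUntilError(notWellFormed));
    }
}
```

With the sample from the question, only the first format element should reach the handler, because the parser treats it as the root and rejects whatever markup follows it.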

Crockett answered 23/3, 2010 at 11:39 Comment(0)

The DOM parser scans your XML file and then builds a tree; the root node of that tree corresponds to the root element, like the wrapper element in the first answer. If the parser can't find a single root element, it can't even build the tree. So it's better to pre-process the XML file before parsing it with DOM or SAX.

Windpollinated answered 23/3, 2010 at 11:41 Comment(0)
