Android org.xmlpull.v1.XmlPullParserException while parsing XML

I call a web service that returns some HTML wrapped in an XML envelope, like this:

<xml version="1.0" cache="false">
<head/>
<body>
<table>
<tr>
   <td>
        <a href="link-to-prev-post">
           <text color="red"><< Prev</text>
        </a>
   </td>
   <td>
        <a href="link-to-next-post">
           <text color="red">| Next >></text>
        </a>
   </td>
</tr>
</table>
</body>
</xml>

I need to retrieve the link-to-prev-post and link-to-next-post links so that I can fetch more data through them.

I am using XmlPullParser to parse the XML/HTML above. To get the links for the next/previous items, I do the following:

if (xmlNodeName.equalsIgnoreCase("a")) {
    link = parser.getAttributeValue(null, "href");

} else if (xmlNodeName.equalsIgnoreCase("text")) {
    color = parser.getAttributeValue(null, "color");

    if (color.equalsIgnoreCase("red") && parser.getEventType() == XmlPullParser.START_TAG) {
        // check for next/prev blog entries links
        // but this parser.nextText() throws XmlPullParserException
        // i think because the nextText() returns << Prev which the parser considers to be wrong
        String innerText = parser.nextText();
        if (innerText.contains("<< Prev")) {
            blog.setPrevBlogItemsUrl(link);
        } else if (innerText.contains("Next >>")) {
            blog.setNextBlogItemsUrl(link);
        }
    }

    link = null;
}

The call to parser.nextText() throws an XmlPullParserException. The value of the text element at that point is << Prev, so I think the parser mistakes the << in the text for the start of a tag.

LogCat detail is:

04-08 18:32:09.827: W/System.err(688): org.xmlpull.v1.XmlPullParserException: precondition: START_TAG (position:END_TAG </text>@9:2535 in java.io.InputStreamReader@44c6d0d8) 
04-08 18:32:09.827: W/System.err(688):  at org.kxml2.io.KXmlParser.exception(KXmlParser.java:245)
04-08 18:32:09.827: W/System.err(688):  at org.kxml2.io.KXmlParser.nextText(KXmlParser.java:1382)
04-08 18:32:09.827: W/System.err(688):  at utilities.XMLParserHelper.parseBlogEntries(XMLParserHelper.java:139)
04-08 18:32:09.827: W/System.err(688):  at serviceclients.PlayerSummaryAsyncTask.doInBackground(PlayerSummaryAsyncTask.java:68)
04-08 18:32:09.827: W/System.err(688):  at serviceclients.PlayerSummaryAsyncTask.doInBackground(PlayerSummaryAsyncTask.java:1)
04-08 18:32:09.836: W/System.err(688):  at android.os.AsyncTask$2.call(AsyncTask.java:185)
04-08 18:32:09.836: W/System.err(688):  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:305)
04-08 18:32:09.836: W/System.err(688):  at java.util.concurrent.FutureTask.run(FutureTask.java:137)
04-08 18:32:09.836: W/System.err(688):  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1068)
04-08 18:32:09.836: W/System.err(688):  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:561)
04-08 18:32:09.836: W/System.err(688):  at java.lang.Thread.run(Thread.java:1096)

I hope I have made my problem clear.

Solution

Inspired by Martin's approach of first converting the received data to a string, I solved my problem with a mixed approach.

  1. Convert the received InputStream to a string and replace the offending characters with * (or whatever you wish), as follows:

    InputStreamReader isr = new InputStreamReader(serviceReturnedStream);
    
    BufferedReader br = new BufferedReader(isr);
    StringBuilder xmlAsString = new StringBuilder(512);
    String line;
    try {
        while ((line = br.readLine()) != null) {
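            // readLine() strips the line terminator, so add it back to keep tokens on adjacent lines apart
            xmlAsString.append(line.replace("<<", "*").replace(">>", "*")).append('\n');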
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    
  2. Now I have a string that contains well-formed XML (for my case), so I can use the normal XmlPullParser to parse it instead of parsing it manually:

    XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
    
    factory.setNamespaceAware(false);
    
    XmlPullParser parser = factory.newPullParser();
    parser.setInput(new StringReader(xmlAsString.toString()));
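
As a variation on the two steps above, the sketch below escapes the stray << as &lt;&lt; instead of replacing it with *, so the label text survives parsing and can still be matched against "<< Prev". Note the assumptions: it uses the JDK's StAX pull parser (javax.xml.stream) as a stand-in for XmlPullParser so that it runs on a desktop JVM, and the class and method names (SanitizeThenParse, sanitize, findLinks) are made up for illustration. The >> is left alone because a literal > is allowed in character data.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class SanitizeThenParse {

    /** Escapes the literal "<<" so the document becomes well-formed. */
    static String sanitize(String xml) {
        return xml.replace("<<", "&lt;&lt;");
    }

    /** Returns {prevUrl, nextUrl}; an entry is null if that link is absent. */
    static String[] findLinks(String rawXml) {
        String link = null;
        String prev = null;
        String next = null;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(sanitize(rawXml)));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if (name.equalsIgnoreCase("a")) {
                    // Remember the href until we have seen the label inside it
                    link = reader.getAttributeValue(null, "href");
                } else if (name.equalsIgnoreCase("text")) {
                    // Entity references are resolved, so the label reads "<< Prev" again
                    String label = reader.getElementText();
                    if (label.contains("<< Prev")) {
                        prev = link;
                    } else if (label.contains("Next >>")) {
                        next = link;
                    }
                    link = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed even after sanitizing", e);
        }
        return new String[] {prev, next};
    }

    public static void main(String[] args) {
        String xml = "<xml version=\"1.0\" cache=\"false\">\n<head/>\n<body>\n<table>\n<tr>\n"
                + "<td><a href=\"link-to-prev-post\"><text color=\"red\"><< Prev</text></a></td>\n"
                + "<td><a href=\"link-to-next-post\"><text color=\"red\">| Next >></text></a></td>\n"
                + "</tr>\n</table>\n</body>\n</xml>";
        String[] links = findLinks(xml);
        System.out.println("prev = " + links[0] + ", next = " + links[1]);
    }
}
```

On Android the same sanitize step should work in front of XmlPullParser, with nextText() taking the place of getElementText().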
    

Hope this helps someone!

Incus answered 8/4, 2012 at 14:6 Comment(0)

Yes, the exception is probably thrown because that is invalid XML as per section 2.4 Character Data and Markup in the XML 1.0 specification:

[...] the left angle bracket (<) MUST NOT appear in [its] literal form, [...]

If you put that XML in Eclipse, Eclipse will complain that it is not well-formed. If you are able to change the web service, you should fix the generated XML, either by using entity references such as &lt; or by wrapping the text in a CDATA section.
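
To illustrate the difference, here is a minimal sketch that feeds three variants of the text element to a parser: the raw form, one using entity references, and one using CDATA. It assumes the JDK's StAX parser (javax.xml.stream) as a stand-in for Android's XmlPullParser so that it runs on a desktop JVM; the class and method names (WellFormedCheck, parses) are made up for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class WellFormedCheck {

    /** Returns true if the parser can read the whole document without error. */
    static boolean parses(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            // Thrown when the input is not well-formed
            return false;
        }
    }

    public static void main(String[] args) {
        String raw     = "<text color=\"red\"><< Prev</text>";
        String escaped = "<text color=\"red\">&lt;&lt; Prev</text>";
        String cdata   = "<text color=\"red\"><![CDATA[<< Prev]]></text>";

        System.out.println("raw:     " + parses(raw));     // false
        System.out.println("escaped: " + parses(escaped)); // true
        System.out.println("cdata:   " + parses(cdata));   // true
    }
}
```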

If you have no control over the web service, I think the easiest option is to parse the response manually with some custom code, perhaps using regular expressions, depending on how general the solution needs to be.

Example Code

Here's how you could parse the XML file above. Note that you probably want to improve this code to make it more general, but you should have something to start with at least:

    // Read the XML into a StringBuilder so we can get a Matcher for the
    // whole XML
    InputStream xmlResponseInputStream = ...; // Get an InputStream for the XML somehow
    InputStreamReader isr = new InputStreamReader(xmlResponseInputStream);
    BufferedReader br = new BufferedReader(isr);
    StringBuilder xmlAsString = new StringBuilder(512);
    String line;
    try {
        while ((line = br.readLine()) != null) {
            xmlAsString.append(line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

    // Look for links using a regex. Assume the first link is "Prev" and the
    // next link is "Next"
    Pattern hrefRegex = Pattern.compile("<a href=\"([^\"]*)\">");
    Matcher m = hrefRegex.matcher(xmlAsString);
    String linkToPrevPost = null;
    String linkToNextPost = null;
    while (m.find()) {
        String hrefValue = m.group(1);
        if (linkToPrevPost == null) {
            linkToPrevPost = hrefValue;
        } else {
            linkToNextPost = hrefValue;
        }
    }

    Log.i("Example", "'Prev' link = " + linkToPrevPost + 
            " 'Next' link = " + linkToNextPost);

With your XML file, the output to logcat will be

I/Example (12399): 'Prev' link = link-to-prev-post 'Next' link = link-to-next-post
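
One way to make this more general is to stop relying on link order and capture each link together with its label. The following is only a sketch under assumptions: it expects every <a href="..."> to be followed directly by a <text> element as in the sample, and the names LabelledLinkFinder and findLinks are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelledLinkFinder {

    // Group 1: the href value. Group 2: the label inside the following <text> element.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+href=\"([^\"]*)\"\\s*>\\s*<text[^>]*>(.*?)</text>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns {prevUrl, nextUrl}; an entry is null if that link is absent. */
    static String[] findLinks(String xml) {
        String prev = null;
        String next = null;
        Matcher m = LINK.matcher(xml);
        while (m.find()) {
            String label = m.group(2);
            if (label.contains("Prev")) {
                prev = m.group(1);
            } else if (label.contains("Next")) {
                next = m.group(1);
            }
        }
        return new String[] {prev, next};
    }

    public static void main(String[] args) {
        // Only a "Next" link is present, so the first entry stays null
        String xml = "<a href=\"link-to-next-post\">\n"
                + "   <text color=\"red\">| Next >></text>\n"
                + "</a>";
        String[] links = findLinks(xml);
        System.out.println("'Prev' link = " + links[0] + " 'Next' link = " + links[1]);
    }
}
```

With this version, a response that contains only the "Next" link no longer has that link assigned to linkToPrevPost.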
Seto answered 11/4, 2012 at 11:40 Comment(6)
Thanks for the explanation. I have no control over the web service, so I cannot change what it returns. Using regular expressions sounds good, but the issue arises when I try to read the data using parser.nextText(), so I think regex cannot be used either, because I would first have to get the text before parsing it with a regex. If you think it can be done, could you please provide a sample? That would be great. - Incus
I'm glad to help! I was actually referring to parsing the entire XML manually, i.e. not using the XML parser at all (since it's not valid XML you are parsing). - Seto
OK, I understand now. But how would you propose doing such manual parsing? I am looking for an example, as I am badly stuck. - Incus
Thank you very much. Your answer now covers all the possibilities for solving my issue. I will try parsing it manually as you suggested. I am accepting it as the answer, as there cannot be an easier way to do it. Thanks a lot. - Incus
Aamir, if you really want to, you can use a StringReader to feed the fixed XML back to the parser. - Gifford
+1 to you, Sergio. I am sorry for being late. I have done it the same way, and I updated my question to include the solution so that someone else can benefit. - Incus
