Android SAX parser not getting full text from between tags

I've created my own DefaultHandler to parse RSS feeds, and for most feeds it works fine. For ESPN, however, it cuts off part of the article URL because of the way ESPN formats its URLs. An example of a full article URL from ESPN:

http://sports.espn.go.com/nba/news/story?id=5189101&campaign=rss&source=ESPNHeadlines

The problem is that, for some reason, the DefaultHandler characters method only receives this from the tag that contains the above URL:

http://sports.espn.go.com/nba/news/story?id=5189101

As you can see, it's cutting everything off the url from the ampersand escape code and after. How can I get the SAX parser to not cut my string off at this escape code? For reference, here is my characters method:

 public void characters(char ch[], int start, int length) {

  String chars = (new String(ch).substring(start, start + length));

  try {
   // If not in item, then title/link refers to feed
   if (!inItem) {
    if (inTitle)
     currentFeed.title = chars;
   } else {
    if (inLink)
     currentArticle.url = new URL(chars);
    if (inTitle)
     currentArticle.title = chars;
    if (inDescription)
     currentArticle.description = chars;
    if (inPubDate)
     currentArticle.pubDate = chars;
    if (inEnclosure) {
    }
   }
  } catch (MalformedURLException e) {
   Log.e("RSSReader", e.toString());
  }
 }

Rob W.

Bremen answered 14/5, 2010 at 22:44 Comment(0)
M
46

As you can see, it's cutting everything off the url from the ampersand escape code and after.

From the documentation of the characters() method:

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
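One way to see this chunking for yourself is to record every characters() call separately. The sketch below assumes a plain JVM with the default JAXP parser; `ChunkRecorder` and `record` are hypothetical names, not part of the original code. How many chunks you get depends on the parser implementation, so the only thing that can be relied on is that the chunks, joined in order, give the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: stores each characters() call as its own list entry.
class ChunkRecorder extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only the reported slice; the rest of ch belongs to the parser.
        chunks.add(new String(ch, start, length));
    }

    // Parses the given XML string and returns the chunks in call order.
    static List<String> record(String xml) throws Exception {
        ChunkRecorder recorder = new ChunkRecorder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<link>http://sports.espn.go.com/nba/news/story?id=5189101"
                + "&amp;campaign=rss&amp;source=ESPNHeadlines</link>";
        // The number of chunks printed here is implementation-specific.
        for (String chunk : record(xml)) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

If the parser splits at the entity, the first chunk printed is exactly the truncated URL from the question.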

When I write SAX parsers, I use a StringBuilder to append everything passed to characters():

public void characters(char[] ch, int start, int length) {
    if (buf != null) {
        // append only the slice reported for this call
        buf.append(ch, start, length);
    }
}

Then in endElement(), I take the contents of the StringBuilder and do something with it. That way, if the parser calls characters() several times, I don't miss anything.
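Put together, the approach might look like the following minimal sketch. It assumes a non-namespace-aware JAXP parser on a plain JVM, where the element name arrives in `qName`; on Android you may need to check `localName` instead. `LinkHandler`, `links` and `parse` are hypothetical names chosen for illustration, not the asker's actual classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <link> element.
class LinkHandler extends DefaultHandler {
    final List<String> links = new ArrayList<>();
    private StringBuilder buf; // non-null only while inside a <link>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("link".equals(qName)) {
            buf = new StringBuilder(); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buf != null) {
            buf.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("link".equals(qName) && buf != null) {
            links.add(buf.toString().trim()); // the text is complete only here
            buf = null;
        }
    }

    // Parses the given XML string and returns the text of all <link> elements.
    static List<String> parse(String xml) throws Exception {
        LinkHandler handler = new LinkHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.links;
    }
}
```

The key design choice is that nothing is interpreted in characters(); the URL (or title, or description) is only built in endElement(), once the parser has delivered every chunk.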

Mielke answered 14/5, 2010 at 23:56 Comment(10)
Ok, I didn't really take the time to fully understand how the parser was working. After reading your answer I went back and researched further to get a better understanding. What you describe was the problem, of course; I've since updated my code to handle the char data properly. TY - Bremen
@CommonsWare: does it miss some characters? I am facing that in my case. - Protecting
I have <image>image1:title</image> in my XML, and sometimes I get the full value and sometimes only "itle" or "Title". I have tried printing the values, but it never printed "image1:" for the partial values. - Protecting
@Ankit: Please open a fresh StackOverflow question, show your input, your parsing code, and your results. - Mielke
Your solution resolved my problem; even so, I will post it as a question for future readers. - Protecting
Thank you, your answers are always short, descriptive, provide actual reasoning behind the answer and of course on the spot! - Chenoweth
@Mielke I am using a SAX parser on a tag that contains the following text: <book id="1">Hi this book is selected for <ref id="23">IIFA</ref> award.</book> When I parse it and get the text from the book tag, I get 'Hi this book is selected for IIFA award.' But I want 'Hi this book is selected for <ref id="23">IIFA</ref> award.' Why is the <ref> missing from the text, and how can I get it while parsing? Please let me know. - Escarole
@KK_07k11A0585: That is a separate XML element. You are already getting it while parsing, in your startElement() and endElement() methods. - Mielke
@Mielke Thanks, I have parsed that by adding that tag name in startElement() and endElement(). But is there any other way to get the complete text inside the tag as plain text? In the above example, how can I get the text 'Hi this book is selected for <ref id="23">IIFA</ref>' as is from the book tag? - Escarole
@KK_07k11A0585: You would have to reassemble that yourself, using string concatenation. This has nothing to do with Android specifically. If you have further questions in this area, ask a fresh Stack Overflow question, tagged java, where you explain your input and what you are trying to achieve. - Mielke
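For readers with the same mixed-content question, the reassembly by string concatenation could be sketched roughly as below. This is only an illustration under some assumptions: a non-namespace-aware JAXP parser (names arrive in `qName`), and text that contains no characters needing re-escaping, since the sketch writes character data back out unescaped. `InnerXmlHandler` and `innerXml` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rebuilds the markup found inside one target element.
class InnerXmlHandler extends DefaultHandler {
    private final String target;
    private StringBuilder buf; // non-null only while inside the target element
    private int depth;         // nesting depth of child elements inside the target
    String result;

    InnerXmlHandler(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (buf == null) {
            if (target.equals(qName)) {
                buf = new StringBuilder();
                depth = 0;
            }
            return;
        }
        // A child element: write its start tag back out, attributes included.
        depth++;
        buf.append('<').append(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            buf.append(' ').append(attrs.getQName(i))
               .append("=\"").append(attrs.getValue(i)).append('"');
        }
        buf.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buf != null) {
            buf.append(ch, start, length); // NOTE: not re-escaped in this sketch
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buf == null) {
            return;
        }
        if (depth == 0) {          // closing tag of the target itself
            result = buf.toString();
            buf = null;
            return;
        }
        depth--;
        buf.append("</").append(qName).append('>');
    }

    // Returns the reassembled content of the first-level target element.
    static String innerXml(String xml, String target) throws Exception {
        InnerXmlHandler handler = new InnerXmlHandler(target);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.result;
    }
}
```

Calling `InnerXmlHandler.innerXml(xml, "book")` on the `<book>` example from the comments would give back the text with the `<ref id="23">` element written back in.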
I
6
@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    sb = new StringBuilder();
    if (localName.equals("icon")) {
        iconflag = true;
    }
}

@Override
public void characters(char[] ch, int start, int length) {
    if (sb != null && iconflag) {
        sb.append(ch, start, length);
    }
}

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    if (iconflag) {
        info.setIcon(sb.toString().trim());
        iconflag = false;
    }
}

So I figured it out, the code above is the solution.

Ingamar answered 29/5, 2012 at 12:34 Comment(0)
B
0

I ran into this problem the other day. It turns out the reason is that the characters method may be called multiple times when the value contains any of these characters:

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;

Also be careful about line breaks within the value! If the XML is line-wrapped outside your control, the characters method will also be called for each line of the value, and it will return the line break too, which you then need to strip out manually.

A sample Handler taking care of all these problems is this one:

 DefaultHandler handler = new DefaultHandler() {
   private boolean isInMyTag = false;
   private String localname;
   private StringBuilder elementContent;

   @Override
   public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
     if (qName.equalsIgnoreCase("myfield")) {
       isInMyTag = true;
       this.localname = localName;
       this.elementContent = new StringBuilder();
     }
   }

   @Override
   public void characters(char[] buffer, int start, int length) {
     if (isInMyTag) {
       String content = new String(buffer, start, length);
       // startsWith also copes with an empty chunk and needs no StringUtils
       if (content.startsWith("\n")) {
         // remove leading newline
         elementContent.append(content.substring(1));
       } else {
         elementContent.append(content);
       }
     }
   }

   @Override
   public void endElement(String uri, String localName, String qName) throws SAXException {
     if (qName.equalsIgnoreCase("myfield")) {
       isInMyTag = false;
       // do something with elementContent.toString()
       System.out.println(elementContent.toString());
       this.localname = "";
     }
   }
 };

I hope this helps.

Boynton answered 24/10, 2019 at 13:0 Comment(0)
