Unable to parse a value containing a special character (&) using a SAX parser

I am new to parsing. I'm trying to write a parser, but I can't get the value of a particular tag because that value contains an ampersand (&). Please help me find a solution.

My XML file looks like this:

<system>
<u_id>10145</u_id>
<serial_no>1800015</serial_no>
<branch_name>B & P Infotech Ltd.</branch_name>
</system>

I have tried the following Java code, but it does not give me the correct output.

Main class

package com.satya.xmltest;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class SaxTest {

    public static void main(String[] args) {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        SaxtestHandler handler=new SaxtestHandler();
        try {
            SAXParser parser = parserFactory.newSAXParser();
            parser.parse("C:\\Users\\abc\\Desktop\\test.xml", handler);
        } catch (Exception e) {
        }
        SystemTo systemTo=handler.systemTo;
        System.out.println("Uid :"+systemTo.getUid());
        System.out.println("serial number :"+systemTo.getSerialNumber());
        System.out.println("name :"+systemTo.getName());
    }
}

Handler class

This class does the parsing and stores the values in the data container class.

package com.satya.xmltest;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxtestHandler extends DefaultHandler {
    String content = "";
    SystemTo systemTo=new SystemTo();

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {

        switch (qName) {
            case "system":
                System.out.println("inside company");
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
        switch (qName) {
            case "u_id":
                systemTo.setUid(content);
                break;
            case "serial_no":
                systemTo.setSerialNumber(content);
                break;
            case "branch_name":
                systemTo.setName(content);
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
        content = String.copyValueOf(ch, start, length).trim();
    }
}

Data container class

package com.satya.xmltest;

public class SystemTo {

    private String uid;
    private String serialNumber;
    private String name;
    public String getUid() {
        return uid;
    }
    public void setUid(String uid) {
        this.uid = uid;
    }
    public String getSerialNumber() {
        return serialNumber;
    }
    public void setSerialNumber(String serialNumber) {
        this.serialNumber = serialNumber;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
}

My output is:

Uid: 10145
serial number: 1800015
name: null

But I need:

Uid: 10145
serial number: 1800015
name: B & P Infotech Ltd.

Thanks in advance.

Mudra asked 29/1, 2014 at 11:47
Plain: it's not valid XML! Either escape the ampersand using the entity syntax &amp; or use a CDATA section. – Fleam
@GyroGearless But I am getting this type of value in my response. – Mudra

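Some characters must not appear in their literal form in an XML document, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. Strictly speaking, & and < must always be escaped in character data; > must be escaped when it appears in the sequence ]]>, and the quote characters only inside an attribute value delimited by the same quote.
The characters and their corresponding entity or numeric references are: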

Original Character    XML entity replacement      XML numeric replacement

      "                     &quot;                       &#34;   
      <                     &lt;                         &#60;   
      >                     &gt;                         &#62;
      &                     &amp;                        &#38;
      '                     &apos;                       &#39;   

You must replace these characters in the XML before you parse it.

Alternatively, you may wrap the text in a CDATA section: text inside a CDATA section is treated as character data, not as markup.
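The replacement described above can be sketched in Java as follows. This is a minimal sketch, assuming you have the XML as a String before it reaches the parser; the class and method names (XmlEscaping, escapeText, escapeBareAmpersands) are hypothetical. escapeText applies the table above to text you are writing into an XML document. escapeBareAmpersands is a heuristic repair for an already-broken document like the one in the question: it escapes any & that does not already start a reference such as &amp; or &#38;. It does not understand CDATA sections or comments, so fixing whatever produces the XML is preferable where possible.

```java
import java.util.regex.Pattern;

public class XmlEscaping {

    /** Escapes the five characters from the table for use in XML text or attribute values. */
    public static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;  // must come first conceptually: & starts every reference
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Heuristic: an '&' that is NOT followed by a name or number and ';'
    // (i.e. not already &amp;, &lt;, &#38;, &#x26;, ...).
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    /** Repairs a document by escaping bare '&' characters; ignores CDATA/comment context. */
    public static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // Prints: B &amp; P Infotech Ltd.
        System.out.println(escapeText("B & P Infotech Ltd."));
        // Prints: <branch_name>B &amp; P Infotech Ltd.</branch_name>
        System.out.println(escapeBareAmpersands("<branch_name>B & P Infotech Ltd.</branch_name>"));
    }
}
```

The repaired string can then be handed to the SAX parser, for example via new InputSource(new StringReader(repaired)).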

Betteanne answered 29/1, 2014 at 12:03

You can escape these characters, as HTML does:

<branch_name>B &amp; P Infotech Ltd.</branch_name>

Or you can use CDATA:

<branch_name><![CDATA[B & P Infotech Ltd.]]></branch_name>
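Both forms parse to the same text. One more thing to watch for in the question's handler: the SAX contract allows characters() to be called several times for a single element's text, and parsers often deliver the text around an entity reference such as &amp; in separate calls. Because SaxtestHandler overwrites content on every call, it may keep only the last chunk even after the XML is fixed. Below is a minimal sketch of a handler that accumulates text in a StringBuilder instead; AccumulatingHandlerDemo, LeafTextHandler and parse are hypothetical names, and the handler simply stores each element's trimmed text in a map keyed by qName.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class AccumulatingHandlerDemo {

    /** Collects the text of each element, keyed by element name. */
    static class LeafTextHandler extends DefaultHandler {
        final Map<String, String> values = new HashMap<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // start collecting fresh text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element (e.g. around &amp;), so append, never overwrite.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            values.put(qName, text.toString().trim());
            text.setLength(0);
        }
    }

    /** Parses a well-formed XML string and returns element name -> text. */
    public static Map<String, String> parse(String xml) {
        LeafTextHandler handler = new LeafTextHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String escaped = "<system><u_id>10145</u_id>"
                + "<branch_name>B &amp; P Infotech Ltd.</branch_name></system>";
        String cdata = "<system><u_id>10145</u_id>"
                + "<branch_name><![CDATA[B & P Infotech Ltd.]]></branch_name></system>";
        System.out.println(parse(escaped).get("branch_name")); // B & P Infotech Ltd.
        System.out.println(parse(cdata).get("branch_name"));   // B & P Infotech Ltd.
    }
}
```

Resetting the buffer in startElement and reading it in endElement is enough for flat leaf elements like the ones in the question; mixed content (text and child elements interleaved) would need a stack of buffers.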
Succumb answered 29/1, 2014 at 11:53

You must replace your special characters with the escape sequences that are accepted in an XML file. In your case, & should be replaced by &amp;.

@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    content = String.copyValueOf(ch, start, length).trim();
    // Re-escape any '&' in the text the parser has already delivered
    content = content.replace("&", "&amp;");
}
Woodcock answered 29/1, 2014 at 11:51
It's not working. It's not providing the desired output. Can you please check it again? – Mudra
Are you getting some other output, or is it the same? – Woodcock
It's giving me name = null. – Mudra
I think you should debug your code and check whether the name is null in the endElement method or the content is null in the characters method. – Woodcock
This doesn't work, as the special characters are not passed to the characters method in the first place (the string is cut at the special character). – Thighbone
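As this last comment points out, with the original document the parser never gets past the bare &: it reports a fatal error (a SAXParseException) and stops, so endElement is never called for branch_name and name stays null, while u_id and serial_no were already set. The empty catch block in the question's main class hides that error. Below is a minimal sketch that surfaces the parser's line, column and message instead; ReportParseError and firstError are hypothetical names, and the exact message text depends on the parser implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ReportParseError {

    /** Returns null if the XML is well-formed, otherwise "line:column message". */
    public static String firstError(String xml) {
        try {
            // DefaultHandler.fatalError rethrows, so parsing stops at the first fatal error.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Reports where the bare '&' breaks the document
        System.out.println(firstError("<system><branch_name>B & P Infotech Ltd.</branch_name></system>"));
        // Prints "null": the escaped version is well-formed
        System.out.println(firstError("<system><branch_name>B &amp; P Infotech Ltd.</branch_name></system>"));
    }
}
```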

The problem is that "&" is itself an escape character.

To fix this, replace the ampersand with its numeric character reference, e.g. "&#038;" (decimal code point 38, equivalent to "&#38;" and "&amp;").

Czarist answered 29/1, 2014 at 11:52
