How to get element's value from XML using SAX parser in startElement?

Is it possible to get the text content of an element from an XML file inside the startElement method, the overridden callback of the SAX handler?

Below is the specification.

1) XML file

<employees>
   <employee id="111">
      <firstName>Rakesh</firstName>
      <lastName>Mishra</lastName>
      <location>Bangalore</location>
   </employee>
   <employee id="112">
      <firstName>John</firstName>
      <lastName>Davis</lastName>
      <location>Chennai</location>
   </employee>
   <employee id="113">
      <firstName>Rajesh</firstName>
      <lastName>Sharma</lastName>
      <location>Pune</location>
   </employee>
</employees>

2) startElement function

@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    .......code in here..........
}

3) Expected result

element name   : employee
attribute name : id
attribute value: 111
firstName      : Rakesh
lastName       : Mishra
location       : Bangalore

element name   : employee
attribute name : id
attribute value: 112
firstName      : John
lastName       : Davis
location       : Chennai

element name   : employee
attribute name : id
attribute value: 113
firstName      : Rajesh
lastName       : Sharma
location       : Pune
Kerb answered 9/6, 2014 at 4:6 Comment(7)
mkyong.com/java/how-to-read-xml-file-in-java-sax-parser - Cyder
@PawanAryan, thank you. I have already checked that one. If I want to write code only in the startElement function, is that possible? - Kerb
You only get attributes in startElement. Any text values arrive in characters. Use startElement to detect when an element starts, and set flags there that you can check in the characters method. Knowing which element is current inside characters, you can get its value. Remember to reset those flags in endElement. - Platonism
Using startElement() and the other callback methods is the only way to access data in XML with SAX. I don't think it's possible to do everything in startElement. A SAX parser differs from DOM because it doesn't load the complete XML into memory; it reads the document sequentially. startElement(): called every time the parser encounters an opening tag such as <employee>. endElement(): called every time the parser encounters a closing tag such as </employee>. characters(): called every time the parser encounters character data between tags, where you can process it according to the state set in startElement(). - Cyder
@PawanAryan, thank you for the easy-to-understand explanation. How about this option? I want a set of tagName, attName, attValue, and the tag's value. I ask because I need to use it in another thread. - Kerb
Are you talking about passing parameters to startElement() and the other methods at runtime from some thread? Please elaborate. - Cyder
Actually, I need the whole set to write it to a specific file; that's why I asked. In my code, I'm using startElement() with two threads: the first reads from the XML, and the second writes to another file. The result I get is only this set (tagName, attName, attValue). Any ideas? - Kerb

You can get the element's name in startElement and endElement. You can also get its attributes in startElement. The text content, however, is only delivered to characters.

Here is a very basic example of how to get the value of an element using a ContentHandler:

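import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class YourHandler extends DefaultHandler {

    // true while the parser is between <firstName> and </firstName>
    private boolean inFirstNameElement = false;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        if (qName.equals("firstName")) {
            inFirstNameElement = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (qName.equals("firstName")) {
            inFirstNameElement = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (inFirstNameElement) {
            // do something with the characters in the <firstName> element,
            // e.g. new String(ch, start, length)
        }
    }
}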

If you have a simple example, setting a boolean flag for each tag is OK. In a more complex scenario, you might prefer to store the flags in a map using element names as keys, or even create one or more Employee classes mapped to your XML: instantiate one every time <employee> is found in startElement, populate its properties as the child elements are read, and add it to a Collection in endElement.
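The map-based variant mentioned above could look roughly like the following. This is only a sketch under some assumptions: the class name EmployeeMapHandler, the "@" prefix for attribute keys, and the static parse helper are made up for illustration, and it assumes the children of <employee> are flat (no nested elements).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects each <employee> as a map of child element name -> text.
class EmployeeMapHandler extends DefaultHandler {
    final List<Map<String, String>> employees = new ArrayList<>();
    private Map<String, String> current; // record being built, null outside <employee>
    private String field;                // child element currently open, or null

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("employee")) {
            current = new LinkedHashMap<>();
            // Attributes are stored under "@name" (an illustrative convention).
            for (int i = 0; i < atts.getLength(); i++) {
                current.put("@" + atts.getQName(i), atts.getValue(i));
            }
        } else if (current != null) {
            field = qName;
            current.put(field, "");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (field != null) {
            // Append, because the parser may deliver the text in several chunks.
            current.merge(field, new String(ch, start, length), String::concat);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("employee")) {
            employees.add(current);
            current = null;
        } else if (qName.equals(field)) {
            field = null;
        }
    }

    // Convenience helper (an assumption of this sketch): parse an XML string.
    static List<Map<String, String>> parse(String xml) {
        try {
            EmployeeMapHandler handler = new EmployeeMapHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.employees;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employees><employee id=\"111\"><firstName>Rakesh</firstName>"
                + "<location>Bangalore</location></employee></employees>";
        System.out.println(parse(xml));
    }
}
```

With this shape, only the "employee" record name is hard-coded; new child elements show up as new map keys without further code changes.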

Here is a complete ContentHandler example that works with your example file. I hope it helps you get started:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SimpleHandler extends DefaultHandler {

    class Employee {
        // Start empty so that text chunks can be appended in characters()
        public String firstName = "";
        public String lastName = "";
        public String location = "";
        public Map<String, String> attributes = new HashMap<>();
    }
    boolean isFirstName, isLastName, isLocation;
    Employee currentEmployee;
    List<Employee> employees = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        if (qName.equals("employee")) {
            currentEmployee = new Employee();
            // Copy every attribute (here only "id") into the employee
            for (int i = 0; i < atts.getLength(); i++) {
                currentEmployee.attributes.put(atts.getQName(i), atts.getValue(i));
            }
        }
        if (qName.equals("firstName")) { isFirstName = true; }
        if (qName.equals("lastName"))  { isLastName = true;  }
        if (qName.equals("location"))  { isLocation = true;  }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (qName.equals("employee")) {
            employees.add(currentEmployee);
            currentEmployee = null;
        }
        if (qName.equals("firstName")) { isFirstName = false; }
        if (qName.equals("lastName"))  { isLastName = false;  }
        if (qName.equals("location"))  { isLocation = false;  }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // The parser may split one text node into several calls, so append
        if (isFirstName) {
            currentEmployee.firstName += new String(ch, start, length);
        }
        if (isLastName) {
            currentEmployee.lastName += new String(ch, start, length);
        }
        if (isLocation) {
            currentEmployee.location += new String(ch, start, length);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        for(Employee e: employees) {
            System.out.println("Employee ID: " + e.attributes.get("id"));
            System.out.println("  First Name: " + e.firstName);
            System.out.println("  Last Name: " + e.lastName);
            System.out.println("  Location: " + e.location);
        }
    }
}
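
To try a handler like the one above, it has to be passed to a SAX parser. The following is a minimal sketch of that wiring using the JDK's SAXParserFactory. The class name RunSaxDemo and the employeeIds helper are invented for this example, and a small stand-in handler is used so that the snippet compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class showing how a handler is handed to the parser.
class RunSaxDemo {

    // Parses the XML with a stand-in handler and returns the id of every <employee>.
    static List<String> employeeIds(String xml) {
        List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes atts) {
                if (qName.equals("employee")) {
                    ids.add(atts.getValue("id"));
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // The parser invokes the handler's callbacks while it reads the input.
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<employees><employee id=\"111\"/><employee id=\"112\"/></employees>";
        System.out.println(employeeIds(xml));
    }
}
```

To read from a file instead, SAXParser also has a parse overload that takes a java.io.File, so something like parser.parse(new File("employees.xml"), new SimpleHandler()) should work, where the file name is only a placeholder.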
Platonism answered 9/6, 2014 at 4:30 Comment(6)
Excuse me! How about this option? Suppose we don't know the tagName in advance, but we want to get tagName, attName, attValue, and tagValue all at once. Is that possible? - Kerb
As shown above, in the startElement method you can read the tag name (qName) and all attributes from the atts variable (atts.getQName(i) and atts.getValue(i)), but to read the tag's text value you need to use the characters method with flags, as shown above. If you run the example above you should get the result you are expecting. - Platonism
How about a different XML file? Do we need to write other code? I notice that your code uses specific tag names in its conditions. If we don't know the specific names, what should we do? - Kerb
You can simply print the tag name if you wish, and instead of setting flags based on the tag name, do that based on their relative positions (create a Map and store the names and contexts as you read them). - Platonism
SAX is intended for sequential reading of an XML file, which is necessary when you need to extract bits of information from large files. If you want to obtain all the data at once by simply using methods to extract the data you wish, you might prefer to use an object model API, such as DOM or XPath. - Platonism
Please note that "SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks". Thus you need to accumulate the data. - Salmonoid
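
The last few comments (unknown tag names, and text arriving in chunks) can be combined in one generic handler. The sketch below keeps a stack of StringBuilder buffers, one per open element, appends every characters call to the innermost buffer, and reports the accumulated text in endElement. The class name GenericTextHandler, the output line format, and the parse helper are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that works without knowing any tag names in advance.
class GenericTextHandler extends DefaultHandler {
    final List<String> lines = new ArrayList<>();
    // One text buffer per currently open element, innermost on top.
    private final Deque<StringBuilder> buffers = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Tag name and attributes are available here.
        for (int i = 0; i < atts.getLength(); i++) {
            lines.add(qName + " @" + atts.getQName(i) + "=" + atts.getValue(i));
        }
        buffers.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so accumulate.
        buffers.peek().append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text known to be complete.
        String text = buffers.pop().toString().trim();
        if (!text.isEmpty()) {
            lines.add(qName + ": " + text);
        }
    }

    // Convenience helper (an assumption of this sketch): parse an XML string.
    static List<String> parse(String xml) {
        try {
            GenericTextHandler handler = new GenericTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employees><employee id=\"111\"><firstName>Rakesh</firstName>"
                + "<location>Bangalore</location></employee></employees>";
        parse(xml).forEach(System.out::println);
    }
}
```

Note that the text is still reported from endElement, not startElement: at the time startElement is called the parser has not read the element's content yet, which is why the value cannot be obtained there.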
