How to handle namespaces with SAX Parser?

I'm trying to learn how to parse XML documents. My XML document uses namespaces, so I assume I need to do something extra to parse it correctly.

This is what I have:

DefaultHandler handler = new DefaultHandler() {

                boolean bfname = false;
                boolean blname = false;
                boolean bnname = false;
                boolean bsalary = false;

                public void startElement(String uri, String localName, String qName,
                        Attributes attributes) throws SAXException {

                    System.out.println("Start Element :" + qName);

                    if (qName.equalsIgnoreCase("FIRSTNAME")) {
                        bfname = true;
                    }

                    if (qName.equalsIgnoreCase("LASTNAME")) {
                        blname = true;
                    }

                    if (qName.equalsIgnoreCase("NICKNAME")) {
                        bnname = true;
                    }

                    if (qName.equalsIgnoreCase("SALARY")) {
                        bsalary = true;
                    }

                }

                public void endElement(String uri, String localName,
                        String qName) throws SAXException {

                    System.out.println("End Element :" + qName);

                }

                public void characters(char ch[], int start, int length) throws SAXException {

                    if (bfname) {
                        System.out.println("First Name : " + new String(ch, start, length));
                        bfname = false;
                    }

                    if (blname) {
                        System.out.println("Last Name : " + new String(ch, start, length));
                        blname = false;
                    }

                    if (bnname) {
                        System.out.println("Nick Name : " + new String(ch, start, length));
                        bnname = false;
                    }

                    if (bsalary) {
                        System.out.println("Salary : " + new String(ch, start, length));
                        bsalary = false;
                    }

                }

            };

            saxParser.parse(file, handler);

My question is: how can I handle the namespace in this example?

Ulphiah answered 16/2, 2014 at 5:0 Comment(0)

In a namespace-qualified XML document there are two components to a node's name: the namespace URI and the local name (both are passed in as parameters to the startElement and endElement events). When you are checking for the presence of an element you should be matching on both of these parameters. Currently your code would treat both documents below the same way, even though they are namespace qualified differently.

<foo xmlns="FOO">
    <bar>Hello World</bar>
</foo>

And

<foo xmlns="BAR">
    <bar>Hello World</bar>
</foo>

You are currently (and incorrectly) matching on the qName parameter. The problem with what you are doing is that the qName might change based on the prefix used to represent a namespace. The two documents below have the exact same namespace qualification. The local names and namespaces are the same, but their QNames are different.

<foo xmlns="FOO">
    <bar>Hello World</bar>
</foo>

And

<ns:foo xmlns:ns="FOO">
    <ns:bar>Hello World</ns:bar>
</ns:foo>
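
As a rough sketch of matching on both parameters, the handler below collects the text of bar elements only when the namespace URI and the local name both match. The class name QualifiedNameMatch and the helper barTexts are made up for illustration, and the sketch assumes the factory has been made namespace aware, since otherwise the uri and localName arguments may arrive empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class QualifiedNameMatch {

    // Hypothetical helper: returns the text of every <bar> element in the given namespace.
    static List<String> barTexts(String xml, final String namespace) throws Exception {
        final List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                // Match on namespace URI and local name; the prefix in qName is ignored.
                if (namespace.equals(uri) && "bar".equals(localName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (text != null && namespace.equals(uri) && "bar".equals(localName)) {
                    found.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: needed for uri/localName to be filled in
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        String defaultNs = "<foo xmlns=\"FOO\"><bar>Hello World</bar></foo>";
        String prefixed = "<ns:foo xmlns:ns=\"FOO\"><ns:bar>Hello World</ns:bar></ns:foo>";
        String otherNs = "<foo xmlns=\"BAR\"><bar>Hello World</bar></foo>";
        System.out.println(barTexts(defaultNs, "FOO")); // [Hello World]
        System.out.println(barTexts(prefixed, "FOO"));  // [Hello World]
        System.out.println(barTexts(otherNs, "FOO"));   // []
    }
}
```

The two documents that share the FOO namespace match regardless of prefix, while the document in the BAR namespace does not.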
Puppet answered 17/2, 2014 at 11:30 Comment(0)

To elaborate on Blaise's point with sample code, consider this contrived example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- ns.xml -->
<root xmlns:foo="http://data" xmlns="http://data">
  <foo:record>ONE</foo:record>
  <bar:record xmlns:bar="http://data">TWO</bar:record>
  <record>THREE</record>
  <record xmlns="http://metadata">meta 1</record>
  <foo:record xmlns:foo="http://metadata">meta 2</foo:record>
</root>

There are two different types of record element: one is in the http://data namespace; the other is in the http://metadata namespace. There are three data records and two metadata records.

The document could be normalized to this:

<?xml version="1.0" encoding="UTF-8"?>
<ns0:root xmlns:ns0="http://data" xmlns:ns1="http://metadata">
  <ns0:record>ONE</ns0:record>
  <ns0:record>TWO</ns0:record>
  <ns0:record>THREE</ns0:record>
  <ns1:record>meta 1</ns1:record>
  <ns1:record>meta 2</ns1:record>
</ns0:root>

But the code must handle the general case.

Here is some code for printing the metadata records:

class MetadataPrinter extends DefaultHandler {
  private boolean isMeta = false;

  @Override
  public void startElement(String uri, String localName, String qName,
      Attributes attributes) throws SAXException {
    isMeta = "http://metadata".equals(uri) && "record".equals(localName);
  }

  @Override
  public void endElement(String uri, String localName, String qName)
      throws SAXException {
    if (isMeta) {
      System.out.println();
      isMeta = false;
    }
  }

  @Override
  public void characters(char[] ch, int start, int length)
      throws SAXException {
    if (isMeta) {
      System.out.print(new String(ch, start, length));
    }
  }
}

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
parser.parse(new File("ns.xml"), new MetadataPrinter());

Note: namespace awareness must be enabled explicitly in some of the older Java XML APIs (SAX and DOM among them).
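
To see what the flag changes, here is a small sketch that counts record elements per namespace in the ns.xml content above, with namespace awareness passed in as a parameter. The class name NamespaceAwareToggle and the helper countRecords are invented for this illustration. With the flag off, the SAX contract allows the parser to leave uri and localName empty, so the final count depends on the parser and is printed without a stated expectation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceAwareToggle {

    // Same content as ns.xml, inlined so the sketch is self-contained.
    static final String NS_XML =
          "<root xmlns:foo=\"http://data\" xmlns=\"http://data\">"
        + "<foo:record>ONE</foo:record>"
        + "<bar:record xmlns:bar=\"http://data\">TWO</bar:record>"
        + "<record>THREE</record>"
        + "<record xmlns=\"http://metadata\">meta 1</record>"
        + "<foo:record xmlns:foo=\"http://metadata\">meta 2</foo:record>"
        + "</root>";

    // Hypothetical helper: counts <record> elements that belong to the given namespace.
    static int countRecords(String xml, final String namespace, boolean namespaceAware)
            throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                if (namespace.equals(uri) && "record".equals(localName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countRecords(NS_XML, "http://data", true));     // 3
        System.out.println(countRecords(NS_XML, "http://metadata", true)); // 2
        // Without namespace awareness the result is parser dependent.
        System.out.println(countRecords(NS_XML, "http://metadata", false));
    }
}
```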

Cheeseburger answered 17/2, 2014 at 14:40 Comment(1)
Upvote for setNamespaceAware(true), that's the missing piece I was looking for! – Exigible