Is there a way to parse XML via SAX/DOM with line numbers available per node?

I have already written a DOM parser for a large XML document format that contains a number of items that can be used to automatically generate Java code. This is limited to small expressions that are then merged into a dynamically generated Java source file.

So far - so good. Everything works.

BUT - I wish to be able to embed the line number of the XML node that the Java code came from, so that if the configuration contains uncompilable code, each method will have a pointer to the source XML document and the line number for ease of debugging. I don't require the line number at parse time, and I don't need to validate the XML source document and throw an error at a particular line number. I need to be able to access the line number for each node and attribute in my DOM, or per SAX event.

Any suggestions on how I might be able to achieve this?

P.S. Also, I read that StAX has a method to obtain the line number whilst parsing, but ideally I would like to achieve the same result with regular SAX/DOM processing in Java 1.4/5 rather than become a Java 6+ application or take on extra .jar files.

Hughs answered 9/5, 2010 at 16:46 Comment(3)
Perhaps with org.xml.sax.Locator? - Amadou
Great, I'll check it out. I asked this question because I seem to have picked up some misinformation claiming this was not possible with the default SAX processor of 1.4/5.0. I'll report back here if I have success. - Hughs
Thanks, exactly what I was searching for. - Sputter
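
For anyone landing here, the Locator suggestion above can be sketched roughly as follows. This is a minimal illustration, not code from any of the posters: the class name LocatorDemo and the method elementLines are made up, and it uses generics and the diamond operator, so it needs a newer compiler than Java 1.4. The parser hands the handler a Locator once via setDocumentLocator, and the handler queries it during each event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the line reported by the Locator for each start tag.
final class LocatorDemo {

    /** Parses the XML text and returns one "qName@line" entry per start tag. */
    static List<String> elementLines(String xml) throws Exception {
        final List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // Called by the parser before any other event, if it supports location tracking.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Parsers are not obliged to supply a Locator, so fall back to -1.
                int line = (locator != null) ? locator.getLineNumber() : -1;
                result.add(qName + "@" + line);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>\n  <method name=\"a\"/>\n  <method name=\"b\"/>\n</config>";
        for (String entry : elementLines(xml)) {
            System.out.println(entry);
        }
    }
}
```

The Locator documentation describes the reported position as the point where the current event ends, so a start tag that spans several lines is attributed to the line holding its closing bracket rather than the line where it opens.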

I know this thread is a little old (sorry), but it has taken me so long to crack this nut I had to share the solution with someone...

You only seem to be able to obtain the line numbers with SAX, which doesn't build a DOM. The DOM parser does not give you the line numbers, and it doesn't let you near the SAX parser it is using either. My solution is to do an empty XSLT transformation using a SAX source and a DOM result, but even then someone has done their best to hide this. See the code below.

I add the location information to each element as an attribute with my own namespace, so I can find elements using XPath and report where the data came from.

Hope this helps:

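import java.io.FileInputStream;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;

import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2Impl;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

// The file to parse.
String systemId = "myxml.xml";

/*
 * Create transformer SAX source that adds current element position to
 * the element as attributes.
 */
XMLReader xmlReader = XMLReaderFactory.createXMLReader();
LocationFilter locationFilter = new LocationFilter(xmlReader);

// Use a byte stream rather than a FileReader so the parser can honour the
// encoding declared in the XML document instead of the platform default.
InputSource inputSource = new InputSource(new FileInputStream(systemId));
// Do this so that XPath function document() can take relative URI.
inputSource.setSystemId(systemId);
SAXSource saxSource = new SAXSource(locationFilter, inputSource);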

/*
 * Perform an empty transformation from SAX source to DOM result.
 */
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMResult domResult = new DOMResult();
transformer.transform(saxSource, domResult);
Node root = domResult.getNode();

...
class LocationFilter extends XMLFilterImpl {

    LocationFilter(XMLReader xmlReader) {
        super(xmlReader);
    }

    private Locator locator = null;

    @Override
    public void setDocumentLocator(Locator locator) {
        super.setDocumentLocator(locator);
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {

        // Parsers are not required to supply a Locator; pass the event through unchanged if none was set.
        if (locator == null) {
            super.startElement(uri, localName, qName, attributes);
            return;
        }

        // Add an extra attribute to each element to hold its location as systemId:line:column
        String location = locator.getSystemId() + ':' + locator.getLineNumber() + ':' + locator.getColumnNumber();
        Attributes2Impl attrs = new Attributes2Impl(attributes);
        attrs.addAttribute("http://myNamespace", "location", "myns:location", "CDATA", location);
        super.startElement(uri, localName, qName, attrs);
    }
}
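
To illustrate the "find elements using XPath and report where the data came from" part, here is a rough sketch of reading the location attribute back out of a DOM. It is an assumption-laden example rather than part of the answer's code: the class LocationLookup and its methods are hypothetical, and to stay self-contained it parses a small hand-written document that already carries a myns:location attribute instead of running the filter above.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: looks up the location attribute of the first element matching an XPath.
final class LocationLookup {

    // Same namespace URI as used for the attribute in the filter above.
    static final String LOCATION_NS = "http://myNamespace";

    /** Returns the location string of the first matching element, or null if nothing matches. */
    static String locationOf(Document doc, String xpathExpr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Object match = xpath.evaluate(xpathExpr, doc, XPathConstants.NODE);
        if (!(match instanceof Element)) {
            return null;
        }
        // getAttributeNS returns an empty string when the attribute is absent.
        return ((Element) match).getAttributeNS(LOCATION_NS, "location");
    }

    /** Parses XML text into a namespace-aware DOM (needed for getAttributeNS to work). */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a DOM produced by the transformation; the location value is made up.
        String xml = "<config xmlns:myns=\"http://myNamespace\">"
                + "<method name=\"a\" myns:location=\"myxml.xml:2:22\"/>"
                + "</config>";
        System.out.println(locationOf(parse(xml), "//method[@name='a']"));
    }
}
```

Splitting the returned string on ':' from the right gives the column and line, which leaves the system ID intact even if it contains colons itself.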
Neese answered 10/1, 2011 at 17:33 Comment(2)
Absolutely essential, clear and concise. I knew this had to exist. - Locular
According to the doc, locator.getLineNumber() returns the end of the element; what should I do if the start line number is needed? - Morven

I ran into this issue recently and I thought I'd share a ready-made utility class for handling it. It works with Java 11, whereas some of Reg Whitton's code uses classes that are now deprecated.

Mostly based on this article with a few tweaks. Notably, it stores the line number as the node's user data rather than setting it as an attribute.

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Deque;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlDom {

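    /**
     * Parses the stream with SAX and builds a DOM document by hand, storing each element's
     * line number (as a String) in its user data under the key lineNumAttribName.
     * Only elements, attributes and text are copied into the document.
     */
    public static Document readXML(InputStream is, final String lineNumAttribName) throws IOException, SAXException {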
        final Document doc;
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            parser = factory.newSAXParser();
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            doc = docBuilder.newDocument();           
        } catch(ParserConfigurationException e){
            throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
        }

        final Deque<Element> elementStack = new ArrayDeque<>();
        final StringBuilder textBuffer = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator; //Save the locator, so that it can be used later for line tracking when traversing nodes.
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {               
                addTextIfNeeded();
                Element el = doc.createElement(qName);
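                for (int i = 0; i < attributes.getLength(); i++) {
                    el.setAttribute(attributes.getQName(i), attributes.getValue(i));
                }
                // The parser may not supply a Locator; only record a line number when one is available.
                if (locator != null) {
                    el.setUserData(lineNumAttribName, String.valueOf(locator.getLineNumber()), null);
                }
                elementStack.push(el);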
            }

            @Override
            public void endElement(String uri, String localName, String qName){
                addTextIfNeeded();
                Element closedEl = elementStack.pop();
                if (elementStack.isEmpty()) { // Is this the root element?
                    doc.appendChild(closedEl);
                } else {
                    Element parentEl = elementStack.peek();
                    parentEl.appendChild(closedEl);                   
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                textBuffer.append(ch, start, length);
            }

            // Outputs text accumulated under the current node
            private void addTextIfNeeded() {
                if (textBuffer.length() > 0) {
                    Element el = elementStack.peek();
                    Node textNode = doc.createTextNode(textBuffer.toString());
                    el.appendChild(textNode);
                    textBuffer.delete(0, textBuffer.length());
                }
            }           
        };
        parser.parse(is, handler);

        return doc;
    }   

}

Access the line number (stored as a String) with:

String line = (String) node.getUserData(lineNumAttribName);
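
Since the class above only attaches user data to elements, text nodes and attributes have nothing stored on them. One possible way to handle that, sketched here under assumptions, is to fall back to the nearest enclosing element. The class UserDataLines, its methods and the "lineNumber" key are hypothetical names, and the sample document is built by hand to keep the example self-contained.

```java
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: resolves a line number for any node by walking up to an element that has one.
final class UserDataLines {

    /** Returns the line stored on the node or its nearest ancestor, or null if none is found. */
    static String lineOf(Node node, String key) {
        Node current = node;
        while (current != null) {
            Object line = current.getUserData(key);
            if (line != null) {
                return line.toString();
            }
            // Attributes have no parent node, so step to their owner element instead.
            current = (current instanceof Attr)
                    ? ((Attr) current).getOwnerElement()
                    : current.getParentNode();
        }
        return null;
    }

    /** Builds a tiny document with made-up line numbers, standing in for XmlDom.readXML output. */
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("config");
        root.setUserData("lineNumber", "1", null);
        Element method = doc.createElement("method");
        method.setUserData("lineNumber", "2", null);
        method.setAttribute("name", "a");
        method.appendChild(doc.createTextNode("return 1;"));
        root.appendChild(method);
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Element method = (Element) sample().getDocumentElement().getFirstChild();
        System.out.println(lineOf(method.getFirstChild(), "lineNumber"));         // text node
        System.out.println(lineOf(method.getAttributeNode("name"), "lineNumber")); // attribute
    }
}
```

Keep in mind that user data lives only on the in-memory nodes: it is not written out when the document is serialized.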
Mallis answered 9/8, 2019 at 9:26 Comment(1)
According to the doc, locator.getLineNumber() returns the end of the element; what should I do if the start line number is needed? - Morven
