Get line number from xml node - java

I have parsed an XML file and have gotten a Node that I am interested in. How can I now find the line number in the source XML file where this node occurs?

EDIT: Currently I am using the SAXParser to parse my XML. However, I would be happy with a solution using any parser.

Along with the Node, I also have the XPath expression for the node.

I need to get the line number because I am displaying the XML file in a textbox and need to highlight the line where the node occurred. Assume that the XML file is nicely formatted with sufficient line breaks.

Dam asked 6/2, 2011 at 19:5

I have got this working by following this example:

http://eyalsch.wordpress.com/2010/11/30/xml-dom-2/

This solution follows the method suggested by Michael Kay. Here is how you use it:

// XmlTest.java

import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class XmlTest {
    public static void main(final String[] args) throws Exception {

        String xmlString = "<foo>\n"
                         + "    <bar>\n"
                         + "        <moo>Hello World!</moo>\n"
                         + "    </bar>\n"
                         + "</foo>";

        InputStream is = new ByteArrayInputStream(xmlString.getBytes());
        Document doc = PositionalXMLReader.readXML(is);
        is.close();

        Node node = doc.getElementsByTagName("moo").item(0);

        System.out.println("Line number: " + node.getUserData("lineNumber"));
    }
}

If you run this program, it will output: "Line number: 3"

PositionalXMLReader is a slightly modified version of the example linked above.

// PositionalXMLReader.java

import java.io.IOException;
import java.io.InputStream;
import java.util.Stack;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PositionalXMLReader {
    final static String LINE_NUMBER_KEY_NAME = "lineNumber";

    public static Document readXML(final InputStream is) throws IOException, SAXException {
        final Document doc;
        SAXParser parser;
        try {
            final SAXParserFactory factory = SAXParserFactory.newInstance();
            parser = factory.newSAXParser();
            final DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            final DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            doc = docBuilder.newDocument();
        } catch (final ParserConfigurationException e) {
            throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
        }

        final Stack<Element> elementStack = new Stack<Element>();
        final StringBuilder textBuffer = new StringBuilder();
        final DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(final Locator locator) {
                this.locator = locator; // Save the locator, so that it can be used later for line tracking when traversing nodes.
            }

            @Override
            public void startElement(final String uri, final String localName, final String qName, final Attributes attributes)
                    throws SAXException {
                addTextIfNeeded();
                final Element el = doc.createElement(qName);
                for (int i = 0; i < attributes.getLength(); i++) {
                    el.setAttribute(attributes.getQName(i), attributes.getValue(i));
                }
                el.setUserData(LINE_NUMBER_KEY_NAME, String.valueOf(this.locator.getLineNumber()), null);
                elementStack.push(el);
            }

            @Override
            public void endElement(final String uri, final String localName, final String qName) {
                addTextIfNeeded();
                final Element closedEl = elementStack.pop();
                if (elementStack.isEmpty()) { // Is this the root element?
                    doc.appendChild(closedEl);
                } else {
                    final Element parentEl = elementStack.peek();
                    parentEl.appendChild(closedEl);
                }
            }

            @Override
            public void characters(final char ch[], final int start, final int length) throws SAXException {
                textBuffer.append(ch, start, length);
            }

            // Outputs text accumulated under the current node
            private void addTextIfNeeded() {
                if (textBuffer.length() > 0) {
                    final Element el = elementStack.peek();
                    final Node textNode = doc.createTextNode(textBuffer.toString());
                    el.appendChild(textNode);
                    textBuffer.delete(0, textBuffer.length());
                }
            }
        };
        parser.parse(is, handler);

        return doc;
    }
}
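The question mentions already having an XPath expression for the node. As a minimal sketch (not part of the original answer), the same SAX-plus-Locator idea can build a DOM whose elements carry a "lineNumber" user-data entry, and the standard javax.xml.xpath API can then find the node from the expression. The class and method names (XPathLineLookup, readWithLines, lineOf) are hypothetical, and for brevity this version keeps only elements and attributes, so XPath predicates on text content would not work against it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class XPathLineLookup {

    /** Builds a DOM (elements and attributes only), tagging each element with "lineNumber". */
    public static Document readWithLines(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(doc); // the Document is the parent of the root element
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)), new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                Element el = doc.createElement(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    el.setAttribute(atts.getQName(i), atts.getValue(i));
                }
                el.setUserData("lineNumber", String.valueOf(locator.getLineNumber()), null);
                stack.peek().appendChild(el); // attach to the current parent
                stack.push(el);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                stack.pop();
            }
        });
        return doc;
    }

    /** Evaluates an XPath expression and returns the matched node's stored line, or null. */
    public static Object lineOf(Document doc, String xpath) throws Exception {
        Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODE);
        return node == null ? null : node.getUserData("lineNumber");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo>\n    <bar>\n        <moo>Hello World!</moo>\n    </bar>\n</foo>";
        Document doc = readWithLines(xml);
        System.out.println(lineOf(doc, "/foo/bar/moo")); // 3
    }
}
```

Because XPath returns the very Node objects that were built, the user data set during parsing is still attached to the result.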
Dam answered 7/2, 2011 at 4:31. Comments:
Note that this solution only records elements; it ignores comments and possibly also CDATA sections and the DTD. You can capture those by implementing LexicalHandler and registering it with setProperty, as described in its Javadoc. - Priapus
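As a minimal sketch of what this comment describes, assuming the JDK's built-in SAX parser: org.xml.sax.ext.DefaultHandler2 is a DefaultHandler that also implements LexicalHandler, so one object can serve as both the content handler and the lexical handler, registered through the standard "http://xml.org/sax/properties/lexical-handler" property. The class name CommentLines is hypothetical, and this only records the line on which each comment ends instead of adding Comment nodes to a DOM.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.ext.DefaultHandler2;

public class CommentLines {

    /** Returns the line on which each comment ends, in document order. */
    public static List<Integer> commentLines(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        List<Integer> lines = new ArrayList<>();
        // DefaultHandler2 extends DefaultHandler and implements LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void comment(char[] ch, int start, int length) {
                lines.add(locator.getLineNumber());
            }
        };
        // Comments are reported only to a LexicalHandler registered via this property.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo>\n  <!-- first -->\n  <bar/>\n  <!-- second -->\n</foo>";
        System.out.println(commentLines(xml)); // [2, 4]
    }
}
```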

If you are using a SAX parser, the line number of an event can be obtained from the Locator object, which the parser passes to the ContentHandler through the setDocumentLocator() callback. This callback is invoked at the start of parsing, and you need to save the Locator; then, inside any later event callback (such as startElement()), you can call methods such as getLineNumber() to obtain the current position in the source file. (In startElement(), the locator is defined to give you the line number on which the ">" of the start tag appears.)
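A minimal sketch of that pattern using the JDK's built-in SAX parser. The class name LocatorDemo and the "qName:line" output format are illustrative only; the Locator is saved in setDocumentLocator() and queried inside startElement(), using the same XML as the first answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorDemo {

    /** Returns "qName:line" for every start tag, in document order. */
    public static List<String> startTagLines(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        List<String> result = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // Called before any other event; keep the reference for later callbacks.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                // Only meaningful inside a callback: the line where this event ends.
                result.add(qName + ":" + locator.getLineNumber());
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo>\n    <bar>\n        <moo>Hello World!</moo>\n    </bar>\n</foo>";
        System.out.println(startTagLines(xml)); // [foo:1, bar:2, moo:3]
    }
}
```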

Frontality answered 6/2, 2011 at 21:20. Comments:
Hello, can I configure the Saxon XSLT processor (any version) to use this as a specific XML parser? I only found the -x parameter for using my own SAX parser. - Petes
Saxon has a configuration option -l or FeatureKeys.LINE_NUMBERING that will cause it to collect line number information supplied by the XML parser and retain it in the constructed tree. It is then accessible using the saxon:line-number() extension function. - Frontality
Thanks for the answer. I know the saxon:line-number function; sorry I wasn't precise enough! priomsrb's answer prompted me to modify his PositionalXMLReader to add more user data to the nodes. I found the saxon:getUserData function (only in versions < 7.4?) and was wondering whether I could use it to get more information about the nodes directly into XSLT (e.g. the last row/column number of the node). - Petes
I'd suggest giving a more detailed description of what you are trying to do on the forum at saxonica.plan.io. It seems a bit too complex to handle in a comment thread here. - Frontality

Note that according to the spec of Locator.getLineNumber(), the method returns the line number at which the current SAX event ends!

For startElement(), this means:

Here the line number for Element is 1:

<Element></Element>

Here the line number for Element is 3:

<Element
   attribute1="X"
   attribute2="Y">
</Element>
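The two cases above can be checked with a small sketch (hypothetical class name StartTagLine) that records the Locator's line inside startElement() for the first element. The Locator Javadoc describes these values as approximations, so the expected 1 and 3 reflect the JDK's built-in parser; other SAX parsers may track positions differently.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class StartTagLine {

    /** Line reported by the Locator during startElement() of the first element. */
    public static int firstStartTagLine(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] line = {-1}; // mutable holder written from the anonymous handler
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (line[0] == -1) {
                    line[0] = locator.getLineNumber(); // line of the start tag's '>'
                }
            }
        });
        return line[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstStartTagLine("<Element></Element>")); // 1
        System.out.println(firstStartTagLine(
                "<Element\n   attribute1=\"X\"\n   attribute2=\"Y\">\n</Element>")); // 3
    }
}
```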
Desiccant answered 14/9, 2017 at 17:17. Comments:
Hello @hhaehle. Welcome to SO. This is some helpful information, but it should probably be posted as a comment, since it does not answer the original question. You can learn more about comments here. - Edmonton
So the line number is the end line number, but is there any way to get the start line (where the event starts)? - Anaptyxis

priomsrb's answer is great and works. For my use case I needed to integrate it into an existing framework that also handles things such as the encoding. Therefore I refactored it into a separate LineNumberHandler class.

The code then also works with a SAX InputSource, whose encoding can be set like this:

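    // read in the xml document
    // (instream, encoding and Debug belong to the author's framework and are not shown)
    org.xml.sax.InputSource is = new org.xml.sax.InputSource();
    is.setByteStream(instream);
    if (encoding != null) {
      is.setEncoding(encoding);
      if (Debug.CORE)
        Debug.log("setting XML encoding to - " + is.getEncoding());
    }
    // the InputSource can then be parsed with the handler, e.g.
    // parser.parse(is, new LineNumberHandler(doc));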
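The snippet above depends on framework pieces (instream, encoding, Debug) that are not shown. As a self-contained sketch of the same idea, the bytes below are ISO-8859-1 without an XML declaration, so a parser would otherwise assume UTF-8; InputSource.setEncoding() tells it how to decode them while the Locator still supplies line numbers. The class name EncodedInputSource and its output format are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class EncodedInputSource {

    /** Parses raw bytes with an explicit encoding; returns "qName@line ... text=<trimmed text>". */
    public static String describe(byte[] bytes, String encoding) throws Exception {
        InputSource source = new InputSource();
        source.setByteStream(new ByteArrayInputStream(bytes));
        if (encoding != null) {
            source.setEncoding(encoding); // how the parser should decode the byte stream
        }
        StringBuilder starts = new StringBuilder();
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                starts.append(qName).append('@').append(locator.getLineNumber()).append(' ');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // collects all text, including whitespace
            }
        });
        return starts + "text=" + text.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<foo>\n  <bar>caf\u00e9</bar>\n</foo>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(describe(latin1, "ISO-8859-1")); // foo@1 bar@2 text=café
    }
}
```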

Separate LineNumberHandler

/**
 * LineNumber Handler
 * Declared as a static nested class (e.g. inside PositionalXMLReader),
 * using the same imports as PositionalXMLReader.java in the first answer.
 * @author wf
 */
public static class LineNumberHandler extends DefaultHandler {

  final Stack<Element> elementStack = new Stack<Element>();
  final StringBuilder textBuffer = new StringBuilder();
  private Locator locator;
  private final Document doc;

  /**
   * create a line number Handler for the given document
   * @param doc the document to fill with elements and text
   */
  public LineNumberHandler(Document doc) {
    this.doc = doc;
  }

  @Override
  public void setDocumentLocator(final Locator locator) {
    // Save the locator, so that it can be used later for line tracking
    // when traversing nodes.
    this.locator = locator;
  }

  @Override
  public void startElement(final String uri, final String localName,
      final String qName, final Attributes attributes) throws SAXException {
    addTextIfNeeded();
    final Element el = doc.createElement(qName);
    for (int i = 0; i < attributes.getLength(); i++) {
      el.setAttribute(attributes.getQName(i), attributes.getValue(i));
    }
    // line on which the '>' of the start tag appears
    el.setUserData(PositionalXMLReader.LINE_NUMBER_KEY_NAME,
        String.valueOf(this.locator.getLineNumber()), null);
    elementStack.push(el);
  }

  @Override
  public void endElement(final String uri, final String localName,
      final String qName) {
    addTextIfNeeded();
    final Element closedEl = elementStack.pop();
    if (elementStack.isEmpty()) { // Is this the root element?
      doc.appendChild(closedEl);
    } else {
      final Element parentEl = elementStack.peek();
      parentEl.appendChild(closedEl);
    }
  }

  @Override
  public void characters(final char[] ch, final int start, final int length)
      throws SAXException {
    textBuffer.append(ch, start, length);
  }

  // Outputs text accumulated under the current node
  private void addTextIfNeeded() {
    if (textBuffer.length() > 0) {
      final Element el = elementStack.peek();
      final Node textNode = doc.createTextNode(textBuffer.toString());
      el.appendChild(textNode);
      textBuffer.delete(0, textBuffer.length());
    }
  }
}

PositionalXMLReader

import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class PositionalXMLReader {
  final static String LINE_NUMBER_KEY_NAME = "lineNumber";

  /**
   * read a document from the given input stream
   *
   * @param is
   *          - the input stream
   * @return - the Document
   * @throws IOException
   * @throws SAXException
   */
  public static Document readXML(final InputStream is)
      throws IOException, SAXException {
    final Document doc;
    SAXParser parser;
    try {
      final SAXParserFactory factory = SAXParserFactory.newInstance();
      parser = factory.newSAXParser();
      final DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
          .newInstance();
      final DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
      doc = docBuilder.newDocument();
    } catch (final ParserConfigurationException e) {
      throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
    }
    LineNumberHandler handler = new LineNumberHandler(doc);
    parser.parse(is, handler);

    return doc;
  }
}

JUnit Testcase

package com.bitplan.common.impl;

import static org.junit.Assert.assertEquals;

import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

import com.bitplan.bobase.PositionalXMLReader;

public class TestXMLWithLineNumbers {

  /**
   * get an Example XML Stream
   * @return the example stream
   */
  public InputStream getExampleXMLStream() {
    String xmlString = "<foo>\n" + "    <bar>\n"
        + "        <moo>Hello World!</moo>\n" + "    </bar>\n" + "</foo>";

    InputStream is = new ByteArrayInputStream(xmlString.getBytes());
    return is;
  }

  @Test
  public void testXMLWithLineNumbers() throws Exception {
    InputStream is = this.getExampleXMLStream();
    Document doc = PositionalXMLReader.readXML(is);
    is.close();

    Node node = doc.getElementsByTagName("moo").item(0);
    assertEquals("3", node.getUserData("lineNumber"));
  }  
}
Bossism answered 23/7, 2019 at 9:46

© 2022 - 2024 — McMap. All rights reserved.