Pretty print XML in Java 8
H

7

32

I have an XML file stored as a DOM Document and I would like to pretty print it to the console, preferably without using an external library. I am aware that this question has been asked multiple times on this site, but none of the previous answers have worked for me. I am using Java 8, so perhaps this is where my code differs from previous questions? I have also tried to set the transformer implementation manually using code found on the web, but this just caused a "not found" error.

Here is my code, which currently just prints each XML element on a new line, flush against the left edge of the console (with no indentation).

import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class Test {
    public Test(){
        try {
            //java.lang.System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");

            DocumentBuilderFactory dbFactory;
            DocumentBuilder dBuilder;
            Document original = null;
            try {
                dbFactory = DocumentBuilderFactory.newInstance();
                dBuilder = dbFactory.newDocumentBuilder();
                original = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream("xml Store - Copy.xml"))));
            } catch (SAXException | IOException | ParserConfigurationException e) {
                e.printStackTrace();
            }
            StringWriter stringWriter = new StringWriter();
            StreamResult xmlOutput = new StreamResult(stringWriter);
            TransformerFactory tf = TransformerFactory.newInstance();
            //tf.setAttribute("indent-number", 2);
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.transform(new DOMSource(original), xmlOutput);
            java.lang.System.out.println(xmlOutput.getWriter().toString());
        } catch (Exception ex) {
            throw new RuntimeException("Error converting to String", ex);
        }
    }

    public static void main(String[] args){
        new Test();
    }

}
Herbivorous answered 16/9, 2014 at 8:45 Comment(0)
C
10

I guess that the problem is related to blank text nodes (i.e. text nodes containing only whitespace) in the original file. Try removing them programmatically right after parsing, using the following code. If you don't remove them, the Transformer will preserve them.

original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);

for (int i = 0; i < blankTextNodes.getLength(); i++) {
     blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}
Catchpenny answered 16/9, 2014 at 10:19 Comment(0)
R
57

In reply to Espinosa's comment, here is a solution when "the original xml is not already (partially) indented or contain new lines".

Background

Excerpt from the article (see References below) inspiring this solution:

Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath’s normalize-space to locate all the whitespace nodes and remove them first.

Java Code

public static String toPrettyString(String xml, int indent) {
    try {
        // Turn xml string into a document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

        // Remove whitespaces outside tags
        document.normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                      document,
                                                      XPathConstants.NODESET);

        for (int i = 0; i < nodeList.getLength(); ++i) {
            Node node = nodeList.item(i);
            node.getParentNode().removeChild(node);
        }

        // Setup pretty print options
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        // Return pretty print xml string
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
        return stringWriter.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Sample usage

String xml = "<root>" + //
             "\n   "  + //
             "\n<name>Coco Puff</name>" + //
             "\n        <total>10</total>    </root>";

System.out.println(toPrettyString(xml, 4));

Output

<root>
    <name>Coco Puff</name>
    <total>10</total>
</root>

References

Rotten answered 5/11, 2015 at 10:12 Comment(14)
This is actually pretty similar to the code which I ended up using :).Herbivorous
@btrs20 The difference lies in the whitespace removal.Rotten
I ended up doing something similar: a simple recursion looking for whitespace-only text nodes, no XPath. Your code is shorter. Nice example of advanced XPath. Thanks.Rupture
This works perfectly. But if you get exceptions about the indent-number attribute not being recognized, check the classpath for classes implementing TransformerFactory. I had the library net.sf.saxon:Saxon-HE on the classpath, which defined an additional TransformerFactory.Islamite
Removal of the whitespace is important. The transformer doesn't work if your String has whitespace between lines.Byelaw
Note that this does not play well with XHTML DOCTYPE declarations (it tries to fetch them); once removed, this solution works very well. Also note that because of other imports I had to use org.w3c.dom.Document and org.w3c.dom.Node explicitly instead of Document and Node, and instead of the ByteArrayInputStream you can use InputSource inputSource = new InputSource(new StringReader(code)); (passing in inputSource to DocumentBuilder.parse()).Immigration
Could someone resolve the problem with the indent-number not being interpreted? Also, the XML declaration ends up on the same line as the root element (I don't want to omit it).Kookaburra
@Kookaburra can you please post your problem in a new question?Rotten
@Rotten Ehm, no. I don't see this as a different topic. Feel free to create your own questions. The indent is surely part of what most developers expect from a "pretty" print. None of the solutions posted here have solved the indent yet.Kookaburra
@Kookaburra I'm sorry, the "indent-number not interpreted" problem is totally unclear. Further comments won't clarify it.Rotten
@Rotten I am using jdk8u162. Executing the above code with transformerFactory.setAttribute("indent-number", indent); simply does not add any indent to the output of the method. I expect to see spaces at the beginning of every inner XML tag. It seems like this is ignored.Kookaburra
@Kookaburra I wasn't able to reproduce your issue. Please detail it in a new question and feel free to post a link back here in a comment.Rotten
I like this answer except that it adds a line break at the end.Jacques
@Jacques You may try underscore-java library and U.formatXml(xml) method.Adamantine
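For readers curious about the recursive alternative mentioned in the comments (walking the tree and dropping whitespace-only text nodes, no XPath), here is a minimal sketch. The class and method names are made up for illustration, it parses from a StringReader as suggested in another comment, and it sets the indent through the {http://xml.apache.org/xslt}indent-amount output property used elsewhere on this page, which assumes the JDK's built-in transformer is the one picked up from the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RecursiveWhitespaceStripper {

    // Hypothetical helper: removes every whitespace-only text node below the given node.
    public static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember the sibling before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child); // descend into elements (no-op for leaf nodes)
            }
            child = next;
        }
    }

    public static String toPrettyString(String xml, int indent) throws Exception {
        // Parse the string into a DOM document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Drop the blank text nodes so the transformer can lay out the elements itself
        stripBlankText(document);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Assumption: the JDK's built-in (Xalan-derived) transformer, which honors this property
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                Integer.toString(indent));

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n   \n<name>Coco Puff</name>\n        <total>10</total>    </root>";
        System.out.println(toPrettyString(xml, 4));
    }
}
```

The loop saves the next sibling before removing a node, because a removed node no longer has siblings to move on to. Unlike the XPath version, trim() also treats control characters below U+0020 as blank, which should not matter for ordinary XML input.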
B
6

This works on Java 8:

public static void main (String[] args) throws Exception {
    String xmlString = "<hello><from>ME</from></hello>";
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document document = documentBuilder.parse(new InputSource(new StringReader(xmlString)));
    pretty(document, System.out, 2);
}

private static void pretty(Document document, OutputStream outputStream, int indent) throws Exception {
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    if (indent > 0) {
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
    }
    Result result = new StreamResult(outputStream);
    Source source = new DOMSource(document);
    transformer.transform(source, result);
}
Bookkeeping answered 16/9, 2014 at 9:33 Comment(3)
Hmmm, that also works for me so I guess the problem must be in the way I read the xml file.Herbivorous
Warning: this solution only works when the original XML is not already (partially) indented and does not contain newlines. That is, it will work for "<hello><from>ME</from></hello>" but NOT for "<hello>\n<from>ME</from>\n</hello>"Rupture
To casual readers, here is a solution for @Espinosa's warning: https://mcmap.net/q/75268/-pretty-print-xml-in-java-8Rotten
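To see why the warning above holds, it may help to look at what the parser actually produces. The sketch below (class and method names are made up for illustration) counts the direct children of the root element for both inputs from the comment. In the second case the newlines become text nodes of their own, and those are the nodes the transformer then preserves.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WhitespaceNodeDemo {

    // Hypothetical helper: number of direct child nodes of the root element.
    public static int rootChildCount(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return document.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // Only the <from> element: prints 1
        System.out.println(rootChildCount("<hello><from>ME</from></hello>"));
        // A "\n" text node, the <from> element, and another "\n" text node: prints 3
        System.out.println(rootChildCount("<hello>\n<from>ME</from>\n</hello>"));
    }
}
```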
P
2

I've written a simple class for removing whitespace in documents. It supports the command line and does not use DOM / XPath.

Edit: Come to think of it, the project also contains a pretty-printer which handles existing whitespace:

PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();
Psychotechnology answered 28/12, 2014 at 22:59 Comment(0)
A
1

Underscore-java has a static method U.formatXml(string). I am the maintainer of the project. Live example

import com.github.underscore.U;

public class MyClass {
    public static void main(String args[]) {
        String xml = "<root>" + //
             "\n   "  + //
             "\n<name>Coco Puff</name>" + //
             "\n        <total>10</total>    </root>";

        System.out.println(U.formatXml(xml));
    }
}

Output:

<root>
   <name>Coco Puff</name>
   <total>10</total>
</root>
Adamantine answered 8/12, 2018 at 5:30 Comment(0)
I
0

I didn't like any of the common XML formatting solutions because they all remove more than 1 consecutive new line character (for some reason, removing spaces/tabs and removing new line characters are inseparable...). Here's my solution, which was actually made for XHTML but should do the job with XML as well:

public String GenerateTabs(int tabLevel) {
  char[] tabs = new char[tabLevel * 2];
  Arrays.fill(tabs, ' ');

  //Or:
  //char[] tabs = new char[tabLevel];
  //Arrays.fill(tabs, '\t');

  return new String(tabs);
}

public String FormatXHTMLCode(String code) {
  // Split on new lines.
  String[] splitLines = code.split("\\n", 0);

  int tabLevel = 0;

  // Go through each line.
  for (int lineNum = 0; lineNum < splitLines.length; ++lineNum) {
    String currentLine = splitLines[lineNum];

    if (currentLine.trim().isEmpty()) {
      splitLines[lineNum] = "";
    } else if (currentLine.matches(".*<[^/!][^<>]+?(?<!/)>?")) {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];

      ++tabLevel;
    } else if (currentLine.matches(".*</[^<>]+?>")) {
      --tabLevel;

      if (tabLevel < 0) {
        tabLevel = 0;
      }

      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
    } else if (currentLine.matches("[^<>]*?/>")) {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];

      --tabLevel;

      if (tabLevel < 0) {
        tabLevel = 0;
      }
    } else {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
    }
  }

  return String.join("\n", splitLines);
}

It makes one assumption: that there are no <> characters except for those that comprise the XML/XHTML tags.

Immigration answered 22/3, 2017 at 17:1 Comment(2)
this snippet is incomplete, since the codeGenerator variable cannot be resolved. is the corresponding class written in java? since java method names do have a different naming convention.Kookaburra
@Kookaburra Sorry about that, and thanks for informing me. I didn't realize there was external code being utilized. Try that, I think it will work; can't test it right now.Immigration
B
-3

The way the XML file is opened:

new FileInputStream("xml Store - Copy.xml"); // results in an incorrectly formatted XML file!

Instead, parse the content of the given input source as an XML document, which returns a new DOM object:

Document original = null;
...
original = dBuilder.parse("data.xml"); // input source as an XML document
Brimful answered 16/9, 2014 at 9:32 Comment(0)
