How to pretty print XML from Java?
Asked Answered
H

34

502

I have a Java String that contains XML, with no line feeds or indentations. I would like to turn it into a String with nicely formatted XML. How do I do this?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

Note: My input is a String. My output is a String.

(Basic) mock result:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>
Hellcat answered 26/9, 2008 at 12:21 Comment(4)
check this question: #1265349Hydro
Just curious, are you sending this output to a XML file or something else where the indenting really matters? Some time ago I was very concerned about formatting my XML in order to have it properly displayed... but after spending a bunch of time on this I realized that I had to send my output to a web browser, and any relatively modern web browser will actually display the XML in a nice tree structure, so I could forget about this issue and move on. I'm mentioning this just in case you (or other user with the same problem) could have overlooked the same detail.Ducktail
@Abel, saving to text files, inserting into an HTML textareas, and dumping to the console for debugging purposes.Hellcat
"put on hold as too broad" - it is hard to be more precise than the question currently is!Hellcat
H
107

Now it's 2012 and Java can do more than it used to with XML, I'd like to add an alternative to my accepted answer. This has no dependencies outside of Java 6.

import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public String format(String xml) {

        try {
            final InputSource src = new InputSource(new StringReader(xml));
            final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
            final Boolean keepDeclaration = Boolean.valueOf(xml.startsWith("<?xml"));

        //May need this: System.setProperty(DOMImplementationRegistry.PROPERTY,"com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");


            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            final LSSerializer writer = impl.createLSSerializer();

            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE); // Set this to true if the output needs to be beautified.
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration); // Set this to true if the declaration is needed to be outputted.

            return writer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }
}
Hellcat answered 17/7, 2012 at 9:29 Comment(13)
No indention, but it does works with this: System.setProperty(DOMImplementationRegistry.PROPERTY,"com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");Staircase
I tried this solution and found that the returned XML string always had UTF-16. When the document is serialized back into a string, there is this line of code: ser._format.setEncoding("UTF-16"); This may be some the standard now, but for the system I'm working with, UTF-8 is used. Does anyone know how to maintain the encoding from the original XML string?Waxwork
@DanTemple It seems like you need to use LSOutput to control the encoding. See chipkillmar.net/2009/03/25/pretty-print-xml-from-a-domOleoresin
@JoshuaDavis Thanks, that works to set the encoding on the response. I'll need to add something that passes the original encoding to the pretty printer if I want to maintain the encoding of the oiginal XML string.Waxwork
@JoshA. There is a one-liner using XMLBeam: System.out.println(new XBProjector().projectXMLString("<xml><foo><bar/></foo></xml>", DOMAccess.class).asString());Roily
I find that this wraps lines over a certain length - is there any way to prevent wrapping?Dichroscope
this adds an FEFF BOM at the beginning of the returned String.Chagall
I tried to use this in Andriod but i am not able to find the package `DOMImplementationRegistry. I am using java 8.Antisepsis
I had issues with this snippet regarding escaped attribute values. The less than escape sequence &lt; was correctly preserved while the greater than sequence &gt; was converted to the actual sign >.Assimilative
yep, this work on jdk8 without exception instead of next rated answer.Kenneth
To set the encoding: LSOutput output = impl.createLSOutput(); output.setEncoding("UTF-8"); output.setByteStream(new ByteArrayOutputStream()); writer.write(document, output); return output.getByteStream().toString();Middling
Didnt work on JDK 8 deployed to JBoss EAP 7/Wildfly 10 get class not found exceptions com.sun.org.apache.xerces.internal.dom.DOMXSImplementationSourceImpl - looks related to this issues.jboss.org/browse/WFLY-4416 - but I'm not going to deal with adding an extra library or anything - I dont need it pretty printed that badly.Jenn
thanks for including the import list as well, so many conflicting packages available to make sense of the combination needed otherwise..Cage
G
308
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
// initialize StreamResult with File object to save to file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);

Note: Results may vary depending on the Java version. Search for workarounds specific to your platform.

Gertiegertrud answered 26/9, 2008 at 12:26 Comment(5)
How to make so that the output wont contain <?xml version="1.0" encoding="UTF-8"?>?Gripe
To omit <?xml ...> declaration, add transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")Motto
Casual readers may find useful an improved version of the solution described here (https://mcmap.net/q/75268/-pretty-print-xml-in-java-8).Tham
This solution removes CDATA wrappers. Is there a flag to prevent it?Unheard
Stephan's comment is important for those who have already indented document. Original answer works for such document only in Java 8. In Java 11 or higher it appends whitespace to present whitespaces and the indentation breaks up.Degust
H
145

a simpler solution based on this answer:

public static String prettyFormat(String input, int indent) {
    try {
        Source xmlInput = new StreamSource(new StringReader(input));
        StringWriter stringWriter = new StringWriter();
        StreamResult xmlOutput = new StreamResult(stringWriter);
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        transformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        transformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        Transformer transformer = transformerFactory.newTransformer(); 
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(xmlInput, xmlOutput);
        return xmlOutput.getWriter().toString();
    } catch (Exception e) {
        throw new RuntimeException(e); // simple exception handling, please review it
    }
}

public static String prettyFormat(String input) {
    return prettyFormat(input, 2);
}

testcase:

prettyFormat("<root><child>aaa</child><child/></root>");

returns:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>aaa</child>
  <child/>
</root>

//Ignore: Original edit just needs missing s in the Class name in code. redundant six characters added to get over 6 characters validation on SO

Hydro answered 12/8, 2009 at 8:22 Comment(8)
This is the code I've always used but at this company it didn't work, I assume they are using another XML transforming library. I created the factory as a separate line and then did factory.setAttribute("indent-number", 4); and now it works.Celerity
How to make so that the output wont contain <?xml version="1.0" encoding="UTF-8"?>?Gripe
@Harry: transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");Splasher
Hi I am using this exact code, and mine formats properly with the exception of the first element So, this: <?xml version="1.0" encoding="UTF-8"?><root> is all on one line. Any ideas why?Virgina
@dfa: I liked the comment // simple exception handling, please review it. Can you point to some resources that recommend this type of exception handling? Thanks.Penuchle
@Codemiester: Seems to be a bug (see https://mcmap.net/q/75269/-since-moving-to-java-1-7-xml-document-element-does-not-indent). Adding transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "yes"); worked for me.Masurium
In Java 1.8 it does nothing (excess whitespace is not removed, no indent/new lines addedSegovia
This code snippet is vulnerable to XML eXternal Entity Injection (XXE). See: cheatsheetseries.owasp.org/cheatsheets/…Primrose
H
141

Here's an answer to my own question. I combined the answers from the various results to write a class that pretty prints XML.

No guarantees on how it responds with invalid XML or large documents.

package ecb.sdw.pretty;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public XmlFormatter() {
    }

    public String format(String unformattedXml) {
        try {
            final Document document = parseXmlFile(unformattedXml);

            OutputFormat format = new OutputFormat(document);
            format.setLineWidth(65);
            format.setIndenting(true);
            format.setIndent(2);
            Writer out = new StringWriter();
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(document);

            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private Document parseXmlFile(String in) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new StringReader(in));
            return db.parse(is);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }

}
Hellcat answered 26/9, 2008 at 13:13 Comment(8)
Just to note that this answer requires the use of Xerces. If you don't want to add this dependency then you can simply use the standard jdk libraries and javax.xml.transform.Transformer (see my answer below)Hokku
Back in 2008 this was a good answer, but now this can all be done with standard JDK classes rather than Apache classes. See xerces.apache.org/xerces2-j/faq-general.html#faq-6. Yes this is a Xerces FAQ but the answer covers standard JDK classes. The initial 1.5 implementation of these classes had many issues but everything works fine from 1.6 on. Copy the LSSerializer example in the FAQ, chop the "..." bit and add writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE); after the LSSerializer writer = ... line.Chibchan
I've created a small class using the example Apache gave, which @GeorgeHawkins gave a link to. It was missing how the variable document was initialized, so I thought I might add in the deceleration and make a quick example out of it. Let me know if I should change something, pastebin.com/XL7932aCArable
it is not true that you can do that with jdk only. at least not reliably. it depends on some internal registry implementation that is not active with my jdk7u72 by default. so you still better use the apache stuff directly.Cruet
Here is a solution without any dependencies :https://mcmap.net/q/75268/-pretty-print-xml-in-java-8.Tham
I'm actually maintaining a 2008-year project LOL, thanks!Weidar
probblems with ut-8Rehabilitation
To avoid the validation of the xml and the related overhead, use builder.setEntityResolver((publicId, systemId) -> {return new InputSource(new StringReader(""));});Vespine
H
107

Now it's 2012 and Java can do more than it used to with XML, I'd like to add an alternative to my accepted answer. This has no dependencies outside of Java 6.

import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public String format(String xml) {

        try {
            final InputSource src = new InputSource(new StringReader(xml));
            final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
            final Boolean keepDeclaration = Boolean.valueOf(xml.startsWith("<?xml"));

        //May need this: System.setProperty(DOMImplementationRegistry.PROPERTY,"com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");


            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            final LSSerializer writer = impl.createLSSerializer();

            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE); // Set this to true if the output needs to be beautified.
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration); // Set this to true if the declaration is needed to be outputted.

            return writer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }
}
Hellcat answered 17/7, 2012 at 9:29 Comment(13)
No indention, but it does works with this: System.setProperty(DOMImplementationRegistry.PROPERTY,"com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");Staircase
I tried this solution and found that the returned XML string always had UTF-16. When the document is serialized back into a string, there is this line of code: ser._format.setEncoding("UTF-16"); This may be some the standard now, but for the system I'm working with, UTF-8 is used. Does anyone know how to maintain the encoding from the original XML string?Waxwork
@DanTemple It seems like you need to use LSOutput to control the encoding. See chipkillmar.net/2009/03/25/pretty-print-xml-from-a-domOleoresin
@JoshuaDavis Thanks, that works to set the encoding on the response. I'll need to add something that passes the original encoding to the pretty printer if I want to maintain the encoding of the oiginal XML string.Waxwork
@JoshA. There is a one-liner using XMLBeam: System.out.println(new XBProjector().projectXMLString("<xml><foo><bar/></foo></xml>", DOMAccess.class).asString());Roily
I find that this wraps lines over a certain length - is there any way to prevent wrapping?Dichroscope
this adds an FEFF BOM at the beginning of the returned String.Chagall
I tried to use this in Andriod but i am not able to find the package `DOMImplementationRegistry. I am using java 8.Antisepsis
I had issues with this snippet regarding escaped attribute values. The less than escape sequence &lt; was correctly preserved while the greater than sequence &gt; was converted to the actual sign >.Assimilative
yep, this work on jdk8 without exception instead of next rated answer.Kenneth
To set the encoding: LSOutput output = impl.createLSOutput(); output.setEncoding("UTF-8"); output.setByteStream(new ByteArrayOutputStream()); writer.write(document, output); return output.getByteStream().toString();Middling
Didnt work on JDK 8 deployed to JBoss EAP 7/Wildfly 10 get class not found exceptions com.sun.org.apache.xerces.internal.dom.DOMXSImplementationSourceImpl - looks related to this issues.jboss.org/browse/WFLY-4416 - but I'm not going to deal with adding an extra library or anything - I dont need it pretty printed that badly.Jenn
thanks for including the import list as well, so many conflicting packages available to make sense of the combination needed otherwise..Cage
H
54

Just to note that top rated answer requires the use of xerces.

If you don't want to add this external dependency then you can simply use the standard jdk libraries (which actually are built using xerces internally).

N.B. There was a bug with jdk version 1.5 see https://bugs.java.com/bugdatabase/view_bug?bug_id=6296446 but it is resolved now.,

(Note if an error occurs this will return the original text)

package com.test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.InputSource;

public class XmlTest {
    public static void main(String[] args) {
        XmlTest t = new XmlTest();
        System.out.println(t.formatXml("<a><b><c/><d>text D</d><e value='0'/></b></a>"));
    }

    public String formatXml(String xml){
        try{
            Transformer serializer= SAXTransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            //serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            //serializer.setOutputProperty("{http://xml.customer.org/xslt}indent-amount", "2");
            Source xmlSource=new SAXSource(new InputSource(new ByteArrayInputStream(xml.getBytes())));
            StreamResult res =  new StreamResult(new ByteArrayOutputStream());            
            serializer.transform(xmlSource, res);
            return new String(((ByteArrayOutputStream)res.getOutputStream()).toByteArray());
        }catch(Exception e){
            //TODO log error
            return xml;
        }
    }

}
Hokku answered 17/12, 2010 at 16:27 Comment(5)
In this case left tabs are not used. All tags begin at first symbol of the line, like usual text.Masakomasan
don't you need to specify a charset when converting back and forth between bytes and string?Openhanded
There should be no need to convert from and to byte arrays/String. At the very least you would have to specify charset when doing so. Better option would be to use StringReader and StringWriter classes wrapped in InputSource and StreamResult.Slusher
not working. you need to mess around with some internal registry implementation.Cruet
Here is a simpler variant of this solution: https://mcmap.net/q/75268/-pretty-print-xml-in-java-8Tham
W
33

I've pretty printed in the past using the org.dom4j.io.OutputFormat.createPrettyPrint() method

public String prettyPrint(final String xml){  

    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in prettyPrint()");
    }

    final StringWriter sw;

    try {
        final OutputFormat format = OutputFormat.createPrettyPrint();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    }
    catch (Exception e) {
        throw new RuntimeException("Error pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}
Wetterhorn answered 3/11, 2008 at 23:22 Comment(2)
The accepted solution does not properly indent the nested tags in my case, this one does.Knockabout
I used this in conjuction with removing all trailing spaces at the end of lines: prettyPrintedString.replaceAll("\\s+\n", "\n")Vial
E
19

Here's a way of doing it using dom4j:

Imports:

import org.dom4j.Document;  
import org.dom4j.DocumentHelper;  
import org.dom4j.io.OutputFormat;  
import org.dom4j.io.XMLWriter;

Code:

String xml = "<your xml='here'/>";  
Document doc = DocumentHelper.parseText(xml);  
StringWriter sw = new StringWriter();  
OutputFormat format = OutputFormat.createPrettyPrint();  
XMLWriter xw = new XMLWriter(sw, format);  
xw.write(doc);  
String result = sw.toString();
Exteroceptor answered 8/4, 2010 at 10:29 Comment(2)
This didnt work for me. It just gave something like: <?xml version... on one line and everything else on another line.Hards
add format.setSuppressDeclaration(true); to remove the <?xml version.. tagIntransigence
R
16

Since you are starting with a String, you can convert to a DOM object (e.g. Node) before you use the Transformer. However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing a string into a DOM, then running a transform over the DOM to get a string back - you could just do some old fashioned character by character parsing. Insert a newline and spaces after every </...> characters, keep and indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see.

Disclaimer - I did a cut/paste/text edit of the functions below, so they may not compile as is.

public static final Element createDOM(String strXML) 
    throws ParserConfigurationException, SAXException, IOException {

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true);
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputSource sourceXML = new InputSource(new StringReader(strXML));
    Document xmlDoc = db.parse(sourceXML);
    Element e = xmlDoc.getDocumentElement();
    e.normalize();
    return e;
}

public static final void prettyPrint(Node xml, OutputStream out)
    throws TransformerConfigurationException, TransformerFactoryConfigurationError, TransformerException {
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.transform(new DOMSource(xml), new StreamResult(out));
}
Romeo answered 26/9, 2008 at 12:58 Comment(2)
"However, if you know your XML string is valid ..." good point. See my solution based on this approach below.Benzvi
This answer is incorrect. It is simply not true that you need to convert to a DOM. Other answers show how to do a direct transformation from a StreamSource to a StreamResult; these solutions achieve the desired effect without the overhead of building a DOM in memory.Asta
B
15

Kevin Hakanson said: "However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing a string into a DOM, then running a transform over the DOM to get a string back - you could just do some old fashioned character by character parsing. Insert a newline and spaces after every characters, keep and indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every you see."

Agreed. Such an approach is much faster and has far fewer dependencies.

Example solution:

/**
 * XML utils, including formatting.
 */
public class XmlUtils
{
  private static XmlFormatter formatter = new XmlFormatter(2, 80);

  public static String formatXml(String s)
  {
    return formatter.format(s, 0);
  }

  public static String formatXml(String s, int initialIndent)
  {
    return formatter.format(s, initialIndent);
  }

  private static class XmlFormatter
  {
    private int indentNumChars;
    private int lineLength;
    private boolean singleLine;

    public XmlFormatter(int indentNumChars, int lineLength)
    {
      this.indentNumChars = indentNumChars;
      this.lineLength = lineLength;
    }

    public synchronized String format(String s, int initialIndent)
    {
      int indent = initialIndent;
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < s.length(); i++)
      {
        char currentChar = s.charAt(i);
        if (currentChar == '<')
        {
          char nextChar = s.charAt(i + 1);
          if (nextChar == '/')
            indent -= indentNumChars;
          if (!singleLine)   // Don't indent before closing element if we're creating opening and closing elements on a single line.
            sb.append(buildWhitespace(indent));
          if (nextChar != '?' && nextChar != '!' && nextChar != '/')
            indent += indentNumChars;
          singleLine = false;  // Reset flag.
        }
        sb.append(currentChar);
        if (currentChar == '>')
        {
          if (s.charAt(i - 1) == '/')
          {
            indent -= indentNumChars;
            sb.append("\n");
          }
          else
          {
            int nextStartElementPos = s.indexOf('<', i);
            if (nextStartElementPos > i + 1)
            {
              String textBetweenElements = s.substring(i + 1, nextStartElementPos);

              // If the space between elements is solely newlines, let them through to preserve additional newlines in source document.
              if (textBetweenElements.replaceAll("\n", "").length() == 0)
              {
                sb.append(textBetweenElements + "\n");
              }
              // Put tags and text on a single line if the text is short.
              else if (textBetweenElements.length() <= lineLength * 0.5)
              {
                sb.append(textBetweenElements);
                singleLine = true;
              }
              // For larger amounts of text, wrap lines to a maximum line length.
              else
              {
                sb.append("\n" + lineWrap(textBetweenElements, lineLength, indent, null) + "\n");
              }
              i = nextStartElementPos - 1;
            }
            else
            {
              sb.append("\n");
            }
          }
        }
      }
      return sb.toString();
    }
  }

  private static String buildWhitespace(int numChars)
  {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < numChars; i++)
      sb.append(" ");
    return sb.toString();
  }

  /**
   * Wraps the supplied text to the specified line length.
   * @lineLength the maximum length of each line in the returned string (not including indent if specified).
   * @indent optional number of whitespace characters to prepend to each line before the text.
   * @linePrefix optional string to append to the indent (before the text).
   * @returns the supplied text wrapped so that no line exceeds the specified line length + indent, optionally with
   * indent and prefix applied to each line.
   */
  private static String lineWrap(String s, int lineLength, Integer indent, String linePrefix)
  {
    if (s == null)
      return null;

    StringBuilder sb = new StringBuilder();
    int lineStartPos = 0;
    int lineEndPos;
    boolean firstLine = true;
    while(lineStartPos < s.length())
    {
      if (!firstLine)
        sb.append("\n");
      else
        firstLine = false;

      if (lineStartPos + lineLength > s.length())
        lineEndPos = s.length() - 1;
      else
      {
        lineEndPos = lineStartPos + lineLength - 1;
        while (lineEndPos > lineStartPos && (s.charAt(lineEndPos) != ' ' && s.charAt(lineEndPos) != '\t'))
          lineEndPos--;
      }
      sb.append(buildWhitespace(indent));
      if (linePrefix != null)
        sb.append(linePrefix);

      sb.append(s.substring(lineStartPos, lineEndPos + 1));
      lineStartPos = lineEndPos + 1;
    }
    return sb.toString();
  }

  // other utils removed for brevity
}
Benzvi answered 27/5, 2010 at 10:50 Comment(1)
This is the way it should be done. Format on the fly at the string level. This is the only solution that will format invalid or incomplete XML.Boothman
D
12

If using a 3rd party XML library is ok, you can get away with something significantly simpler than what the currently highest-voted answers suggest.

It was stated that both input and output should be Strings, so here's a utility method that does just that, implemented with the XOM library:

import nu.xom.*;
import java.io.*;

[...]

public static String format(String xml) throws ParsingException, IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Serializer serializer = new Serializer(out);
    serializer.setIndent(4);  // or whatever you like
    serializer.write(new Builder().build(xml, ""));
    return out.toString("UTF-8");
}

I tested that it works, and the results do not depend on your JRE version or anything like that. To see how to customise the output format to your liking, take a look at the Serializer API.

This actually came out longer than I thought - some extra lines were needed because Serializer wants an OutputStream to write to. But note that there's very little code for actual XML twiddling here.

(This answer is part of my evaluation of XOM, which was suggested as one option in my question about the best Java XML library to replace dom4j. For the record, with dom4j you could achieve this with similar ease using XMLWriter and OutputFormat. Edit: ...as demonstrated in mlo55's answer.)

Dyun answered 7/6, 2009 at 16:26 Comment(1)
Thanks, that's what I was looking for. If you have an XML already parsed with XOM in a "Document" object, you can pass it directly to serializer.write(document);Emden
C
10

Regarding comment that "you must first build a DOM tree": No, you need not and should not do that.

Instead, create a StreamSource (new StreamSource(new StringReader(str)), and feed that to the identity transformer mentioned. That'll use SAX parser, and result will be much faster. Building an intermediate tree is pure overhead for this case. Otherwise the top-ranked answer is good.

Cadmar answered 26/2, 2009 at 19:54 Comment(1)
I whole-heartedly agree: building the intermediate DOM tree is a waste of memory. Thansk for that answer.Boothman
R
10

Hmmm... faced something like this and it is a known bug ... just add this OutputProperty ..

transformer.setOutputProperty(OutputPropertiesFactory.S_KEY_INDENT_AMOUNT, "8");

Hope this helps ...

Rolo answered 22/12, 2009 at 19:26 Comment(2)
Where does this OutputPropertiesFactory come from?Provincial
import com.sun.org.apache.xml.internal.serializer.*;Sham
B
9

Using scala:

import xml._
val xml = XML.loadString("<tag><nested>hello</nested></tag>")
val formatted = new PrettyPrinter(150, 2).format(xml)
println(formatted)

You can do this in Java too, if you depend on the scala-library.jar. It looks like this:

import scala.xml.*;

public class FormatXML {
    public static void main(String[] args) {
        String unformattedXml = "<tag><nested>hello</nested></tag>";
        PrettyPrinter pp = new PrettyPrinter(150, 3);
        String formatted = pp.format(XML.loadString(unformattedXml), TopScope$.MODULE$);
        System.out.println(formatted);
    }
}

The PrettyPrinter object is constructed with two ints, the first being max line length and the second being the indentation step.

Bornholm answered 8/3, 2011 at 1:58 Comment(0)
S
8

Just for future reference, here's a solution that worked for me (thanks to a comment that @George Hawkins posted in one of the answers):

DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
LSOutput output = impl.createLSOutput();
ByteArrayOutputStream out = new ByteArrayOutputStream();
output.setByteStream(out);
writer.write(document, output);
String xmlStr = new String(out.toByteArray());
Stevana answered 10/10, 2011 at 14:43 Comment(0)
T
8

slightly improved version from milosmns...

public static String getPrettyXml(String xml) {
    if (xml == null || xml.trim().length() == 0) return "";

    int stack = 0;
    StringBuilder pretty = new StringBuilder();
    String[] rows = xml.trim().replaceAll(">", ">\n").replaceAll("<", "\n<").split("\n");

    for (int i = 0; i < rows.length; i++) {
        if (rows[i] == null || rows[i].trim().length() == 0) continue;

        String row = rows[i].trim();
        if (row.startsWith("<?")) {
            pretty.append(row + "\n");
        } else if (row.startsWith("</")) {
            String indent = repeatString(--stack);
            pretty.append(indent + row + "\n");
        } else if (row.startsWith("<") && row.endsWith("/>") == false) {
            String indent = repeatString(stack++);
            pretty.append(indent + row + "\n");
            if (row.endsWith("]]>")) stack--;
        } else {
            String indent = repeatString(stack);
            pretty.append(indent + row + "\n");
        }
    }

    return pretty.toString().trim();
}

private static String repeatString(int stack) {
     StringBuilder indent = new StringBuilder();
     for (int i = 0; i < stack; i++) {
        indent.append(" ");
     }
     return indent.toString();
} 
Tibiotarsus answered 9/7, 2014 at 10:17 Comment(4)
where is repeatString(stack++); method..?Galvanize
private static String repeatString(int stack) { StringBuilder indent = new StringBuilder(); for (int i = 0; i < stack; i++){ indent.append(" "); } return indent.toString(); }Tibiotarsus
The indentation is not working fine at end tags.You need to change } else if (row.startsWith("</")) { part to this: else if (row.startsWith("</")) { String indent = repeatIdent(--stack); if (pretty.charAt(pretty.length() - 1) == '\n') { pretty.append(indent + row + "\n"); } else { pretty.append(row + "\n"); } }Drayman
Never attempt to parse XML by hand in this way. Your code will destroy the XML if, for example, it contains a "<" within a comment.Asta
S
6

All above solutions didn't work for me, then I found this http://myshittycode.com/2014/02/10/java-properly-indenting-xml-string/

The clue is remove whitespaces with XPath

    String xml = "<root>" +
             "\n   " +
             "\n<name>Coco Puff</name>" +
             "\n        <total>10</total>    </root>";

try {
    Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

    XPath xPath = XPathFactory.newInstance().newXPath();
    NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                  document,
                                                  XPathConstants.NODESET);

    for (int i = 0; i < nodeList.getLength(); ++i) {
        Node node = nodeList.item(i);
        node.getParentNode().removeChild(node);
    }

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

    StringWriter stringWriter = new StringWriter();
    StreamResult streamResult = new StreamResult(stringWriter);

    transformer.transform(new DOMSource(document), streamResult);

    System.out.println(stringWriter.toString());
}
catch (Exception e) {
    e.printStackTrace();
}
Spruce answered 13/5, 2015 at 11:27 Comment(2)
Note that the use of the '{xml.apache.org/xslt}indent-amount' property will tie you to a specific transformer implementation.Solus
From all the solutions this one worked the best. I had spaces and new lines already in my XML plus I did not wanted to add more dependencies to my project. I wish I did not have to parse the XML but oh well.Wetmore
K
6

This code below working perfectly

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

String formattedXml1 = prettyFormat("<root><child>aaa</child><child/></root>");

public static String prettyFormat(String input) {
    return prettyFormat(input, "2");
}

public static String prettyFormat(String input, String indent) {
    Source xmlInput = new StreamSource(new StringReader(input));
    StringWriter stringWriter = new StringWriter();
    try {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", indent);
        transformer.transform(xmlInput, new StreamResult(stringWriter));

        String pretty = stringWriter.toString();
        pretty = pretty.replace("\r\n", "\n");
        return pretty;              
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Kendrickkendricks answered 2/6, 2016 at 18:31 Comment(0)
Y
6

I mix all of them and writing one small program. It is reading from the xml file and printing out. Just Instead of xzy give your file path.

    public static void main(String[] args) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(new FileInputStream(new File("C:/Users/xyz.xml")));
    prettyPrint(doc);

}

private static String prettyPrint(Document document)
        throws TransformerException {
    TransformerFactory transformerFactory = TransformerFactory
            .newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    DOMSource source = new DOMSource(document);
    StringWriter strWriter = new StringWriter();
    StreamResult result = new StreamResult(strWriter);transformer.transform(source, result);
    System.out.println(strWriter.getBuffer().toString());

    return strWriter.getBuffer().toString();

}
Yet answered 10/7, 2018 at 19:47 Comment(1)
Building a DOM here is unnecessary and inefficient. Use a StreamSource.Asta
T
5

If you're sure that you have a valid XML, this one is simple, and avoids XML DOM trees. Maybe has some bugs, do comment if you see anything

public String prettyPrint(String xml) {
            if (xml == null || xml.trim().length() == 0) return "";

            int stack = 0;
            StringBuilder pretty = new StringBuilder();
            String[] rows = xml.trim().replaceAll(">", ">\n").replaceAll("<", "\n<").split("\n");

            for (int i = 0; i < rows.length; i++) {
                    if (rows[i] == null || rows[i].trim().length() == 0) continue;

                    String row = rows[i].trim();
                    if (row.startsWith("<?")) {
                            // xml version tag
                            pretty.append(row + "\n");
                    } else if (row.startsWith("</")) {
                            // closing tag
                            String indent = repeatString("    ", --stack);
                            pretty.append(indent + row + "\n");
                    } else if (row.startsWith("<")) {
                            // starting tag
                            String indent = repeatString("    ", stack++);
                            pretty.append(indent + row + "\n");
                    } else {
                            // tag data
                            String indent = repeatString("    ", stack);
                            pretty.append(indent + row + "\n");
                    }
            }

            return pretty.toString().trim();
    }
Todtoday answered 12/2, 2014 at 18:16 Comment(5)
where is repeatString method..?Galvanize
private static String repeatString(int stack) { StringBuilder indent = new StringBuilder(); for (int i = 0; i < stack; i++){ indent.append(" "); } return indent.toString(); }Tibiotarsus
Yes [user1912935], what @Tibiotarsus wrote, should be simple enough :)Todtoday
Concatenation with a StringBuilder inside a loop: Bad practice.Tangelatangelo
@Tangelatangelo But it's super easy to split to new lines, this just illustrates a simple approach without any DOM trees.Todtoday
C
5

Just another solution which works for us

import java.io.StringWriter;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

**
 * Pretty Print XML String
 * 
 * @param inputXmlString
 * @return
 */
public static String prettyPrintXml(String xml) {

    final StringWriter sw;

    try {
        final OutputFormat format = OutputFormat.createPrettyPrint();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    }
    catch (Exception e) {
        throw new RuntimeException("Error pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}
Christcross answered 22/7, 2015 at 18:45 Comment(0)
Z
4

Using jdom2 : http://www.jdom.org/

import java.io.StringReader;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

String prettyXml = new XMLOutputter(Format.getPrettyFormat()).
                         outputString(new SAXBuilder().build(new StringReader(uglyXml)));
Zero answered 11/5, 2015 at 15:0 Comment(1)
Amazing, it solved my problem. Thanks !Shores
C
3

As an alternative to the answers from max, codeskraps, David Easley and milosmns, have a look at my lightweight, high-performance pretty-printer library: xml-formatter

// construct lightweight, threadsafe, instance
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().build();

StringBuilder buffer = new StringBuilder();
String xml = ..; // also works with char[] or Reader

if(prettyPrinter.process(xml, buffer)) {
     // valid XML, print buffer
} else {
     // invalid XML, print xml
}

Sometimes, like when running mocked SOAP services directly from file, it is good to have a pretty-printer which also handles already pretty-printed XML:

PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

As some have commented, pretty-printing is just a way of presenting XML in a more human-readable form - whitespace strictly does not belong in your XML data.

The library is intended for pretty-printing for logging purposes, and also includes functions for filtering (subtree removal / anonymization) and pretty-printing of XML in CDATA and Text nodes.

Cony answered 28/12, 2014 at 22:25 Comment(0)
T
2

I had the same problem and I'm having great success with JTidy (http://jtidy.sourceforge.net/index.html)

Example:

Tidy t = new Tidy();
t.setIndentContent(true);
Document d = t.parseDOM(
    new ByteArrayInputStream("HTML goes here", null);

OutputStream out = new ByteArrayOutputStream();
t.pprint(d, out);
String html = out.toString();
Tadeas answered 11/6, 2010 at 20:30 Comment(0)
I
2

I have found that in Java 1.6.0_32 the normal method to pretty print an XML string (using a Transformer with a null or identity xslt) does not behave as I would like if tags are merely separated by whitespace, as opposed to having no separating text. I tried using <xsl:strip-space elements="*"/> in my template to no avail. The simplest solution I found was to strip the space the way I wanted using a SAXSource and XML filter. Since my solution was for logging I also extended this to work with incomplete XML fragments. Note the normal method seems to work fine if you use a DOMSource but I did not want to use this because of the incompleteness and memory overhead.

public static class WhitespaceIgnoreFilter extends XMLFilterImpl
{

    @Override
    public void ignorableWhitespace(char[] arg0,
                                    int arg1,
                                    int arg2) throws SAXException
    {
        //Ignore it then...
    }

    @Override
    public void characters( char[] ch,
                            int start,
                            int length) throws SAXException
    {
        if (!new String(ch, start, length).trim().equals("")) 
               super.characters(ch, start, length); 
    }
}

public static String prettyXML(String logMsg, boolean allowBadlyFormedFragments) throws SAXException, IOException, TransformerException
    {
        TransformerFactory transFactory = TransformerFactory.newInstance();
        transFactory.setAttribute("indent-number", new Integer(2));
        Transformer transformer = transFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter out = new StringWriter();
        XMLReader masterParser = SAXHelper.getSAXParser(true);
        XMLFilter parser = new WhitespaceIgnoreFilter();
        parser.setParent(masterParser);

        if(allowBadlyFormedFragments)
        {
            transformer.setErrorListener(new ErrorListener()
            {
                @Override
                public void warning(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void fatalError(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void error(TransformerException exception) throws TransformerException
                {
                }
            });
        }

        try
        {
            transformer.transform(new SAXSource(parser, new InputSource(new StringReader(logMsg))), new StreamResult(out));
        }
        catch (TransformerException e)
        {
            if(e.getCause() != null && e.getCause() instanceof SAXParseException)
            {
                if(!allowBadlyFormedFragments || !"XML document structures must start and end within the same entity.".equals(e.getCause().getMessage()))
                {
                    throw e;
                }
            }
            else
            {
                throw e;
            }
        }
        out.flush();
        return out.toString();
    }
Iceni answered 5/7, 2012 at 19:36 Comment(0)
B
2

I always use the below function:

public static String prettyPrintXml(String xmlStringToBeFormatted) {
    String formattedXmlString = null;
    try {
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        documentBuilderFactory.setValidating(true);
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        InputSource inputSource = new InputSource(new StringReader(xmlStringToBeFormatted));
        Document document = documentBuilder.parse(inputSource);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

        StreamResult streamResult = new StreamResult(new StringWriter());
        DOMSource dOMSource = new DOMSource(document);
        transformer.transform(dOMSource, streamResult);
        formattedXmlString = streamResult.getWriter().toString().trim();
    } catch (Exception ex) {
        StringWriter sw = new StringWriter();
        ex.printStackTrace(new PrintWriter(sw));
        System.err.println(sw.toString());
    }
    return formattedXmlString;
}
Barogram answered 6/5, 2019 at 13:27 Comment(3)
This is importent: transformer.setOutputProperty("{xml.apache.org/xslt}indent-amount", "2"); If its missing all elements are left alignedChangteh
@HankLapidez That's true.Barogram
Building a DOM here is unnecessary and inefficient.Asta
A
1

There is a very nice command line XML utility called xmlstarlet(http://xmlstar.sourceforge.net/) that can do a lot of things which a lot of people use.

You could execute this program programmatically using Runtime.exec and then read in the formatted output file. It has more options and better error reporting than a few lines of Java code can provide.

download xmlstarlet : http://sourceforge.net/project/showfiles.php?group_id=66612&package_id=64589

Amboceptor answered 26/9, 2008 at 14:38 Comment(0)
R
1

For those searching for a quick and dirty solution - which doesn't need the XML to be 100% valid. e.g. in case of REST / SOAP logging (you never know what the others send ;-))

I found and advanced a code snipped I found online which I think is still missing here as a valid possible approach:

public static String prettyPrintXMLAsString(String xmlString) {
    /* Remove new lines */
    final String LINE_BREAK = "\n";
    xmlString = xmlString.replaceAll(LINE_BREAK, "");
    StringBuffer prettyPrintXml = new StringBuffer();
    /* Group the xml tags */
    Pattern pattern = Pattern.compile("(<[^/][^>]+>)?([^<]*)(</[^>]+>)?(<[^/][^>]+/>)?");
    Matcher matcher = pattern.matcher(xmlString);
    int tabCount = 0;
    while (matcher.find()) {
        String str1 = (null == matcher.group(1) || "null".equals(matcher.group())) ? "" : matcher.group(1);
        String str2 = (null == matcher.group(2) || "null".equals(matcher.group())) ? "" : matcher.group(2);
        String str3 = (null == matcher.group(3) || "null".equals(matcher.group())) ? "" : matcher.group(3);
        String str4 = (null == matcher.group(4) || "null".equals(matcher.group())) ? "" : matcher.group(4);

        if (matcher.group() != null && !matcher.group().trim().equals("")) {
            printTabs(tabCount, prettyPrintXml);
            if (!str1.equals("") && str3.equals("")) {
                ++tabCount;
            }
            if (str1.equals("") && !str3.equals("")) {
                --tabCount;
                prettyPrintXml.deleteCharAt(prettyPrintXml.length() - 1);
            }

            prettyPrintXml.append(str1);
            prettyPrintXml.append(str2);
            prettyPrintXml.append(str3);
            if (!str4.equals("")) {
                prettyPrintXml.append(LINE_BREAK);
                printTabs(tabCount, prettyPrintXml);
                prettyPrintXml.append(str4);
            }
            prettyPrintXml.append(LINE_BREAK);
        }
    }
    return prettyPrintXml.toString();
}

private static void printTabs(int count, StringBuffer stringBuffer) {
    for (int i = 0; i < count; i++) {
        stringBuffer.append("\t");
    }
}

public static void main(String[] args) {
    String x = new String(
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"><soap:Body><soap:Fault><faultcode>soap:Client</faultcode><faultstring>INVALID_MESSAGE</faultstring><detail><ns3:XcbSoapFault xmlns=\"\" xmlns:ns3=\"http://www.someapp.eu/xcb/types/xcb/v1\"><CauseCode>20007</CauseCode><CauseText>INVALID_MESSAGE</CauseText><DebugInfo>Problems creating SAAJ object model</DebugInfo></ns3:XcbSoapFault></detail></soap:Fault></soap:Body></soap:Envelope>");
    System.out.println(prettyPrintXMLAsString(x));
}

here is the output:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
        <faultcode>soap:Client</faultcode>
        <faultstring>INVALID_MESSAGE</faultstring>
        <detail>
            <ns3:XcbSoapFault xmlns="" xmlns:ns3="http://www.someapp.eu/xcb/types/xcb/v1">
                <CauseCode>20007</CauseCode>
                <CauseText>INVALID_MESSAGE</CauseText>
                <DebugInfo>Problems creating SAAJ object model</DebugInfo>
            </ns3:XcbSoapFault>
        </detail>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
Rand answered 7/10, 2013 at 23:32 Comment(1)
"Quick and dirty" describes it well.Asta
D
1

The solutions I have found here for Java 1.6+ do not reformat the code if it is already formatted. The one that worked for me (and re-formatted already formatted code) was the following.

import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import java.io.IOException;
import java.io.StringReader;

public class XmlUtils {
    public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
        Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
        byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
        return new String(canonXmlBytes);
    }

    public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
        InputSource src = new InputSource(new StringReader(input));
        Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
        Boolean keepDeclaration = input.startsWith("<?xml");
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
        return writer.writeToString(document);
    }
}

It is a good tool to use in your unit tests for full-string xml comparison.

private void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
    String canonicalExpected = prettyFormat(toCanonicalXml(expected));
    String canonicalActual = prettyFormat(toCanonicalXml(actual));
    assertEquals(canonicalExpected, canonicalActual);
}
Doorpost answered 14/10, 2014 at 12:47 Comment(0)
C
1

I saw one answer using Scala, so here is another one in Groovy, just in case someone finds it interesting. The default indentation is 2 steps, XmlNodePrinter constructor can be passed another value as well.

def xml = "<tag><nested>hello</nested></tag>"
def stringWriter = new StringWriter()
def node = new XmlParser().parseText(xml);
new XmlNodePrinter(new PrintWriter(stringWriter)).print(node)
println stringWriter.toString()

Usage from Java if groovy jar is in classpath

  String xml = "<tag><nested>hello</nested></tag>";
  StringWriter stringWriter = new StringWriter();
  Node node = new XmlParser().parseText(xml);
  new XmlNodePrinter(new PrintWriter(stringWriter)).print(node);
  System.out.println(stringWriter.toString());
Comintern answered 1/5, 2015 at 20:42 Comment(0)
M
1

Underscore-java has static method U.formatXml(string). Live example

import com.github.underscore.U;

public class MyClass {
    public static void main(String args[]) {
        String xml = "<tag><nested>hello</nested></tag>";

        System.out.println(U.formatXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + xml + "</root>"));
    }
}

Output:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <tag>
      <nested>hello</nested>
   </tag>
</root>
Marlee answered 14/2, 2019 at 4:15 Comment(1)
This is awesome!Leandraleandre
D
0

In case you do not need indentation that much but a few line breaks, it could be sufficient to simply regex...

String leastPrettifiedXml = uglyXml.replaceAll("><", ">\n<");

The code is nice, not the result because of missing indentation.


(For solutions with indentation, see other answers.)

Denbrook answered 20/8, 2015 at 13:20 Comment(3)
Hmmmm... Just thinking loud, who would be needing such solution? Only area I can see is data we get from some web services and just to test that data and its validity, developer or tester might need such easy ones. Otherwise not a good option....Hendricks
@SudhakarChavali i'm a developer. i might need that for dirty println() and log.debug() hacks; i.e. some times i can only use log files from within a restricted server environment (with web admin interface instead of shell access) instead of sane step-by-step-debugging the program.Denbrook
Never attempt to parse XML by hand in this way. Your code will destroy the XML if, for example, it contains a "<" within a comment.Asta
T
-1

Try this:

 try
                    {
                        TransformerFactory transFactory = TransformerFactory.newInstance();
                        Transformer transformer = null;
                        transformer = transFactory.newTransformer();
                        StringWriter buffer = new StringWriter();
                        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                        transformer.transform(new DOMSource(element),
                                  new StreamResult(buffer)); 
                        String str = buffer.toString();
                        System.out.println("XML INSIDE IS #########################################"+str);
                        return element;
                    }
                    catch (TransformerConfigurationException e)
                    {
                        e.printStackTrace();
                    }
                    catch (TransformerException e)
                    {
                        e.printStackTrace();
                    }
Thera answered 14/5, 2016 at 0:16 Comment(2)
Can’t see the difference with some of the already posted answers.Him
Building a DOM here is unnecessary and inefficient. Use a StreamSource.Asta
S
-1

I should have looked for this page first before coming up with my own solution! Anyway, mine uses Java recursion to parse the xml page. This code is totally self-contained and does not rely on third party libraries. Also .. it uses recursion!

// you call this method passing in the xml text
public static void prettyPrint(String text){
    prettyPrint(text, 0);
}

// "index" corresponds to the number of levels of nesting and/or the number of tabs to print before printing the tag
public static void prettyPrint(String xmlText, int index){
    boolean foundTagStart = false;
    StringBuilder tagChars = new StringBuilder();
    String startTag = "";
    String endTag = "";
    String[] chars = xmlText.split("");
    // find the next start tag
    for(String ch : chars){
        if(ch.equalsIgnoreCase("<")){
            tagChars.append(ch);
            foundTagStart = true;
        } else if(ch.equalsIgnoreCase(">") && foundTagStart){
            startTag = tagChars.append(ch).toString();
            String tempTag = startTag;
            endTag = (tempTag.contains("\"") ? (tempTag.split(" ")[0] + ">") : tempTag).replace("<", "</"); // <startTag attr1=1 attr2=2> => </startTag>
            break;
        } else if(foundTagStart){
            tagChars.append(ch);
        }
    }
    // once start and end tag are calculated, print start tag, then content, then end tag
    if(foundTagStart){
        int startIndex = xmlText.indexOf(startTag);
        int endIndex = xmlText.indexOf(endTag);
        // handle if matching tags NOT found
        if((startIndex < 0) || (endIndex < 0)){
            if(startIndex < 0) {
                // no start tag found
                return;
            } else {
                // start tag found, no end tag found (handles single tags aka "<mytag/>" or "<?xml ...>")
                printTabs(index);
                System.out.println(startTag);
                // move on to the next tag
                // NOTE: "index" (not index+1) because next tag is on same level as this one
                prettyPrint(xmlText.substring(startIndex+startTag.length(), xmlText.length()), index);
                return;
            }
        // handle when matching tags found
        } else {
            String content = xmlText.substring(startIndex+startTag.length(), endIndex);
            boolean isTagContainsTags = content.contains("<"); // content contains tags
            printTabs(index);
            if(isTagContainsTags){ // ie: <tag1><tag2>stuff</tag2></tag1>
                System.out.println(startTag);
                prettyPrint(content, index+1); // "index+1" because "content" is nested
                printTabs(index);
            } else {
                System.out.print(startTag); // ie: <tag1>stuff</tag1> or <tag1></tag1>
                System.out.print(content);
            }
            System.out.println(endTag);
            int nextIndex = endIndex + endTag.length();
            if(xmlText.length() > nextIndex){ // if there are more tags on this level, continue
                prettyPrint(xmlText.substring(nextIndex, xmlText.length()), index);
            }
        }
    } else {
        System.out.print(xmlText);
    }
}

private static void printTabs(int counter){
    while(counter-- > 0){ 
        System.out.print("\t");
    }
}
Seddon answered 4/11, 2017 at 4:18 Comment(2)
Underscore-java, U.formatXml(xml) also doesn't rely on third party libraries.Marlee
Never attempt to parse XML by hand like this. Your code will destroy the XML if, for example, it contains a "<" in a comment or CDATA section.Asta
R
-1

I was trying to achieve something similar, but without any external dependency. The application was already using DOM to format just for logging the XMLs!

Here is my sample snippet

public void formatXML(final String unformattedXML) {
    final int length = unformattedXML.length();
    final int indentSpace = 3;
    final StringBuilder newString = new StringBuilder(length + length / 10);
    final char space = ' ';
    int i = 0;
    int indentCount = 0;
    char currentChar = unformattedXML.charAt(i++);
    char previousChar = currentChar;
    boolean nodeStarted = true;
    newString.append(currentChar);
    for (; i < length - 1;) {
        currentChar = unformattedXML.charAt(i++);
        if(((int) currentChar < 33) && !nodeStarted) {
            continue;
        }
        switch (currentChar) {
        case '<':
            if ('>' == previousChar && '/' != unformattedXML.charAt(i - 1) && '/' != unformattedXML.charAt(i) && '!' != unformattedXML.charAt(i)) {
                indentCount++;
            }
            newString.append(System.lineSeparator());
            for (int j = indentCount * indentSpace; j > 0; j--) {
                newString.append(space);
            }
            newString.append(currentChar);
            nodeStarted = true;
            break;
        case '>':
            newString.append(currentChar);
            nodeStarted = false;
            break;
        case '/':
            if ('<' == previousChar || '>' == unformattedXML.charAt(i)) {
                indentCount--;
            }
            newString.append(currentChar);
            break;
        default:
            newString.append(currentChar);
        }
        previousChar = currentChar;
    }
    newString.append(unformattedXML.charAt(length - 1));
    System.out.println(newString.toString());
}
Rinaldo answered 10/6, 2019 at 18:16 Comment(3)
It deletes spaces in text. Example: <text>\n some example lol\n<text> after transform:<text>someexamplelol<test>Hardej
yeah, and it has other defects as well like handling of comments, DTD, if any, etc. However, correction upon this I was able to get an acceptable (apart from complex elements like <text>some complex <b>text again</b> and then nothing</text>) logic working. Don't have the code handy now, will find some leisure time to write againRinaldo
Please indicate how this solution improves upon already existing answers, otherwise, it just adds noise.Him

© 2022 - 2024 — McMap. All rights reserved.