Is there a stylesheet or Windows commandline tool for controllable XML formatting, specifically putting attributes one-per-line?
W

7

15

I am looking for an XSLT stylesheet or a command-line tool for Windows (or C# code that can be built into one) that pretty-prints XML. Specifically, it must be able to put attributes one per line, something like:

<Node>
   <ChildNode 
      value1='5'
      value2='6'
      value3='happy' />
</Node>

It doesn't have to be EXACTLY like that, but I want to use it for an XML file that has nodes with dozens of attributes and spreading them across multiple lines makes them easier to read, edit, and text-diff.

NOTE: I think my preferred solution is an XSLT sheet I can pass through a C# method, though a Windows command-line tool is good too.

Weinreb answered 2/4, 2010 at 18:5 Comment(3)
I updated my answer and posted an example.Hawkie
@stafford - I'd still take a look at Tidy (see my answer below). It's a handy command-line tool to have in your repertoire if you deal with XML, even if you don't end up using it for this particular problem.Catchup
@Bert F: Thanks, I will. I've found pretty-printing and canonicalizing tools to be very useful to keep in the toolbox in other domains.Weinreb
B
11

Here's a small C# sample, which can be used directly by your code, or built into an exe and called at the command line as "myexe from.xml to.xml":

    using System;       // for Environment.NewLine
    using System.Xml;

    class Program
    {
        static void Main(string[] args)
        {
            XmlWriterSettings settings = new XmlWriterSettings {
                NewLineHandling = NewLineHandling.Entitize,
                NewLineOnAttributes = true,   // each attribute on its own line
                Indent = true,
                IndentChars = "  ",
                NewLineChars = Environment.NewLine
            };

            // args[0] is the input file, args[1] is the output file
            using (XmlReader reader = XmlReader.Create(args[0]))
            using (XmlWriter writer = XmlWriter.Create(args[1], settings)) {
                writer.WriteNode(reader, false);
            }   // leaving the using block flushes and closes the writer
        }
    }

Sample input:

<Node><ChildNode value1='5' value2='6' value3='happy' /></Node>

Sample output (note that you can drop the <?xml ... ?> declaration by setting settings.OmitXmlDeclaration = true):

<?xml version="1.0" encoding="utf-8"?>
<Node>
  <ChildNode
    value1="5"
    value2="6"
    value3="happy" />
</Node>

Note that if you want a string rather than a file, just write to a StringBuilder instead:

StringBuilder sb = new StringBuilder();
using (XmlReader reader = XmlReader.Create(new StringReader(oldXml)))
using (XmlWriter writer = XmlWriter.Create(sb, settings)) {
    writer.WriteNode(reader, false);
    writer.Close();
}
string newXml = sb.ToString();
Beatify answered 18/4, 2010 at 9:46 Comment(2)
This was what I needed, thanks. I didn't know the XmlWriterSettings existed, I was working with an XDocument which has a .Save and a very limited set of SaveOptions... once I knew C# had the XmlWriterSettings object and (obviously) NewLineOnAttributes, I was good to go.Weinreb
Here you are using two files, one for the source and one for the output. Is it possible to do this in the same file, i.e. replace the source file's contents with the neat and tidy XML? Thanks in advance.Lamarckian
M
14

Here's a PowerShell script to do it. It takes the following input:

<?xml version="1.0" encoding="utf-8"?>
<Node>
    <ChildNode value1="5" value2="6" value3="happy" />
</Node>

...and produces this as output:

<?xml version="1.0" encoding="utf-8"?>
<Node>
  <ChildNode
    value1="5"
    value2="6"
    value3="happy" />
</Node>

Here you go:

param(
    [string] $inputFile = $(throw "Please enter an input file name"),
    [string] $outputFile = $(throw "Please supply an output file name")
)

$data = [xml](Get-Content $inputFile)

$xws = new-object System.Xml.XmlWriterSettings
$xws.Indent = $true
$xws.IndentChars = "  "
$xws.NewLineOnAttributes = $true

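# Create the writer as a separate object so it can be closed explicitly;
# otherwise the output file stays open until the PowerShell window is closed.
$writer = [Xml.XmlWriter]::Create($outputFile, $xws)
try { $data.Save($writer) } finally { $writer.Close() }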

Take that script, save it as C:\formatxml.ps1. Then, from a PowerShell prompt type the following:

C:\formatxml.ps1 C:\Path\To\UglyFile.xml C:\Path\To\NeatAndTidyFile.xml

This script is basically just using the .NET framework so you could very easily migrate this into a C# application.

NOTE: If you have not run scripts from PowerShell before, you will have to execute the following command at an elevated PowerShell prompt before you will be able to execute the script:

Set-ExecutionPolicy RemoteSigned

You only have to do this one time though.

I hope that's useful to you.

Mezzorilievo answered 18/4, 2010 at 15:34 Comment(1)
8 years later and I am trying this and discovered an issue. This script doesn't close the output file until the Powershell window is closed. The solution was to create the XML writer as a separate object, pass that object into the $data.Save(..) call and then call Dispose() on the XML writer object as the last thingSuctorial
C
6

Try Tidy over on SourceForge. Although it's often used on [X]HTML, I've used it successfully on XML before - just make sure you use the -xml option.

http://tidy.sourceforge.net/#docs

Tidy reads HTML, XHTML and XML files and writes cleaned up markup. ... For generic XML files, Tidy is limited to correcting basic well-formedness errors and pretty printing.

People have ported it to several platforms, and it is available both as an executable and as a callable library.

Tidy has a heap of options including:

http://api.html-tidy.org/tidy/quickref_5.0.0.html#indent

indent-attributes
Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
This option specifies if Tidy should begin each attribute on a new line.

One caveat:

Limited support for XML

XML processors compliant with W3C's XML 1.0 recommendation are very picky about which files they will accept. Tidy can help you to fix errors that cause your XML files to be rejected. Tidy doesn't yet recognize all XML features though, e.g. it doesn't understand CDATA sections or DTD subsets.

But I suspect unless your XML is really advanced, the tool should work fine.
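
Putting the options mentioned above together, an invocation could look roughly like the following sketch. It assumes a `tidy` executable is on the PATH; the helper names `tidy_command` and `run_tidy` and the file paths are made up for illustration. The flags used are `-xml`, `-indent`, `-output` and the `indent-attributes` option quoted above.

```python
import shutil
import subprocess


def tidy_command(input_path, output_path):
    """Build the argument list for a Tidy run (hypothetical helper)."""
    return [
        "tidy",
        "-xml",                        # treat the input as generic XML
        "-indent",                     # indent element content
        "--indent-attributes", "yes",  # begin each attribute on a new line
        "-output", output_path,        # write to this file instead of stdout
        input_path,
    ]


def run_tidy(input_path, output_path):
    """Run Tidy and return its exit status (assumes tidy is installed)."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    return subprocess.run(tidy_command(input_path, output_path)).returncode
```

The same arguments can of course be typed directly at a command prompt; wrapping them in a function just makes the call easy to reuse from a build script.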

Catchup answered 17/4, 2010 at 16:43 Comment(0)
A
2

There is a tool that can split attributes to one per line: xmlpp. It's a Perl script, so you'll have to install Perl. Usage:

perl xmlpp.pl -t input.xml

You can also determine the ordering of attributes by creating a file called attributeOrdering.txt, and calling perl xmlpp.pl -s -t input.xml . For more options, use perl xmlpp.pl -h

I hope it doesn't have too many bugs; it has worked for me so far.

Andreaandreana answered 12/4, 2010 at 18:0 Comment(1)
@chris_l - thanks. my team is all ASP.NET C# developers so nobody will have perl installed. Getting them to install it is probably possible but not ideal. But I tried it out and it does do what it's supposed to do!Weinreb
K
0

XML Notepad 2007 can do so manually ... let me see if it can be scripted.

Nope ... but it can be launched like so:

XmlNotepad.exe a.xml

The rest is just clicking the save button. PowerShell or other tools can automate that.

Kelley answered 2/4, 2010 at 18:15 Comment(1)
@Hamish Grubijan, that'd probably work but automating GUIs is awfully hacky -- there must be an easier way!Weinreb
S
0

Just use this xslt:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="'   '"/>

  <xsl:template name="newline">
    <xsl:text disable-output-escaping="yes">
</xsl:text>
  </xsl:template>

  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>    
    <xsl:value-of select="$indent"/>
    <xsl:copy />
  </xsl:template>

  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>    
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

  <xsl:template match="text()[normalize-space(.)='']"/>

  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>    
    <xsl:value-of select="$indent"/>
      <xsl:choose>
       <xsl:when test="count(child::*) > 0">
        <xsl:copy>
         <xsl:copy-of select="@*"/>
         <xsl:apply-templates select="*|text()">
           <xsl:with-param name="indent" select="concat ($indent, $indent-increment)"/>
         </xsl:apply-templates>
         <xsl:call-template name="newline"/>
         <xsl:value-of select="$indent"/>
        </xsl:copy>
       </xsl:when>       
       <xsl:otherwise>
        <xsl:copy-of select="."/>
       </xsl:otherwise>
     </xsl:choose>
  </xsl:template>    
</xsl:stylesheet>

Or, as another option, here is a perl script: http://software.decisionsoft.com/index.html

Salem answered 2/4, 2010 at 18:18 Comment(4)
That looks good. To complete the chore, is there a commandline xslt processor / can I use c# to process XML with an XSLT?Weinreb
Both, but saxon command-line xslt processor (saxon.sourceforge.net) should be enough :)Salem
I tried it and it doesn't seem to work. It indents each NODE, but doesn't newline/indent on each ATTRIBUTE. I replaced the indent-increment string with XYXY to make sure it was processing, and it was, but attributes are still in one long line.Weinreb
I'm going to open this up as a separate question, because I'm curious WHY this wasn't working for me.Weinreb
H
0

You can implement a simple SAX application that will copy everything as is and indent attributes how you like.

UPD:

SAX stands for Simple API for XML. It is a push model of XML parsing (a classic example of the Builder design pattern). The API is available on most current development platforms (though the native .NET class library lacks one, offering XmlReader instead).
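
To make the "push" part concrete, here is a minimal sketch (assuming Python 3) in which the parser calls the handler's methods as it meets each tag. The names `EventLogger` and `log_events` are invented for this illustration and are not part of any library.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Records the callbacks that the SAX parser pushes to the handler."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs behaves like a mapping of attribute names to values
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


def log_events(xml_text):
    """Parse xml_text and return the list of recorded events."""
    handler = EventLogger()
    parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

Calling `log_events("<Node><ChildNode value1='5' /></Node>")` yields the start and end events in document order. A pretty-printer only has to write text from inside these callbacks instead of recording them.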

Here is a rough implementation in Python. It is rather cryptic, but it should convey the main idea.

from sys import stdout
from xml.sax import parse
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape

class MyHandler(ContentHandler):

    def __init__(self, file_, encoding):
        self.level = 0
        self.elem_indent = '    '

        # should the next block make a line break
        self._allow_N = False
        # whether the opening tag was closed with > (to allow />)
        self._tag_open = False

        self._file = file_
        self._encoding = encoding

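    def _write(self, string_):
        # Python 3 text streams such as sys.stdout take str, not bytes;
        # under Python 2, write string_.encode(self._encoding) instead
        self._file.write(string_)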

    def startElement(self, name, attrs):
        if self._tag_open:
            self._write('>')
            self._tag_open = False

        if self._allow_N:
            self._write('\n')
            indent = self.elem_indent * self.level
        else:
            indent = ''
        self._write('%s<%s' % (indent, name))

        # attr indent equals to the element indent plus '  '
        attr_indent = self.elem_indent * self.level + '  '
        for name in attrs.getNames():
            # write indented attribute one per line; double quotes are
            # escaped too, because the value is wrapped in double quotes
            self._write('\n%s%s="%s"' % (attr_indent, name,
                escape(attrs.getValue(name), {'"': '&quot;'})))

        self._tag_open = True

        self.level += 1
        self._allow_N = True

    def endElement(self, name):
        self.level -= 1
        if self._tag_open:
            self._write(' />')
            self._tag_open = False
            return

        if self._allow_N:
            self._write('\n')
            indent = self.elem_indent * self.level
        else:
            indent = ''
        self._write('%s</%s>' % (indent, name))
        self._allow_N = True

    def characters(self, content):
        if self._tag_open:
            self._write('>')
            self._tag_open = False

        if content.strip():
            self._allow_N = False
            self._write(escape(content))
        else:
            self._allow_N = True


if __name__ == '__main__':
    # parse() returns None, so there is no result to keep
    parse('test.xsl', MyHandler(stdout, stdout.encoding))
Hawkie answered 12/4, 2010 at 19:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.