How to pretty print XML from the command line?

13

685

Related: How can I pretty-print JSON in (unix) shell script?

Is there a (unix) shell script to format XML in human-readable form?

Basically, I want it to transform the following:

<root><foo a="b">lorem</foo><bar value="ipsum" /></root>

... into something like this:

<root>
    <foo a="b">lorem</foo>
    <bar value="ipsum" />
</root>
Wyckoff answered 18/4, 2013 at 18:50 Comment(2)
To have xmllint available on Debian systems, you need to install the package libxml2-utils (libxml2 does not provide this tool, at least not on Debian 5.0 "Lenny" and 6.0 "Squeeze").Heyer
web browsers (e.g. firefox / chrome) tend to do a good job of pretty-printing XML documents these days. (posting as a comment because this isn't a CLI, but a very convenient alternative)Lough
1167

xmllint from libxml2

xmllint --format file.xml

(On Debian based distributions install the libxml2-utils package)

xml_pp from the XML::Twig module

xml_pp < file.xml

(On Debian based distributions install the xml-twig-tools package)

XMLStarlet

xmlstarlet format --indent-tab file.xml

Tidy

tidy -xml -i -q file.xml

Python's xml.dom.minidom

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
  python -c 'import sys, xml.dom.minidom; print(xml.dom.minidom.parseString(sys.stdin.read()).toprettyxml())'

saxon-lint (my own project)

saxon-lint --indent --xpath '/' file.xml

saxon-HE

echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' |
  java -cp /usr/share/java/saxon/saxon9he.jar net.sf.saxon.Query \
       -s:- -qs:/ '!indent=yes'

xidel

xidel --output-node-format=xml --output-node-indent -se . -s file.xml

(Credit to Reino)

Output for all commands:

<root>
  <foo a="b">lorem</foo>
  <bar value="ipsum"/>
</root>
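
For the Python option, a slightly longer script can help when the input is already partly indented: in that case toprettyxml() emits whitespace-only lines for the existing whitespace text nodes. The following is a minimal sketch, not part of the original answer; the function name pretty and the blank-line filter are illustrative assumptions, and the filter would also drop blank lines inside multi-line text content.

```python
import xml.dom.minidom

SAMPLE = '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'


def pretty(xml_text, indent="  "):
    """Pretty-print an XML string with minidom (hypothetical helper)."""
    dom = xml.dom.minidom.parseString(xml_text)
    raw = dom.toprettyxml(indent=indent)
    # Drop whitespace-only lines, which appear when the input
    # already contained indentation between elements.
    return "\n".join(line for line in raw.splitlines() if line.strip())


print(pretty(SAMPLE))
```

To use it in a pipeline, call pretty(sys.stdin.read()) instead of passing SAMPLE. Note that minidom adds an XML declaration as the first line of the output.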
Plutus answered 18/4, 2013 at 18:51 Comment(15)
Good, quick answer. The first option seems like it'll be more ubiquitous on modern *nix installs. A minor point; but can it be called without working through an intermediate file? I.e., echo '<xml .. />' | xmllint --some-read-from-stdn-option?Wyckoff
The package is libxml2-utils in my beautiful ubuntu.Microbalance
Do you know how to wrap long lines?Enchain
for the xml_pp option: you can install from the repositories xml-twig-tools (ubuntu)Pestle
Note that the "cat data.xml | xmllint --format - | tee data.xml" does not work. On my system it sometimes worked for small files, but always truncated huge files. If you really want to do anything in place read backreference.org/2011/01/29/in-place-editing-of-filesNielson
I did this to use xmllint: find . -name "*.xml" -exec xmllint --format --output {} {} \; You have to specify a file for it to put the pretty-printed result in. In my case, I just wanted to update the file, so the output file was the same one as the input.Enjoyable
To solve UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 805: ordinal not in range(128) in python version you want to define PYTHONIOENCODING="UTF-8": cat some.xml | PYTHONIOENCODING="UTF-8" python -c 'import sys;import xml.dom.minidom;s=sys.stdin.read();print xml.dom.minidom.parseString(s).toprettyxml()' > pretty.xmlKiri
xmllint seems to silently refuse to do any formatting if lines are too long.Frown
It seems tidy is only option to put attributes on separate lines.Magnificat
#9943094 shows another fix for the unicode error in python when piping to a file.Simulator
Note that tidy can also format xml with no root element. This is useful to format through a pipe, xml sections (e.g. extracted from logs). echo '<x></x><y></y>' | tidy -xml -iqPendergast
Do any of these options work with tail? I'm able to tail a streaming log from a remote application which has XML fragments in it, and I would like the XML fragments to be formatted as the stream is output, rather than the formatting program hanging until tail is killed. I tried tidy and it only spits out format warnings as they are encountered.Woven
didn't find any coloring options? any hints? for now I use vim to get coloring, but then I have to create a newly formatted xml to have good readability againYonyona
Which of these options are portable enough to run on Windows?Teakwood
Modern distros require you to write python3 instead of python because Python 3 is not a backwards compatible language and Python 2 has already been EOL'd.Rainproof
190

xmllint --format yourxmlfile.xml

xmllint is a command line XML tool and is included in libxml2 (http://xmlsoft.org/).

================================================

Note: If you don't have libxml2 installed you can install it by doing the following:

CentOS

cd /tmp
wget ftp://xmlsoft.org/libxml2/libxml2-2.8.0.tar.gz
tar xzf libxml2-2.8.0.tar.gz
cd libxml2-2.8.0/
./configure
make
sudo make install
cd

Ubuntu

sudo apt-get install libxml2-utils

Cygwin

apt-cyg install libxml2

MacOS

To install this on MacOS with Homebrew just do: brew install libxml2

Git

Also available on Git if you want the code: git clone git://git.gnome.org/libxml2

Curia answered 15/11, 2013 at 15:34 Comment(5)
sputnick's answer contains this information, but crmpicco's answer is the most useful answer here to the general question about how to pretty print XML.Dzoba
We can also write the formatted XML output to another file and use that, e.g. xmllint --format yourxmlfile.xml > new-file.xmlWesle
On Ubuntu 16.04 you can use the following: sudo apt-get install libxml2-utilsSeparates
This works on Windows too; git for Windows download even installs a recent version of xmllint. Example: "C:\Program Files\Git\usr\bin\xmllint.exe" --format [email protected] > [email protected]Bolton
On macOS with libxml2 installed via brew, this command worked for me to unminify an XML file and save it to a new file: xmllint --format in.xml > out.xmlMultipartite
46

You can also use tidy, which may need to be installed first (e.g. on Ubuntu: sudo apt-get install tidy).

For this, you would issue something like the following:

tidy -xml -i your-file.xml > output.xml

Note: tidy has many additional readability flags, but its word-wrap behavior is a bit annoying to untangle (http://tidy.sourceforge.net/docs/quickref.html).

Olympiaolympiad answered 12/10, 2014 at 16:29 Comment(4)
Helpful, because I couldn't get xmllint to add linebreaks to a single line xml file. Thanks!Agriculturist
tidy works well for me too. Unlike hxnormalize, this one actually closes the <body> tag.Lafreniere
BTW, here are some options that I have found useful: tidy --indent yes --indent-spaces 4 --indent-attributes yes --wrap-attributes yes --input-xml yes --output-xml yes < InFile.xml > OutFile.xml.Flavoprotein
Great tip @VictorYarema. I combined it with pygmentize and added it to my .bashrc: alias prettyxml='tidy --indent yes --indent-spaces 4 --indent-attributes yes --wrap-attributes yes --input-xml yes --output-xml yes | pygmentize -l xml' and then can curl url | prettyxmlChaise
21

Without installing anything on macOS / most Unix.

Use tidy

cat filename.xml | tidy -xml -iq

This pipes the file to tidy. The -xml option tells tidy the input is XML, -i indents the output, and -q (quiet) suppresses nonessential output. JSON also works with -json.

Remuneration answered 9/5, 2019 at 20:12 Comment(1)
You don't need the cat step: tidy -xml -iq filename.xml. You can also add the -m option (tidy -xml -iq -m filename.xml) to modify the original file in place.Verdi
15

You didn't mention a file, so I assume you want to provide the XML string as standard input on the command line. In that case, do the following:

$ echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | xmllint --format -
Blanchard answered 18/4, 2013 at 19:9 Comment(0)
14

xmllint supports formatting in place:

for f in *.xml; do xmllint -o "$f" --format "$f"; done

As Daniel Veillard has written:

I think xmllint -o tst.xml --format tst.xml should be safe as the parser will fully load the input into a tree before opening the output to serialize it.

The indent is controlled by the XMLLINT_INDENT environment variable, which defaults to 2 spaces. For example, to change the indent to 4 spaces:

XMLLINT_INDENT='    '  xmllint -o out.xml --format in.xml

You may have luck with the --recover option when your XML documents are broken. Or try the lenient HTML parser with strict XML output:

xmllint --html --xmlout <in.xml >out.xml

--nsclean, --nonet, --nocdata, --noblanks, etc. may also be useful; see the man page.

apt-get install libxml2-utils
dnf install libxml2
apt-cyg install libxml2
brew install libxml2
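
Where xmllint is not available, the same "parse everything first, then write" idea can be sketched in Python. This is a minimal sketch under stated assumptions, not the answer's method: the function name format_in_place is hypothetical, ET.indent requires Python 3.9 or later, and writing through a temporary file means the result gets the temporary file's permissions rather than the original's.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def format_in_place(path, indent="    "):
    """Re-indent an XML file in place (hypothetical helper)."""
    # Parse the whole document before touching the output.
    tree = ET.parse(path)
    ET.indent(tree, space=indent)  # Python 3.9+

    # Write to a temporary file in the same directory, then swap it
    # in, so a failure never leaves a truncated file behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tree.write(tmp, encoding="utf-8", xml_declaration=True)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Calling format_in_place("in.xml") rewrites the file with a 4-space indent; a loop over glob.glob("*.xml") would mirror the shell loop above. ElementTree drops comments unless a custom parser is used, so this sketch is best suited to data-style XML.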
Gingili answered 28/5, 2018 at 20:18 Comment(0)
4

This simple(st) solution doesn't provide indentation, but it is nevertheless much easier on the human eye. Also it allows the xml to be handled more easily by simple tools like grep, head, awk, etc.

Use sed to replace '<' with itself preceded by a newline.

And as mentioned by Gilles, it's probably not a good idea to use this in production.

# check you are getting more than one line out
sed 's/</\n</g' sample.xml | wc -l

# check the output looks generally ok
sed 's/</\n</g' sample.xml | head

# capture the pretty xml in a different file
sed 's/</\n</g' sample.xml > prettySample.xml
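
The \n in the sed replacement is not specified by POSIX, so some sed implementations may not turn it into a newline. A portable equivalent of the same trick can be sketched in Python; the function name one_tag_per_line is an illustrative assumption, and like the sed version this is plain text substitution, not XML parsing.

```python
SAMPLE = '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'


def one_tag_per_line(xml_text):
    """Start a new line before every '<' (hypothetical helper)."""
    # Same substitution as the sed command; lstrip removes the empty
    # first line that the substitution creates before the root tag.
    return xml_text.replace("<", "\n<").lstrip("\n")


print(one_tag_per_line(SAMPLE))
```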
Laevogyrate answered 10/12, 2020 at 12:39 Comment(2)
Thanks for this reply that uses nothing needed to download.Musgrave
sed is not an XML parserPhotometer
3

It took me forever to find something that works on my Mac. Here's what worked for me:

brew install xmlformat
cat unformatted.html | xmlformat
Lafreniere answered 16/3, 2020 at 6:31 Comment(0)
1

With xidel:

$ xidel -s input.xml -e . --output-node-format=xml --output-node-indent
$ xidel -s input.xml -e 'serialize(.,{"indent":true()})'

$ echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | \
  xidel -se . --output-node-format=xml --output-node-indent
$ echo '<root><foo a="b">lorem</foo><bar value="ipsum" /></root>' | \
  xidel -se 'serialize(.,{"indent":true()})'
Hanni answered 28/11, 2020 at 16:11 Comment(3)
The first solution appears to be out of date as neither option is in xidel --help and although the second solution throws no error (the echoed solution needs - after xidel to receive standard input) this also does not indent the xml.Upstretched
@Upstretched Please use an up-to-date binary.Hanni
This was based on the last official release Xidel 0.9.8.Upstretched
1

yq can be used to pretty print XML. It has an option to define the indent.

yq --input-format xml --output-format xml --indent 2
Wilkie answered 10/11, 2022 at 16:26 Comment(1)
There is also yq -P, but when I tried it, it did not really work. Plain yq --input-format xml --output-format xml produced well-formatted XML.Panocha
0

Edit:

Disclaimer: you should usually prefer installing a mature tool like xmllint to do a job like this. XML/HTML can be a horribly mutilated mess. However, there are valid situations where using existing tooling is preferable to manually installing new tools, and where it is also a safe bet the XML's source is valid (enough). I've written this script for one of those cases, but they are rare, so proceed with caution.


I'd like to add a pure Bash solution, as it is not 'that' difficult to just do it by hand, and sometimes you won't want to install an extra tool to do the job.

#!/bin/bash

declare -i currentIndent=0
declare -i nextIncrement=0
while read -r line ; do
  currentIndent+=$nextIncrement
  nextIncrement=0
  if [[ "$line" == "</"* ]]; then # line contains a closer, just decrease the indent
    currentIndent+=-1
  else
    dirtyStartTag="${line%%>*}"
    dirtyTagName="${dirtyStartTag%% *}"
    tagName="${dirtyTagName//</}"
    # increase indent unless line contains closing tag or closes itself
    if [[ ! "$line" =~ "</$tagName>" && ! "$line" == *"/>"  ]]; then
      nextIncrement+=1
    fi
  fi

  # print the line, preceded by two spaces per indent level
  printf "%*s%s\n" $(( currentIndent * 2 )) "" "$line"
done <<< "$(cat - | sed 's/></>\n</g')" # separate >< with a newline

Paste it in a script file, and pipe in the xml. This assumes the xml is all on one line, and there are no extra spaces anywhere. One could easily add some extra \s* to the regexes to fix that.
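
For comparison, the same line-based heuristic can be sketched in Python, including the extra \s* tolerance mentioned above. This is a hedged sketch and not a parser: the function name naive_indent is hypothetical, and it shares the Bash script's limitations (for example, it breaks on '><' inside text or attribute values, and on elements whose content spans several tags on one line).

```python
import re


def naive_indent(xml_text, step=2):
    """Indent XML with a line-based heuristic (hypothetical helper)."""
    # Separate adjacent tags, tolerating whitespace between them.
    lines = re.sub(r">\s*<", ">\n<", xml_text.strip()).split("\n")
    out = []
    depth = 0
    for line in lines:
        if line.startswith("</"):
            # A closing tag moves back out one level before printing.
            depth = max(depth - 1, 0)
            out.append(" " * (depth * step) + line)
            continue
        out.append(" " * (depth * step) + line)
        # Only real element names match, so declarations such as
        # <?xml ...?> and comments never increase the depth.
        tag = re.match(r"<([A-Za-z_][^\s/>]*)", line)
        if (tag is not None
                and not line.endswith("/>")
                and f"</{tag.group(1)}>" not in line):
            depth += 1
    return "\n".join(out)


print(naive_indent('<root><foo a="b">lorem</foo><bar value="ipsum" /></root>'))
```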

Usable answered 6/5, 2020 at 17:10 Comment(5)
Hope to never see this somewhere as a sysadmin -_-Photometer
@GillesQuenot What do you mean? Is there a security risk I'm not seeing?Usable
Because parsing XML/HTML with anything else than a real parser is (or will be soon) plain buggy. If it's a small personal script on a personal computer, up to you, but for production, no way. It will break !Photometer
I agree XML/HTML can be horribly mutilated, but it does depend on the source. I wrote this for some XML we generate ourselves, so it is a pretty safe bet there.Usable
Until an intern changes the way the XML is made :)Photometer
0

I would:

nicholas@mordor:~/flwor$ 
nicholas@mordor:~/flwor$ cat ugly.xml 


<root><foo a="b">lorem</foo><bar value="ipsum" /></root>

nicholas@mordor:~/flwor$ 
nicholas@mordor:~/flwor$ basex
BaseX 9.0.1 [Standalone]
Try 'help' to get more information.
> 
> create database pretty
Database 'pretty' created in 231.32 ms.
> 
> open pretty
Database 'pretty' was opened in 0.05 ms.
> 
> set parser xml
PARSER: xml
> 
> add ugly.xml
Resource(s) added in 161.88 ms.
> 
> xquery .
<root>
  <foo a="b">lorem</foo>
  <bar value="ipsum"/>
</root>
Query executed in 179.04 ms.
> 
> exit
Have fun.
nicholas@mordor:~/flwor$ 

if only because then it's "in" a database, and not "just" a file. Easier to work with, to my mind.

I subscribe to the belief that others have already worked this problem out. If you prefer, eXist is no doubt as good at formatting XML, or perhaps even better.

You can always query the data various different ways, of course. I kept it as simple as possible. You can just use a GUI, too, but you specified console.

Ductile answered 22/11, 2020 at 11:16 Comment(0)
0

You can try my cli tool xmq (https://libxmq.org) to pretty print and syntax highlight XML and HTML. Note that it renders the XML/HTML/JSON in the XMQ format which is easier to read and edit. There is however a 1-1 mapping between XMQ and XML. The tool is very useful for analyzing large xml and html files.

The xmq tool also includes a pager for the terminal: xmq file.xml page

The tool can also render into a temporary html file which is automatically viewed in your default browser: xmq file.xml browse

It picks the color scheme from your terminal's background color (light or dark), but you can override it:

XMQ_BG=dark xmq file.xml browse

XMQ_BG=light xmq file.xml browse

It works in a pipeline as well: curl https://slashdot.org | xmq delete //script delete //style page

Apart from deleting xpath matched nodes, there are other commands to convert to and from xmq/xml/html/json and apply transformations to the content.

Tab answered 17/1 at 15:31 Comment(0)
