How to format XML document in Linux
I have a large number of XML tags like the following.

<SERVICE>
<NAME>
sh_SEET15002GetReKeyDetails
</NAME>
<ID>642</ID>
</SERVICE>

I want them formatted as shown below. I have tried xmllint, but it is not working for me. Please help.

<SERVICE>
<NAME>sh_SEET15002GetReKeyDetails</NAME>
<ID>642</ID>
</SERVICE>
Aureomycin answered 22/1, 2014 at 6:39 Comment(1)
Do you want to format it with or without programming? Brasilein
U
38
xmllint --format --recover nonformatted.xml > formatted.xml

For tab indentation:

export XMLLINT_INDENT=`echo -e '\t'`

For four space indentation:

export XMLLINT_INDENT='    '
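
Note that xmllint only re-indents; the line breaks inside <NAME> in the question are part of the element's text, so they survive formatting. A minimal Python sketch that strips them is shown here, assuming Python 3.9+ (for ET.indent) and that no element relies on leading or trailing whitespace in its text. The function name tidy and the sample variable are illustrative, not from the original posts.

```python
import xml.etree.ElementTree as ET


def tidy(xml_text, indent="    "):
    """Strip whitespace around element text, then re-indent the tree."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Drop leading/trailing whitespace (including newlines) in text
        # and tail; empty results become None so ET.indent can fill them.
        if elem.text is not None:
            elem.text = elem.text.strip() or None
        if elem.tail is not None:
            elem.tail = elem.tail.strip() or None
    ET.indent(root, space=indent)  # available from Python 3.9
    return ET.tostring(root, encoding="unicode")


sample = """<SERVICE>
<NAME>
sh_SEET15002GetReKeyDetails
</NAME>
<ID>642</ID>
</SERVICE>"""

if __name__ == "__main__":
    print(tidy(sample))
```

Calling tidy(sample, indent="") keeps one element per line without indentation, which matches the layout requested in the question. This is not safe for mixed-content documents, where whitespace around text can be significant.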
Urbannai answered 7/2, 2017 at 15:24 Comment(1)
In bash you don't need echo to get the TAB character; just use $'\t'. In this case it is export XMLLINT_INDENT=$'\t'. Lacerta
B
9

Without programming, you can use the Eclipse XML Source Editor. Have a look at this answer.

By the way, have you tried xmllint --format --recover nonformatted.xml > formatted.xml?

EDIT:

You can also try the XMLStarlet command-line XML toolkit.

5. Formatting XML documents
====================================================

xml fo --help
XMLStarlet Toolkit: Format XML document
Usage: xml fo [<options>] <xml-file>
where <options> are
   -n or --noindent            - do not indent
   -t or --indent-tab          - indent output with tabulation
   -s or --indent-spaces <num> - indent output with <num> spaces
   -o or --omit-decl           - omit xml declaration <?xml version="1.0"?>
   -R or --recover             - try to recover what is parsable
   -D or --dropdtd             - remove the DOCTYPE of the input docs
   -C or --nocdata             - replace cdata section with text nodes
   -N or --nsclean             - remove redundant namespace declarations
   -e or --encode <encoding>   - output in the given encoding (utf-8, unicode...)
   -H or --html                - input is HTML
   -h or --help                - print help
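
If the formatter has to be called from a script, the options above can be assembled programmatically. The following is only a sketch: build_fo_command and format_file are hypothetical helper names, and the executable name is an assumption (the help text calls it xml, while some distributions install it as xmlstarlet).

```python
import shutil
import subprocess


def build_fo_command(xml_path, indent_spaces=4, recover=False,
                     omit_decl=False, exe="xmlstarlet"):
    """Build the argument list for 'fo' using options from the help text."""
    cmd = [exe, "fo", "--indent-spaces", str(indent_spaces)]
    if recover:
        cmd.append("--recover")    # try to recover what is parsable
    if omit_decl:
        cmd.append("--omit-decl")  # omit the <?xml ...?> declaration
    cmd.append(xml_path)
    return cmd


def format_file(xml_path, **options):
    """Run the formatter and return the CompletedProcess.

    The exit status is left for the caller to inspect.
    """
    cmd = build_fo_command(xml_path, **options)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("executable not found: " + cmd[0])
    return subprocess.run(cmd, capture_output=True, text=True)


if __name__ == "__main__":
    print(build_fo_command("nonformatted.xml", indent_spaces=2, recover=True))
```

Passing exe="xml" switches to the name used in the help output above.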
Brasilein answered 22/1, 2014 at 7:20 Comment(3)
Yes, I have used the same command, but it is not working in this case. If possible, could there be a shell script or XSLT to do this? Aureomycin
Check the EDIT and tell me whether it works for you. Brasilein
The xmllint command did the trick for me. I think that line should be a standalone answer. Wendall
L
1

I do it from gedit. In gedit, you can add any script, in particular a Python script, as an External Tool. The script reads data from stdin and writes output to stdout, so it can also be used as a stand-alone program. It pretty-prints the XML and sorts child nodes.

#!/usr/bin/env python
# encoding: utf-8

"""
This is a gedit plug-in to sort and layout XML.

In gedit, to add this tool, open: menu -- Tools -- Manage External Tools...
Create a new tool: click [+] under the list of tools, type in "Sort XML" as tool name,
paste the whole text from this file in the "Edit:" box, then 
configure the tool:
Input: Current selection
Output: Replace current selection

In gedit, to run this tool,
FIRST SELECT THE XML,
then open: menu -- Tools -- External Tools > -- Sort XML

"""


from lxml import etree
import sys
import io

def headerFirst(node):
    """Return the sorting key prefix, so that 'header' will go before any other node
    """
    nodetag=('%s' % node.tag).lower()
    if nodetag.endswith('}header') or nodetag == 'header':
        return '0'
    else:
        return '1'

def get_node_key(node, attr=None):
    """Return the sorting key of an xml node
    using tag and attributes
    """
    if attr is None:
        return '%s' % node.tag + ':'.join([node.get(attr)
                                        for attr in sorted(node.attrib)])
    if attr in node.attrib:
        return '%s:%s' % (node.tag, node.get(attr))
    return '%s' % node.tag


def sort_children(node, attr=None):
    """ Sort children along tag and given attribute.
    if attr is None, sort along all attributes"""
    if not isinstance(node.tag, str):  # PYTHON 2: use basestring instead
        # not a TAG, it is comment or DATA
        # no need to sort
        return
    # sort child along attr
    node[:] = sorted(node, key=lambda child: (headerFirst(child) + get_node_key(child, attr)))
    # and recurse
    for child in node:
        sort_children(child, attr)


def sort(unsorted_stream, sorted_stream, attr=None):
    """Sort unsorted xml file and save to sorted_file"""
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(unsorted_stream,parser=parser)
    root = tree.getroot()
    sort_children(root, attr)

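    # with an explicit encoding, tostring() returns bytes (despite the variable name)
    sorted_unicode = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8")

    sorted_stream.write(sorted_unicode)


# We could simply call sort(sys.stdin, sys.stdout),
# but we want to check that something was selected first.
# Use the binary streams where they exist (Python 3);
# on Python 2 the standard streams already carry bytes.
stdin_bytes = getattr(sys.stdin, 'buffer', sys.stdin)
stdout_bytes = getattr(sys.stdout, 'buffer', sys.stdout)

inputdata = stdin_bytes.read()
if not inputdata:
    sys.stderr.write('no XML selected!\n')
    sys.exit(100)

sort(io.BytesIO(inputdata), stdout_bytes)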

There are two tricky things:

    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(unsorted_stream,parser=parser)

By default, whitespace-only text between elements is not ignored, which may produce strange results.

    sorted_unicode = etree.tostring(tree, pretty_print=True, xml_declaration=True, encoding="UTF-8")

Likewise, there is no pretty-printing by default.
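
To see why both settings matter, here is a small sketch that parses the same input with and without remove_blank_text. It assumes lxml is installed; the sample document and variable names are made up for illustration.

```python
from lxml import etree

raw = b"<root>\n<a>1</a>\n<b>2</b>\n</root>"

# Default parser: the whitespace-only text nodes are kept, so the
# serializer treats <root> as mixed content and does not re-indent it.
kept = etree.fromstring(raw)
kept_out = etree.tostring(kept, pretty_print=True)

# remove_blank_text=True drops those nodes, which lets pretty_print
# insert its own line breaks and indentation.
parser = etree.XMLParser(remove_blank_text=True)
stripped = etree.fromstring(raw, parser)
stripped_out = etree.tostring(stripped, pretty_print=True)

if __name__ == "__main__":
    print(kept_out.decode())
    print(stripped_out.decode())
```

Only the second output is indented; the first keeps the original line layout.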

I configure this tool to work on the current selection and to replace it, because the files I work with usually contain HTTP headers as well as XML. YMMV.

$ python --version
Python 2.7.6

$ lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:   trusty

If you do not need child-node sorting, just comment out the sort_children(root, attr) call in sort().


UPDATE v2 places header in front of anything else; fixed spaces

UPDATE getting lxml on Ubuntu 18.04.3 LTS bionic:

sudo apt install python-pip
pip install --upgrade lxml
$ python --version
Python 2.7.15+
Lighterman answered 15/5, 2018 at 15:13 Comment(1)
Worth mentioning: this worked for me as a pipe program on the server. Thank you very much! Sargeant
D
1

In Bluefish, go to "Edit / Preferences / External filters" and add a new filter called, for example, "Tidy XML" with the command "|tidy -xml -i|". Then open any XML file in Bluefish and select "Tools / Filters / Tidy XML" from the menu; it will format the open file.

Prerequisites: install bluefish and tidy.

Dionedionis answered 23/1, 2020 at 9:42 Comment(0)
