XML formatting Indentation Tags Matching - Linux

I have an XML file that is very compact: all the tags are stuck together on a single line, like this:

<PersonalData><IndividualDetails><Title>Mr</Title><Gender>Male</Gender><FirstName>Hae</FirstName><Surname>JONES</Surname><Occupation>Banker</Occupation><DateofBirth>4/6/76</DateofBirth><LastKnownAddress></LastKnownAddress><LastKnownPostCode>00145</LastKnownPostCode><OtherNames></OtherNames></IndividualDetails><OccupationDetails><Company>SD Bank</Company><CompanyAddress>Sunset Boulevard NY</CompanyAddress><ContactNo>335698457</ContactNo></OccupationDetails></PersonalData>

Is there a shell command that can properly format the tags? If indentation is not possible, just putting each tag on its own line would also solve my problem.

Statvolt answered 15/8, 2012 at 7:10 Comment(0)
xmllint --format <your-xml-file>

Example:

$ cat test.xml
<a><b>c</b></a>
$ xmllint --format test.xml
<a>
  <b>c</b>
</a>
$ xmllint --format test.xml > test.formatted.xml
$ cat test.formatted.xml
<a>
  <b>c</b>
</a>
$
Illusive answered 15/8, 2012 at 7:14 Comment(6)
This does not work on the actual file. My XML file is approximately 583 MB. The --format option works on a small file, but when I apply it to the actual file, the operation gets killed in Bash. Any ideas for formatting big files, in chunks or otherwise? - Statvolt
583 MB? Maybe you should implement your own SAX handler for indentation. - Illusive
How can I do that? Please guide me. - Statvolt
One thing to bear in mind with the xmllint approach is that it won't preserve the exact original text. For example, <a></a> will be transformed into <a/>. Technically this is correct, but it may catch you out if you aren't expecting it. - Marillin
Did you implement a solution for handling big files? - Bautzen
At that size you might need to add the --stream option. - Suck
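Since the comments ask how a SAX handler for indentation might look, here is a minimal sketch using Python's standard xml.sax module. It is not from the original answer: the names IndentingHandler and format_xml and the script name indent_xml.py are made up for illustration. It assumes the file is well-formed XML and that whitespace around text may be stripped. It also drops comments, processing instructions and the XML declaration, and it always writes empty elements as <a></a>. Because SAX reads the input in chunks, memory use should stay roughly constant even for a very large file.

```python
import io
import sys
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class IndentingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: writes each element on its own line, indented by depth."""

    def __init__(self, out, indent="  "):
        super().__init__()
        self.out = out
        self.indent = indent
        self.text = []   # character data buffered since the last tag
        self.stack = []  # one flag per open element: has it had a child element?

    def _take_text(self):
        # Return the buffered text without surrounding whitespace and reset the buffer.
        text = "".join(self.text).strip()
        self.text = []
        return text

    def startElement(self, name, attrs):
        text = self._take_text()
        if self.stack:
            if text:
                # Mixed content: text that precedes a child gets its own line.
                self.out.write("\n" + self.indent * len(self.stack) + escape(text))
            self.stack[-1] = True
            self.out.write("\n")
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        self.out.write(self.indent * len(self.stack) + f"<{name}{attr_text}>")
        self.stack.append(False)

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        text = self._take_text()
        had_children = self.stack.pop()
        if had_children:
            if text:
                self.out.write("\n" + self.indent * (len(self.stack) + 1) + escape(text))
            self.out.write("\n" + self.indent * len(self.stack) + f"</{name}>")
        else:
            # Leaf element: keep text and closing tag on the opening tag's line.
            self.out.write(escape(text) + f"</{name}>")

    def endDocument(self):
        self.out.write("\n")


def format_xml(source, out):
    """Stream-parse source (a file name or binary file object) and write indented XML to out."""
    xml.sax.parse(source, IndentingHandler(out))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python3 indent_xml.py big.xml > big.formatted.xml
    format_xml(sys.argv[1], sys.stdout)
```

Run on the small test.xml from the answer above, this should print the same three indented lines as xmllint --format, just without the XML declaration.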
tidy -xml -i -q

-xml - specify that the input is well-formed XML

-q - suppress nonessential output

-i - indent element content

tidy can work with files and stdin/stdout

echo '<a><b>c</b></a>' | tidy -xml -i -q

will produce

    <a>
      <b>c</b>
    </a>
Attalie answered 15/8, 2012 at 7:22 Comment(0)
