Linux Bash XMLLINT with XPATH

Today I get to learn how to use xmllint properly; it does not seem to be well documented or explained. I plan to use a single language resource file for my entire system, which is a mixture of bash scripts and PHP pages that all need to read from it.

Currently I am using the following format in my XML file, en.xml:

<?xml version="1.0" encoding="utf-8"?>
<resources>

    <item id="index.php">
        <label>LABEL</label>
        <value>VALUE</value>
        <description>DESCRIPTION</description>
    </item>
    <item id="config.php">
        <label>LABEL</label>
        <value>VALUE</value>
        <description>DESCRIPTION</description>
    </item>

</resources>

Now I need a line in a bash script that pulls data values from the XML file. For example, I want to get the DESCRIPTION value of the index.php item.

I was using

xmllint --xpath 'string(//description)' /path/en.xml

for a different layout, where it worked. Now that I have changed the layout of my XML file, I am lost as to how best to target a specific <item> and then drill down to its child element in the bash script.

Can someone help with an xmllint --xpath line to get this value, please?

Eadie answered 3/11, 2014 at 6:41

how best to target a specific <item> and then drill down to its child element

The correct XPath expression to do this is:

/resources/item[@id="index.php"]/description/text()

In plain English: start at the document node, go to the document element resources, then to its child item, but only if the value of that item's id attribute is "index.php", then to its child description, and retrieve its text node.
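To reuse that expression from a bash script for any item and any child element, it can be wrapped in a small function. This is a minimal sketch assuming the en.xml layout from the question; the function name lang_get and its argument order are hypothetical. It wraps the path in XPath's string() so xmllint prints only the text value, and string() of an empty node-set is the empty string, so an unknown id gives an empty result rather than the "XPath set is empty" message.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and argument order are illustrative only):
# print the text of one child element of the <item> whose id attribute matches.
#   usage: lang_get FILE ITEM_ID FIELD
lang_get() {
    local file=$1 id=$2 field=$3
    # string() yields just the text value; it is "" when nothing matches.
    # Both arguments are spliced into the XPath, so pass trusted values only.
    xmllint --xpath "string(/resources/item[@id=\"$id\"]/$field)" "$file"
}

# Recreate the question's en.xml in a scratch directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/en.xml" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<resources>
    <item id="index.php">
        <label>LABEL</label>
        <value>VALUE</value>
        <description>DESCRIPTION</description>
    </item>
    <item id="config.php">
        <label>LABEL</label>
        <value>VALUE</value>
        <description>DESCRIPTION</description>
    </item>
</resources>
EOF

# Capture the value; printf then adds a single trailing newline.
desc=$(lang_get "$tmp/en.xml" index.php description)
printf '%s\n' "$desc"
```

Because FIELD is inserted as an XPath step, the same function also reads label or value, e.g. lang_get en.xml config.php label.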

I use xmllint to validate XML documents, but never to evaluate path expressions. In a bash shell (at least on Mac OS) there is an even simpler tool for evaluating XPath expressions, called "xpath":

$ xpath en.xml '/resources/item[@id="index.php"]/description/text()'

Then, the following result is obtained:

Found 1 nodes:
-- NODE --
DESCRIPTION

If you still prefer xmllint, use it in the following way:

$ xmllint --xpath '/resources/item[@id="index.php"]/description/text()' en.xml > result.txt

The --xpath option implies --noout, which keeps xmllint from also printing the input XML file. I redirect the result to a file only for readability: DESCRIPTION is output without a trailing newline, which makes it hard to read in some shells.

$ cat result.txt 
DESCRIPTION
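When the value is needed inside a bash script rather than on screen, command substitution is a simpler alternative to the temporary file: it captures xmllint's output and strips any trailing newline, so printf can add exactly one regardless of how a given xmllint version ends its output. A minimal sketch, run against a small scratch copy of en.xml (an illustrative subset of the question's file):

```shell
#!/usr/bin/env bash
# Scratch copy of a minimal en.xml, removed when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/en.xml" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<resources>
    <item id="index.php">
        <description>DESCRIPTION</description>
    </item>
</resources>
EOF

# $( ... ) captures stdout and drops trailing newlines, so no result.txt is needed.
result=$(xmllint --xpath '/resources/item[@id="index.php"]/description/text()' "$tmp/en.xml")
printf '%s\n' "$result"
```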
Brierroot answered 3/11, 2014 at 7:41
Hi, which version of xmllint are you using? Running xmllint --version, I get "xmllint: using libxml version 20626 compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug", and it doesn't have the --xpath option. - Mukluk
@ReddySK I obtained the results in my answer with "xmllint: using libxml version 20902", but that line does not report the version of xmllint itself, only the version of the underlying library, libxml. It seems you have the problem described here: https://mcmap.net/q/504156/-xmllint-unknown-option-39-xpath-39/1987598. Still, trying to get --xpath to work in xmllint is not worth the trouble: xmllint is mainly a validation tool, not an XPath evaluator. Use an XPath library in the programming language of your choice instead. - Routinize
With libxml version 20910, xmllint --xpath outputs the result by default, so the command can be simplified to xmllint --xpath '/resources/item[@id="index.php"]/description/text()' en.xml - Whiffen
@Stephan The result of applying this XPath expression is always output. But xmllint usually also outputs the input document (en.xml) if it is well-formed and valid. I am redirecting to a file here because DESCRIPTION is output without a newline character, which makes it hard to read in some shells. I edited my answer to make this clearer. - Routinize

If your XML document uses namespaces, you're gonna have a bad time with xmllint.

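For example, to pull all the paths out of a typical SMPTE asset map, naïvely running //Path/text() results in "XPath set is empty". That is because an unprefixed name in an XPath 1.0 expression only matches elements in no namespace, and every element in this file is in the default namespace declared on AssetMap. You have to use the local-name() incantation instead: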

xmllint --xpath '//*[local-name() = "Path"]/text()' ASSETMAP.xml

Result:

MER_SHR_C_EN-XX_US-NR_51_LTRT_UHD_ML7_SL4_20160915_OV_00.mxf

For this XML:

<?xml version="1.0" encoding="UTF-8"?>
<AssetMap xmlns="http://www.smpte-ra.org/schemas/429-9/2007/AM">
  <Id>urn:uuid:f49af561-3b6c-439f-a3de-f7366f287c09</Id>
  <AnnotationText>MERIDIAN_SHR_C_EN-XX_US-NR_51_LTRT_UHD_ML7_SL4_20160915_OV</AnnotationText>
  <Creator>Studio Technologies</Creator>
  <VolumeCount>1</VolumeCount>
  <IssueDate>2016-09-15T10:10:50+00:00</IssueDate>
  <Issuer>NETFLIX</Issuer>
  <AssetList>
    <Asset>
      <Id>urn:uuid:66fae909-5690-4f9f-b43f-12d302cb7857</Id>
      <ChunkList>
        <Chunk>
          <Path>MER_SHR_C_EN-XX_US-NR_51_LTRT_UHD_ML7_SL4_20160915_OV_00.mxf</Path>
          <VolumeIndex>1</VolumeIndex>
          <Offset>0</Offset>
          <Length>106753518178</Length>
        </Chunk>
      </ChunkList>
    </Asset>
  </AssetList>
</AssetMap>
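The local-name() test matches a Path element in any namespace. If a file could also contain Path elements from another vocabulary, the predicate can additionally check namespace-uri(); both are standard XPath 1.0 functions. A minimal sketch, assuming a trimmed copy of the asset map above fed to xmllint on standard input (the - argument). Note that string() returns only the first match; keep /text() as in the answer to list every path.

```shell
#!/usr/bin/env bash
# SMPTE asset-map namespace URI, copied from the xmlns attribute above.
am_ns='http://www.smpte-ra.org/schemas/429-9/2007/AM'

# Trimmed sample of the asset map; the default namespace is what breaks //Path.
doc='<AssetMap xmlns="http://www.smpte-ra.org/schemas/429-9/2007/AM">
  <AssetList><Asset><ChunkList><Chunk>
    <Path>MER_SHR_C_EN-XX_US-NR_51_LTRT_UHD_ML7_SL4_20160915_OV_00.mxf</Path>
  </Chunk></ChunkList></Asset></AssetList>
</AssetMap>'

# Match on both the local name and the namespace URI; "-" makes xmllint read stdin.
path=$(printf '%s\n' "$doc" |
    xmllint --xpath "string(//*[local-name()='Path' and namespace-uri()='$am_ns'])" -)
printf '%s\n' "$path"
```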
Influential answered 23/2, 2023 at 3:35

My favorite is xmlstarlet because it seems to be more powerful than xmllint:

xmlstarlet sel -t -v '/resources/item[@id="index.php"]/description/text()' en.xml
Boiling answered 12/3, 2017 at 4:36
xmlstarlet seems to be a powerful tool, thanks for the pointer! - Zillah
My other command-line secret weapon for this is 'xidel', because it supports XPath 2.0 and XQuery. Its only weakness is that it cannot read from stdin, so it does not work with Unix pipes the way xmlstarlet does. xmlstarlet has fewer XML features, but being able to pipe into it makes up for that. - Boiling

I had the same problem a few minutes ago and saw this post.

After hacking a bit I found the following solution to extract the city:

(
wget 'http://maps.googleapis.com/maps/api/geocode/xml?latlng=53.244921,-2.479539&sensor=true' \
  -O dummy.xml -o /dev/null
xmllint --format \
  --xpath '/GeocodeResponse/result[type = "postal_town"]/address_component[type = "postal_town"]/short_name/node()' \
  dummy.xml
)

You need to specify the correct XPath expression to select the desired XML element and then return only its node value.
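The temporary dummy.xml is not strictly required: xmllint treats - as standard input, so a download can be piped straight into it. A minimal offline sketch: the response below is an invented stand-in shaped to match the expression above, not real Geocoding API output, ExampleTown is a placeholder, and $url in the comment is a placeholder too. Wrapping the path in string() returns just the text of the first match, which serves the same purpose as /node() here.

```shell
#!/usr/bin/env bash
# Invented stand-in for the API response (NOT real output); it contains only
# the elements the XPath expression walks through.
response='<GeocodeResponse>
  <result>
    <type>postal_town</type>
    <address_component>
      <short_name>ExampleTown</short_name>
      <type>postal_town</type>
    </address_component>
  </result>
</GeocodeResponse>'

# In real use, replace printf with a download to stdout, e.g. curl -s "$url"
# or wget -qO- "$url". The final "-" makes xmllint read from stdin.
town=$(printf '%s\n' "$response" |
    xmllint --xpath 'string(/GeocodeResponse/result[type = "postal_town"]/address_component[type = "postal_town"]/short_name)' -)
printf '%s\n' "$town"
```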

Arching answered 5/9, 2015 at 21:34
There's no need for the temporary file - you can just pipe the output of wget -O- into xmllint -. You also have a few redundant options (wget -o, xmllint --format). It should look like this: wget -O- <url> | xmllint --xpath <path> -. Also, for this kind of thing I prefer curl over wget, mostly because it writes to stdout by default, so you type less. - Discoid
