How to list XML node attributes with XML::LibXML?

Asked 7/11, 2014 at 7:34 Answered 7/11, 2014 at 11:8

Given the following XML snippet:

<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>

How do I get this output?

outline
node1=text1
node1 attribute1=value1
node1 attribute2=value2

I have looked into XML::LibXML::Reader, but that module appears to provide access to attribute values only by name. How do I get the list of attribute names in the first place?

Sharpshooter asked 7/11, 2014 at 7:34 Comment(0)

You find the list of attributes by doing $e->findnodes( "./@*");

Below is a solution, with plain XML::LibXML, not XML::LibXML::Reader, that works with your test data. It may be sensitive to extra whitespace and mixed-content though, so test it on real data before using it.

#!/usr/bin/perl

use strict;
use warnings;

use XML::LibXML;

my $dom= XML::LibXML->load_xml( IO => \*DATA);
my $elements= $dom->findnodes( "//*");

foreach my $e (@$elements)
  { print $e->nodeName;

    # text needs to be trimmed or line returns show up in the output
    my $text= $e->textContent;
    $text=~s{^\s*}{};
    $text=~s{\s*$}{};

    # test length, so that a text content of "0" is still printed
    if( ! $e->getChildrenByTagName( '*') && length $text)
      { print "=$text"; }
    print "\n"; 

    my @attrs= $e->findnodes( "./@*");
    # or, as suggested by Borodin below, $e->attributes

    foreach my $attr (@attrs)
      { print $e->nodeName, " ", $attr->nodeName, "=", $attr->value, "\n"; }
  }
__END__
<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>

Ancient answered 7/11, 2014 at 8:10 Comment(7)

There are much cleaner ways to fetch the attributes. The obvious one is my @attrs = $e->attributes, which returns a list of all attribute nodes. An element node object also behaves as a tied hash reference: keys %$e returns all of the attribute names, while $e->{attr_name} returns the value of attribute attr_name. – Retention 7/11, 2014 at 9:43

Thanks, I didn't find this in the docs, which I thought was strange. And now I see it, under "Overloading", duh! I still don't see attributes though, at least in the docs for XML::LibXML::Element – Ancient 7/11, 2014 at 11:3

I see, I wasn't expecting to find it there. Actually it makes no sense at all. I see that it is also used to return the list of namespace declarations associated with the node, WTF? Why 1 method for 2 extremely different results? I can't even find it in the DOM spec... Boy I'm glad I use XML::Twig ;--) – Ancient 7/11, 2014 at 11:21

The border between XML::LibXML::Element and XML::LibXML::Node is a little strange. I would expect all attribute stuff to appear in the former, as no other node type can have attributes. But the namespace declarations are kinda okay: a namespace looks just like an attribute called xmlns. – Retention 7/11, 2014 at 11:32

Agreed. Indeed, with findnodes( "./@*") (or using %$e) you don't get the namespace declarations, while attributes gives them to you. And before testing, I thought that attributes would return a list of all namespace declarations that applied to a node, not just the ones declared in the start tag of the element. – Ancient 7/11, 2014 at 12:30

It has been on my list of things to do -- towards the bottom, in the section marked "interesting" -- to examine and understand the libxml2 library on which this is based: exercises like that always enhance my understanding of related software. I hope to find that strangenesses like this one in the Perl glue library are mainly due to our vision being forced through the fat lenses of the author's spectacles. – Retention 7/11, 2014 at 16:57

Thank you very much! I like both solutions: Borodin's for the use of attributes and mirod's for unifying approach to nodes walking with findnodes( "//*"). (Sorry, my question was badly composed, the <outline> is basically an ordinary node, just like <node1>, so what I really needed was a recursive walk over the whole document.) You've done a good job at clarifying the Perl docs too ;) – Sharpshooter 8/11, 2014 at 0:28
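To make the comparison in the comments above concrete, here is a minimal sketch that lists the attributes of one element in the three ways discussed: findnodes( "./@*"), attributes, and hash-style access. The xmlns:x declaration and its URI are hypothetical additions to the question's sample, there only to show where namespace declarations turn up. Hash-style access relies on the overloading mentioned in the comments, so it assumes a reasonably recent XML::LibXML.

```perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical sample: the question's element plus one namespace declaration.
my $xml = <<'XML';
<outline>
  <node1 xmlns:x="http://example.com/x" attribute1="value1" attribute2="value2">text1</node1>
</outline>
XML

my $dom = XML::LibXML->load_xml( string => $xml );
my ($e) = $dom->findnodes('//node1');

# 1. XPath attribute axis: attribute nodes only.
my @xpath_names = map { $_->nodeName } $e->findnodes('./@*');

# 2. attributes(): attribute nodes plus the namespace declarations
#    written on this element's start tag.
my @method_names = map { $_->nodeName } $e->attributes;

# 3. Hash-style access: attribute names as keys, values looked up by name.
my @hash_names = sort keys %$e;

print "findnodes:  @xpath_names\n";
print "attributes: @method_names\n";
print "hash keys:  @hash_names\n";
print "attribute1 = ", $e->{attribute1}, "\n";
```

If namespace declarations should not appear in the dump, the XPath or hash forms are probably the safer choice; attributes is the one to reach for when the declarations are wanted too.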

Something like this should help you.

It's not clear from your question whether <outline> is the root element of the data, or if it is buried somewhere in a bigger document. It's also unclear how general you want the solution to be - e.g. do you want the entire document dumped in this manner?

Anyway, this program generates the output you requested from the given XML input in a fairly concise manner.

use strict;
use warnings;
use 5.014;     # For /r non-destructive substitution mode

use XML::LibXML;

my $xml = XML::LibXML->load_xml(IO => \*DATA);

my ($node) = $xml->findnodes('//outline');

print $node->nodeName, "\n";

for my $child ($node->getChildrenByTagName('*')) {
  my $name = $child->nodeName;

  printf "%s=%s\n", $name, $child->textContent =~ s/\A\s+|\s+\z//gr;

  for my $attr ($child->attributes) {
    printf "%s %s=%s\n", $name, $attr->getName, $attr->getValue;
  }
}

__DATA__
<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>

Output:

outline
node1=text1
node1 attribute1=value1
node1 attribute2=value2
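If the whole document needs to be dumped rather than just the children of <outline>, the same loop can be turned into a recursive walk. The following is only a sketch: dump_element is a hypothetical helper name, and it assumes that "name=text" should be printed only for elements without child elements. It uses findnodes('./@*') instead of attributes so that namespace declarations are left out, as noted in the comments above.

```perl
use strict;
use warnings;
use 5.014;    # for say and the /r substitution modifier

use XML::LibXML;

# Return the report lines for $node and all of its descendant elements.
# dump_element is a hypothetical helper, not part of XML::LibXML.
sub dump_element {
    my ($node) = @_;
    my $name     = $node->nodeName;
    my @children = $node->getChildrenByTagName('*');

    # "name=text" only for leaf elements with non-empty trimmed text.
    my $text  = $node->textContent =~ s/\A\s+|\s+\z//gr;
    my @lines = ( !@children && length($text) ) ? ("$name=$text") : ($name);

    # One line per attribute; the XPath attribute axis skips xmlns declarations.
    push @lines, map { sprintf '%s %s=%s', $name, $_->nodeName, $_->value }
        $node->findnodes('./@*');

    # Recurse into child elements in document order.
    push @lines, map { dump_element($_) } @children;

    return @lines;
}

my $xml = XML::LibXML->load_xml( string => <<'XML' );
<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>
XML

say for dump_element( $xml->documentElement );
```

For the sample input this should produce the same four lines as above. Elements with mixed content (text alongside child elements) are printed by name only, which may or may not be what real data calls for.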

Retention answered 7/11, 2014 at 11:8 Comment(0)
