How to change XML to use empty-element tags?

I am new to XML::Twig. How can I change all empty elements to use empty-element tags (<foo/>) instead of a start-tag and end-tag combo (<foo></foo>)?

Input:

<book>
    <given-names>Maurice<xref ref-type="fn" rid="fnI_1"></xref></given-names>
    <colspec colname="col1" colnum="1"></colspec>
    <entry align="left"><p></p></entry>
</book>

I need output as:

<book>
    <given-names>Maurice<xref ref-type="fn" rid="fnI_1"/></given-names>
    <colspec colname="col1" colnum="1"/>
    <entry align="left"><p/></entry>
</book>

I tried:

       use XML::Twig;
       my $xml = XML::Twig->new(twig_handlers => {
                                  'xref' => sub {$_->set_tag('#EMPTY'),},
                                },
                                pretty_print => 'indented',                                        
                               );
       $xml->parse('sample.xml');
       $xml->print;
}

But this doesn't work. How can I globally change every element that has no content to an empty-element tag?

Occasion answered 17/1, 2013 at 9:9 Comment(2)
<p></p> and <p/> are just different representations of the same data. Why does it matter which one you use?Blowy
I want to remove the unneeded closing tags to reduce file size, among other purposes. That is why I need this.Occasion

If you want to stick with Twig, you can do it like this:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $xml = XML::Twig->new(twig_handlers => {
             'p' => sub { 
                 if (!$_->first_child()) { $_->set_content('#EMPTY') } 
              },
           },
           pretty_print => 'indented',
           empty_tags => 'normal'                                 
);

$xml->parsefile('file.xml');
$xml->print;

Basically you have to manually check if the element contains nothing, then set it to be an empty element.
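
If more than one element name needs this treatment, the same check can be shared across several handlers. The following is only a sketch: the @collapsible list and the sample XML string are made up for illustration, and it uses set_empty rather than set_content('#EMPTY') to flag the element.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical list of element names to collapse; adjust to your document.
my @collapsible = qw(p xref colspec);

# One shared handler: flag the element as empty when it has no children.
my $collapse = sub {
    my ($twig, $elt) = @_;
    $elt->set_empty unless $elt->first_child;
};

my $twig = XML::Twig->new(
    # Register the same handler under every name in the list.
    twig_handlers => { map { ($_ => $collapse) } @collapsible },
);

# Illustrative input; <note> is deliberately not in @collapsible.
$twig->parse('<book><entry><p></p></entry><colspec colnum="1"></colspec><note></note></book>');

my $out = $twig->root->sprint;
print "$out\n";
```

Elements that are not in the list (here, note) keep their start-tag and end-tag pair, so this is the variant to use when only some elements should be collapsed.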

Oblivion answered 17/1, 2013 at 9:55 Comment(0)

XML::LibXML will automatically output the shorter version.

use XML::LibXML qw( );
print XML::LibXML->new()->parse_file($ARGV[0])->toString();
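
To check this without a file on disk, here is a small sketch that parses a string instead. The sample input is made up from the question's data, and documentElement->toString is used so that only the root element is serialized, without the XML declaration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative input based on the question; every empty element is written
# as a start-tag/end-tag pair.
my $input = '<book><entry align="left"><p></p></entry></book>';

# Parse from a string rather than from a file.
my $doc = XML::LibXML->load_xml(string => $input);

# Serialize only the root element (no XML declaration).
my $out = $doc->documentElement->toString;
print "$out\n";
```

The p element has no child nodes after parsing, so it is written out as <p/>.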

As for XML::Twig, it also uses the shorter form by default (empty_tags => 'normal'). However, it only treats an element as empty if it was created from <foo/>. (Seems pretty stupid to me!) I did some digging and found that it does allow you to change whether it considers an element empty. This is done using set_empty and set_not_empty.

use XML::Twig qw( );
my $twig = XML::Twig->new(
   twig_handlers => {
      '*' => sub {
         $_->set_empty() if !$_->first_child();
      },
   },
);
$twig->parsefile($ARGV[0]);
$twig->print();
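
For a quick test against the input from the question, here is a sketch of the same idea that parses an in-memory string. It assumes the _all_ handler name, which XML::Twig documents as being called for every element, in place of '*', and it takes the element from the handler arguments instead of relying on $_.

```perl
use strict;
use warnings;
use XML::Twig;

# The sample document from the question.
my $input = <<'XML';
<book>
    <given-names>Maurice<xref ref-type="fn" rid="fnI_1"></xref></given-names>
    <colspec colname="col1" colnum="1"></colspec>
    <entry align="left"><p></p></entry>
</book>
XML

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called for every element once its end tag has been parsed.
        _all_ => sub {
            my ($t, $elt) = @_;
            $elt->set_empty unless $elt->first_child;
        },
    },
    pretty_print => 'indented',
);

$twig->parse($input);

# Serialize the whole twig to a string instead of printing it directly.
my $out = $twig->sprint;
print $out;
```

Note that set_empty only sets a flag and does no further checking, so the first_child test is what keeps elements with real content intact.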
Sokul answered 17/1, 2013 at 9:41 Comment(8)
IIRC the reason the only elements considered empty are those created with an empty tag is to make it easier to flush the element at any time (including right after parsing the start tag). Since what the OP asks for is quite uncommon, and of not great interest XML-wise, XML::Twig doesn't support it "easily". It seems a lot more common for users to want to keep the output XML as close to the input as possible, which is what XML::Twig does by default.Linear
@mirod, Either you're done inspecting the element or you're not. How it's output when you're done with it makes no difference whatsoever to when it can be flushed.Sokul
I have to go back to the code, but I remember trying to improve empty tag handling a few years ago, and never quite getting it right. It looked like it would be simple, but it never was. I may have another go at it when I have a moment.Linear
@mirod, you can still track whether it's an "empty" tag but actually check the number of children when outputting.Sokul
@mirod, Isn't 1 and 0 reversed in my $empty= defined $elt->{empty} ? $elt->{empty} : $elt->{first_child} ? 1 : 0;? (Maybe first_child doesn't mean what I think it means.)Sokul
@mirod, Would there be a problem with using my $empty = $elt->{empty} || !$elt->{first_child};? Or even my $empty = !$elt->{first_child};?Sokul
I can check, but I am not sure it's worth it.Linear
@mirod, It would make empty_tags control the output of empty elements, instead of "empty" elements.Sokul
