I am parsing large XML files (60GB+) with XML::Twig and using it in an OO (Moose) script. I am using the twig_handlers
option to process elements as soon as they're read into memory. However, I'm not sure how to deal with the Element and the Twig.
Before I used Moose (and OO altogether), my script looked as follows (and worked):
my $twig = XML::Twig->new(
    twig_handlers => {
        $outer_tag => \&_process_tree,
    }
);
$twig->parsefile($input_file);

sub _process_tree {
    my ($fulltwig, $twig) = @_;
    $twig->cut;
    $fulltwig->purge;
    # Do stuff with twig
}
Now, in the OO version, I do it like this:
my $twig = XML::Twig->new(
    twig_handlers => {
        $self->outer_tag => sub {
            $self->_process_tree($_);
        }
    }
);
$twig->parsefile($self->input_file);

sub _process_tree {
    my ($self, $twig) = @_;
    $twig->cut;
    # Do stuff with twig
    # But now the 'full twig' is not purged
}
The thing is that I now see that I am missing the purging of the fulltwig. I figured that, in the first, non-OO version, purging would help save memory by getting rid of the fulltwig as soon as I can. However, when using OO (and having to rely on an explicit sub{} inside the handler), I don't see how I can purge the full twig, because the documentation says:
$_ is also set to the element, so it is easy to write inline handlers like
para => sub { $_->set_tag( 'p'); }
So they talk about the Element you want to process, but not the fulltwig itself. So how can I delete that if it is not passed to the subroutine?
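To make the question concrete, here is a minimal sketch of what I would like to end up with. It relies on the XML::Twig documentation's statement that handlers are called with two arguments, the twig and the element, so the anonymous sub can forward @_ to the method. The class My::Parser, its seen and parse_string methods, and the sample XML are made up for illustration, and plain bless stands in for Moose so the sketch needs nothing beyond XML::Twig.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical stand-in for the Moose class (plain bless, no Moose needed).
package My::Parser;

sub new {
    my ($class, %args) = @_;
    return bless { outer_tag => $args{outer_tag}, seen => [] }, $class;
}

sub outer_tag { $_[0]{outer_tag} }
sub seen      { $_[0]{seen} }

# Hypothetical helper: parses a string instead of a 60GB file.
sub parse_string {
    my ($self, $xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # The handler receives ($twig, $element) in @_;
            # forward both so the method can purge the full twig.
            $self->outer_tag() => sub { $self->_process_tree(@_) },
        },
    );
    $twig->parse($xml);
    return $self;
}

sub _process_tree {
    my ($self, $fulltwig, $elt) = @_;
    $elt->cut;           # detach the element from the tree
    $fulltwig->purge;    # free what has been parsed so far
    push @{ $self->seen }, $elt->text;    # "do stuff" with the cut element
}

package main;

my $parser = My::Parser->new(outer_tag => 'item')
    ->parse_string('<root><item>a</item><item>b</item></root>');
print join(',', @{ $parser->seen }), "\n";
```

The closure captures $self, so the method still gets the object first, followed by the full twig and the element in the same order as in the non-OO handler.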
@_ to see what was going on. Thanks! Is there any downside/upside to purging the full twig only after you have done stuff with the cut twig? My reasoning was to purge it immediately after cutting the element, so that memory is cleared as soon as possible. I might be wrong? Great module by the way, we use it all the time! – Shoemaker