Processing a large XML file with Perl
I have an XML file which is about 200MB in size; I wish to extract selected information on a line-by-line basis.

I have written a Perl script using the module XML::LibXML to parse the file contents and then loop over them, extracting the information line by line. This is inefficient because it reads the whole file into memory, but I like LibXML because I can use XPath to locate the information I require.

Can I get suggestions for ways to make my code more efficient?

Through searching I have become aware of XML::SAX and XML::LibXML::SAX, but I cannot find documentation which explains their usage, and they do not seem to include any kind of XPath addressing.
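For reference, the slurp-everything approach described above looks roughly like this (the file name and the `record`/`name` element names are hypothetical stand-ins for the real schema):

```perl
use strict;
use warnings;
use XML::LibXML;

# Loads the entire 200MB document into memory before any XPath runs
my $doc = XML::LibXML->load_xml(location => 'big.xml');

# Hypothetical XPath expressions; the real ones depend on the schema
for my $record ($doc->findnodes('//record')) {
    print $record->findvalue('./name'), "\n";
}
```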

Pimiento answered 15/2, 2011 at 16:30 Comment(1)
You might try XML::Twig (search.cpan.org/perldoc?XML%3A%3ATwig) – Serra

Have you considered the XML::Twig module? It is much more efficient for large-file processing, as the CPAN module description states:

NAME

XML::Twig - A perl module for processing huge XML documents in tree mode.

SYNOPSIS

...

It allows minimal resource (CPU and memory) usage by building the tree only for the parts of the documents that need actual processing, through the use of the twig_roots and twig_print_outside_roots options.

...
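A minimal sketch of the `twig_roots` approach, assuming the document repeats a `record` element containing a `name` child (both names are hypothetical; adjust them to your schema):

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    twig_roots => {
        # Only 'record' subtrees are ever built in memory
        'record' => sub {
            my ($t, $elem) = @_;
            # XPath-like navigation works within the small in-memory chunk
            print $elem->first_child_text('name'), "\n";
            $t->purge;    # free the memory used by this chunk
        },
    },
);
$twig->parsefile('big.xml');
```

Because each handler purges the twig after processing, memory use stays proportional to one `record`, not to the whole 200MB file.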

Xylophone answered 15/2, 2011 at 16:34 Comment(1)
Thank you for pointing me in this direction. So far my investigation is showing positive results – Pimiento

I had some luck with XML::Twig but ended up with XML::LibXML::Reader, which is much faster... You may also check XML::LibXML::Pattern if you need to use XPath.
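A sketch combining the two, assuming hypothetical `record`/`name` elements: the pull-style reader streams through the file, a compiled pattern jumps to matching nodes, and only each matched subtree is expanded into a DOM for full XPath.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new(location => 'big.xml')
    or die "cannot read big.xml\n";

# Precompiled pattern lets the reader skip straight to matching nodes
my $pattern = XML::LibXML::Pattern->new('//record');

while ($reader->nextPatternMatch($pattern)) {
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    # Expand just this subtree into a DOM fragment and run XPath on it
    my $node = $reader->copyCurrentNode(1);
    print $node->findvalue('./name'), "\n";
    $reader->next;    # skip past the subtree we just processed
}
```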

Accusatorial answered 25/8, 2014 at 2:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.