Using Boost to read and write XML files
G

15

72

Is there any good way (and a simple way too) using Boost to read and write XML files?

I can't seem to find any simple sample for reading XML files using Boost. Can you point me to a simple sample that uses Boost for reading and writing XML files?

If not Boost, is there any good and simple library to read and write XML files that you can recommend? (it must be a C++ library)

Girondist answered 25/6, 2009 at 9:23 Comment(1)
Try Boost.PropertyTree. You can find a short introduction to reading/writing XML files with it here. - Iatry
L
65

You should try pugixml, a light-weight, simple and fast XML parser for C++.

The nicest thing about pugixml is the XPath support, which TinyXML and RapidXML lack.

Quoting RapidXML's author: "I would like to thank Arseny Kapoulkine for his work on pugixml, which was an inspiration for this project" and "5% - 30% faster than pugixml, the fastest XML parser I know of". He tested against version 0.3 of pugixml, which has recently reached version 0.42.

Here is an excerpt from pugixml documentation:

The main features are:

  • low memory consumption and fragmentation (the win over pugxml is ~1.3 times, TinyXML - ~2.5 times, Xerces (DOM) - ~4.3 times 1). Exact numbers can be seen in Comparison with existing parsers section.
  • extremely high parsing speed (the win over pugxml is ~6 times, TinyXML - ~10 times, Xerces-DOM - ~17.6 times 1)
  • extremely high parsing speed (well, I'm repeating myself, but it's so fast, that it outperforms Expat by 2.8 times on test XML) 2
  • more or less standard-conformant (it will parse any standard-compliant file correctly, with the exception of DTD related issues)
  • pretty much error-ignorant (it will not choke on something like You & Me, like expat will; it will parse files with data in wrong encoding; and so on)
  • clean interface (a heavily refactored pugxml's one)
  • more or less Unicode-aware (actually, it assumes UTF-8 encoding of the input data, though it will readily work with ANSI - no UTF-16 for now (see Future work), with helper conversion functions (UTF-8 <-> UTF-16/32 (whatever is the default for std::wstring & wchar_t))
  • fully standard compliant C++ code (approved by Comeau strict mode); the library is multiplatform (see reference for platforms list)
  • high flexibility. You can control many aspects of file parsing and DOM tree building via parsing options.

Okay, you might ask - what's the catch? Everything is so cute - it's small, fast, robust, clean solution for parsing XML. What is missing? Ok, we are fair developers - so here is a misfeature list:

  • memory consumption. It beats every DOM-based parser that I know of - but when SAX parser comes, there is no chance. You can't process a 2 Gb XML file with less than 4 Gb of memory - and do it fast. Though pugixml behaves better, than all other DOM-based parser, so if you're stuck with DOM, it's not a problem.
  • memory consumption. Ok, I'm repeating myself. Again. When other parsers will allow you to provide XML file in a constant storage (or even as a memory mapped area), pugixml will not. So you'll have to copy the entire data into a non-constant storage. Moreover, it should persist during the parser's lifetime (the reasons for that and more about lifetimes is written below). Again, if you're ok with DOM - it should not be a problem, because the overall memory consumption is less (well, though you'll need a contiguous chunk of memory, which can be a problem).
  • lack of validation, DTD processing, XML namespaces, proper handling of encoding. If you need those - go take MSXML or XercesC or anything like that.
Latea answered 19/9, 2009 at 16:15 Comment(5)
pugixml now has UTF-8, UTF-16, UTF-32 parsing. - Prynne
@CristianAdam I can't figure out if it does support SAX parsing or it doesn't... I assume that it does since you say that it can't process a 2GiB XML file with less than 4GiB of memory. - Midship
TinyXPath does add XPath support to TinyXML. - Confucius
@fizzbuzz That's true, but according to the TinyXPath documentation, "under Linux, there's no library yet". There's just the command-line tool. - Balanchine
pugixml is nice, but this does not answer the question about Boost? - Silici
Q
25

TinyXML is probably a good choice. As for Boost:

There is the Property_Tree library in the Boost Repository. It has been accepted, but support seems to be lacking at the moment (EDIT: Property_Tree has been part of Boost since version 1.41; read the documentation regarding its XML functionality).

Daniel Nuffer has implemented an XML parser for Boost Spirit.

Quiff answered 25/6, 2009 at 11:47 Comment(1)
Also, use TinyXPath with TinyXML. - Confucius
C
18

Boost uses RapidXML, as described in the "XML Parser" chapter of the page "How to Populate a Property Tree":

Unfortunately, there is no XML parser in Boost as of the time of this writing. The library therefore contains the fast and tiny RapidXML parser (currently in version 1.13) to provide XML parsing support. RapidXML does not fully support the XML standard; it is not capable of parsing DTDs and therefore cannot do full entity substitution.

Please also refer to the Boost XML tutorial.

As the OP wants a "simple way to use boost to read and write xml files", here is a very basic example:

<main>
    <owner>Matt</owner>
    <cats>
        <cat>Scarface Max</cat>
        <cat>Moose</cat>
        <cat>Snowball</cat>
        <cat>Powerball</cat>
        <cat>Miss Pudge</cat>
        <cat>Needlenose</cat>
        <cat>Sweety Pie</cat>
        <cat>Peacey</cat>
        <cat>Funnyface</cat>
    </cats>
</main>

(cat names are from Matt Mahoney's homepage)

The corresponding structure in C++:

struct Catowner
{
    std::string           owner;
    std::set<std::string> cats;
};

read_xml() usage:

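#include <set>
#include <string>
#include <boost/foreach.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

Catowner load(const std::string &file)
{
    boost::property_tree::ptree pt;
    read_xml(file, pt);  // throws xml_parser_error if the file cannot be read or parsed

    Catowner co;

    // get<T>() throws ptree_bad_path if the node is missing
    co.owner = pt.get<std::string>("main.owner");

    // Each child of <cats> is a (tag name, subtree) pair
    BOOST_FOREACH(
       boost::property_tree::ptree::value_type &v,
       pt.get_child("main.cats"))
       co.cats.insert(v.second.data());

    return co;
}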

write_xml() usage:

void save(const Catowner &co, const std::string &file)
{
   boost::property_tree::ptree pt;

   pt.put("main.owner", co.owner);

   BOOST_FOREACH(
      const std::string &name, co.cats)
      pt.add("main.cats.cat", name);

   write_xml(file, pt);
}
Cancroid answered 27/12, 2014 at 22:10 Comment(0)
T
16

There's also TinyXML, which is a nice and small C++ library. If you are looking for a lower-level library, RapidXML is a great starting point.

Tepefy answered 25/6, 2009 at 9:47 Comment(1)
Also, use TinyXPath with TinyXML. - Confucius
P
5

It would appear that Boost.Serialization can read from and write to archives in XML, if that's sufficient for your purposes.

Easier XML with Boost

Phenyl answered 13/8, 2010 at 10:19 Comment(1)
Funnily enough I was just about to post this exact link after doing a search on XML & Boost. - Gregorio
C
4

Well, there is no specific library in Boost for XML parsing, but there are lots of alternatives. Here are a few: libxml, Xerces, Expat.

Of course you could use some of the other libraries in boost to aid you in making your own library, but that will probably be quite an undertaking.

And here is a whole article on the subject by IBM.

Concerto answered 25/6, 2009 at 9:31 Comment(0)
A
4

Boost does not provide an XML parser at the moment.

Poco XML (part of the Poco C++ libs) is good and simple.

Apuleius answered 25/6, 2009 at 10:59 Comment(1)
I cannot comment on the quality of the Poco C++ libraries, but stylistically they are very different from Boost. For someone wanting to interoperate well with other Boost components and the STL, they may not be a good match. I'm not referring to the naming conventions (though they may grate), but rather to the heavy use of inheritance and virtual functions, and the lack of templating on a character type. These design decisions may or may not be for the better, but they're certainly quite different from those of Boost and the STL. - Milkwhite
H
3

Definitely use TinyXML *thumbs up*

Hsu answered 25/6, 2009 at 9:51 Comment(0)
M
3

If you are looking for DOM functionality only, there are some suggestions already in this thread. I personally would probably not bother with a library lacking XPath support, and in C++, would use Qt. There's also TinyXPath, and Arabica claims to have XPath support, but I cannot say anything at all about those.

Mord answered 25/6, 2009 at 15:44 Comment(0)
S
2

From my experiences lurking on the Boost mailing list, it appears that every time XML comes up as a subject, it is diverted into a discussion about Unicode. However, since there is a potential Unicode library looming right now, I don't think it will take too long for an XML library to appear there.

In the meantime, I too have been using TinyXML.

Interesting link about RapidXML. I'll take a look at that.

Spark answered 25/6, 2009 at 9:54 Comment(0)
M
2

Take a look at Arabica

Mcloughlin answered 25/6, 2009 at 13:26 Comment(0)
V
2

A warning. I love RapidXML, but it has a very nasty bug when parsing UTF16. Some valid values cause it to crash.

I would love to recommend pugixml - but it lacks namespace support, which I know is going to cause me problems.

Valise answered 4/1, 2011 at 16:46 Comment(1)
Hi, I have already tried pugixml, and the biggest problem with that library (for me, of course) is the lack of tools to translate a schema to C++, so I went back to the "heavy" Xerces :) and use it with codesynthesis.com/projects/xsd, which is pretty interesting. - Girondist
S
2

There is a proposed GSoC project to improve the existing Boost.XML proposal: https://github.com/stefanseefeld/boost.xml. But, as Andrzej suggested, Boost.PropertyTree is nice for this task, depending of course on the XML size and the validation support needed.

There is also a library which was recently proposed on the Boost mailing list: http://www.codesynthesis.com/projects/libstudxml/doc/intro.xhtml

Solothurn answered 1/4, 2014 at 9:15 Comment(0)
J
1
<?xml version="1.0"?>
<Settings>
  <GroupA>
      <One>4</One>
      <Two>7</Two>
      <Three>9</Three> 
  </GroupA>
  <GroupA>
      <One>454</One>
      <Two>47</Two>
      <Three>29</Three> 
  </GroupA>
  <GroupB>
      <A>String A</A>
      <B>String B</B>  
  </GroupB>  
</Settings>

There is an easy way to read XML with Boost. This example uses std::wstring:

#include <sstream>
#include <string>
#include <boost/property_tree/xml_parser.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/foreach.hpp>

bool CMyClass::ReadXML(std::wstring &full_path)
{
    using boost::property_tree::wptree;

    // Populate tree structure pt:
    wptree pt;
    std::wstringstream ss; ss << load_text_file(full_path); // See below for ref.
    read_xml(ss, pt);

    // Traverse pt:
    BOOST_FOREACH(wptree::value_type const& v, pt.get_child(L"Settings"))
    {
        if (v.first == L"GroupA")
        {
            unsigned int n1 = v.second.get<unsigned int>(L"One");
            unsigned int n2 = v.second.get<unsigned int>(L"Two");
            unsigned int n3 = v.second.get<unsigned int>(L"Three");
        }
        else if (v.first == L"GroupB")
        {
            std::wstring wstrA = v.second.get<std::wstring>(L"A");
            std::wstring wstrB = v.second.get<std::wstring>(L"B");
        }
    }

    // read_xml() and get<>() throw on failure, so reaching this point means success.
    return true;
}

Reading attributes is only slightly more complicated.

-

Just for the reference:

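#include <fstream>
#include <string>

std::wstring load_text_file(std::wstring &full_path)
{
    // Note: opening a stream from a std::wstring path is an MSVC extension.
    std::wifstream wif(full_path);
    if (!wif)
        return std::wstring();

    std::wstring buffer;
    wif.seekg(0, std::ios::end);
    buffer.resize(static_cast<std::size_t>(wif.tellg())); // upper bound: file size in bytes
    wif.seekg(0);
    wif.read(&buffer[0], buffer.size());
    buffer.resize(static_cast<std::size_t>(wif.gcount())); // keep only the characters actually read

    return buffer;
}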
Johanna answered 5/7, 2018 at 11:3 Comment(0)
E
0

What about boost.spirit?

Here, they show a "Mini XML" parser

Elston answered 21/4, 2012 at 15:51 Comment(0)
