How to fix RapidXML String ownership concerns?

RapidXML is a fast, lightweight C++ XML DOM Parser, but it has some quirks.

The worst of these to my mind is this:

3.2 Ownership Of Strings.

Nodes and attributes produced by RapidXml do not own their name and value strings. They merely hold the pointers to them. This means you have to be careful when setting these values manually, by using xml_base::name(const Ch *) or xml_base::value(const Ch *) functions.

Care must be taken to ensure that lifetime of the string passed is at least as long as lifetime of the node/attribute. The easiest way to achieve it is to allocate the string from memory_pool owned by the document. Use memory_pool::allocate_string() function for this purpose.

Now, I understand it's done this way for speed, but this feels like a car crash waiting to happen. The following code looks innocuous, but 'name' and 'value' are local arrays that are destroyed when foo returns, so the node in doc is left holding dangling pointers and any later use of it is undefined behaviour.

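// doc is assumed to be a rapidxml::xml_document<> that outlives foo().
void foo()
{
  char name[]="Name";    // local arrays, destroyed when foo() returns
  char value[]="Value";

  // BUG: the node stores only the pointers, which dangle after foo() returns.
  doc.append_node(doc.allocate_node(node_element, name, value));
}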

Using allocate_string(), as the manual suggests, works, but it's so easy to forget.
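One way to make the copy impossible to forget is to route every insertion through a single helper that always copies first. The sketch below is a stand-alone model of that idea, not RapidXML code: `Node`, `Document`, `append_copied` and `add_name_value` are hypothetical names, and the `std::deque<std::string>` pool only stands in for the document's memory pool.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Stand-in for a RapidXML-style node: it stores pointers only, never copies.
struct Node {
    const char* name;
    const char* value;
};

// Hypothetical document that owns a string pool. This is an illustrative
// model of the allocate_string() idea, not the real RapidXML class.
class Document {
public:
    // Copy a string into storage owned by the document. The returned pointer
    // stays valid for the document's lifetime, because a std::deque never
    // relocates existing elements when new ones are appended.
    const char* allocate_string(const char* source) {
        assert(source != nullptr);
        pool_.emplace_back(source);
        return pool_.back().c_str();
    }

    // Safe entry point: name and value are always copied into the pool first,
    // so callers may pass temporaries or local arrays.
    const Node& append_copied(const char* name, const char* value) {
        nodes_.push_back(Node{allocate_string(name), allocate_string(value)});
        return nodes_.back();
    }

    const Node& node(std::size_t i) const { return nodes_[i]; }
    std::size_t size() const { return nodes_.size(); }

private:
    std::deque<std::string> pool_;  // owns every string the nodes point at
    std::deque<Node> nodes_;        // nodes only hold pointers into pool_
};

// Same shape as foo() above, but safe: the locals die at the closing brace,
// while the document keeps its own copies.
void add_name_value(Document& doc) {
    char name[] = "Name";
    char value[] = "Value";
    doc.append_copied(name, value);
}
```

With RapidXML itself, the equivalent would be a small free function that calls allocate_string() on both strings before allocate_node(). The price is one extra copy per string, which is exactly the speed-for-safety trade the library leaves to the caller.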

Has anyone 'enhanced' RapidXML to avoid this issue?

Building answered 12/3, 2010 at 11:58 Comment(5)
Wouldn't "enhancing" rapidxml be against the spirit of it? It's a bare-bones, super-fast parser, and the lack of ownership is a key part of this. - Lyndialyndon
OK, maybe a "wrapper" is a better term. But there's nothing inherently bad about having an additional "safer" interface... It would be down to users to choose speed vs. fragility. - Building
In this case, aren't "name" and "value" static on the heap, and so have scope throughout the program? - Anglophobia
@Mark - I don't think so - they are specifically arrays, not pointers. - Building
Damn string ownership problem. Made me lose 30 minutes. Thanks for the question though! - Zaffer

I don't use RapidXML, but maybe my approach can solve your problem.

I started out using Xerces, but I found it heavy, besides other minor annoyances, so I moved to CPPDOM. When I made the move I decided to create a set of wrapper classes, so that my code wouldn't depend on the specific XML 'engine' and I could port it to another one if needed.

I created my own classes to represent the basic DOM entities (node, document, etc). Internally, those classes use the pimpl idiom to wrap CPPDOM objects. Since my node object contains the 'real' node object (from CPPDOM), I can manage anything as needed, so proper allocation and deallocation of strings wouldn't be a problem there.
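To give a rough idea of the shape, here is a minimal sketch of such a wrapper. It is not my actual code and makes no CPPDOM calls: `XmlNode` and its members are hypothetical names, and the `Impl` struct simply holds `std::string` members where a real version would also hold the engine's node object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical engine-independent node. Callers only ever see this interface.
class XmlNode {
public:
    XmlNode(std::string name, std::string value);
    ~XmlNode();  // defined after Impl is complete, as the pimpl idiom requires

    const std::string& name() const;
    const std::string& value() const;

    // Takes the strings by value, so the node always owns its own copies.
    XmlNode& append_child(std::string name, std::string value);
    std::size_t child_count() const;
    const XmlNode& child(std::size_t i) const;

private:
    struct Impl;                  // would normally be defined in the .cpp file
    std::unique_ptr<Impl> impl_;
};

// Private implementation. A real wrapper would keep the XML engine's node
// here too; this sketch only keeps the owned strings and the children.
struct XmlNode::Impl {
    std::string name;
    std::string value;
    std::vector<std::unique_ptr<XmlNode>> children;
};

XmlNode::XmlNode(std::string name, std::string value)
    : impl_(new Impl{std::move(name), std::move(value), {}}) {}

XmlNode::~XmlNode() = default;

const std::string& XmlNode::name() const { return impl_->name; }
const std::string& XmlNode::value() const { return impl_->value; }

XmlNode& XmlNode::append_child(std::string name, std::string value) {
    impl_->children.push_back(std::unique_ptr<XmlNode>(
        new XmlNode(std::move(name), std::move(value))));
    return *impl_->children.back();
}

std::size_t XmlNode::child_count() const { return impl_->children.size(); }

const XmlNode& XmlNode::child(std::size_t i) const {
    assert(i < impl_->children.size());
    return *impl_->children[i];
}
```

Because every string is copied into the wrapper on the way in, code like the foo() in the question becomes safe: the local arrays can go away and the node still has its own name and value.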

Since my code is for CPPDOM, I don't think it would be very useful to you, but I can post it if you want.

BTW, if you already have a lot of code that uses RapidXML, you can reproduce its interfaces in your wrapper classes. I didn't do that because the code that used Xerces was not that long and I'd have had to rewrite it anyway.

Grind answered 23/4, 2010 at 1:7 Comment(1)
Maybe not the answer I was looking for, but the best I'm going to get. Thanks! - Building
