Making Xerces parse a string instead of a file

I know how to create a complete DOM from an XML file using just XercesDOMParser:

xercesc::XercesDOMParser* parser = new xercesc::XercesDOMParser();
parser->parse(path_to_my_file);
parser->getDocument(); // From here on I can access all nodes and do whatever I want

Well, that works... but what if I want to parse a string? Something like:

std::string myxml = "<root>...</root>";
xercesc::XercesDOMParser* parser = new xercesc::XercesDOMParser();
parser->parse(myxml);
parser->getDocument(); // From here on I can access all nodes and do whatever I want

I'm using version 3. Looking inside AbstractDOMParser, I see that the parse method and its overloads seem to parse only files.

How can I parse from a string?

Hoick answered 14/1, 2011 at 12:29 Comment(0)
22

Create a MemBufInputSource and parse that:

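// The constructor takes const XMLByte*, so the char pointer needs a cast.
xercesc::MemBufInputSource myxml_buf(
    reinterpret_cast<const XMLByte*>(myxml.c_str()), myxml.size(),
    "myxml (in memory)");
parser->parse(myxml_buf);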
Clancy answered 14/1, 2011 at 12:38 Comment(8)
It's a "fake system id" that's used in error messages, and "any entities which are referred to from this entity via relative paths/URLs will be relative to this fake system id". See the API docs. - Clancy
larsmans, could you please tell me why, when I use your code and the XML prints correctly, my app crashes with a segmentation fault as soon as I call Terminate()? - Hoick
@Andry, I can't tell with just this info. Can you try copying the string with new char[] and setting the 4th (adoptBuffer) ctor argument to true? (see xerces.apache.org/xerces-c/apiDocs-3/…) - Clancy
Well, I discovered it... see here... absurd, ahaha: xerces.apache.org/xerces-c/faq-parse-2.html#faq-7 - Hoick
@Andry: I know, the Xerces rules for memory allocation are overly complicated. They seem never to have heard of RAII. Too bad. - Clancy
@Hoick - sadly, the absurd ahaha link has aged into oblivion. :( It appears to have become: xerces.apache.org/xerces-c/faq-parse-3.html#faq-7 - Vasya
How did you solve the seg fault? I'm having the same problem. - Brilliantine
Don't know if this is why any of the above comments were seeing seg faults, but MemBufInputSource doesn't seem to work unless you first initialize the system. More detail in the answer below. - Fennec
14

Use the following overload of XercesDOMParser::parse():

void XercesDOMParser::parse(const InputSource& source);

passing it a MemBufInputSource:

MemBufInputSource src((const XMLByte*)myxml.c_str(), myxml.length(), "dummy", false);
parser->parse(src);
Haar answered 14/1, 2011 at 12:40 Comment(3)
How can I figure out which namespace MemBufInputSource and Wrapper4InputSource are in? I'm having serious trouble with namespaces in Xerces. Ty - Revelry
They're in the xercesc namespace, but you also need #include <xercesc/framework/MemBufInputSource.hpp>. I'm two years late, but I had the same issue, and someone else may hit it again later. - Samaria
Also, +1 for specifying the necessary cast. - Samaria
3

I'm doing it another way. If this is incorrect, please tell me why; it seems to work. This is what DOMLSParser::parse expects:

DOMDocument* DOMLSParser::parse(const DOMLSInput* source)

So you need to pass a DOMLSInput instead of an InputSource:

xercesc::DOMImplementation* impl = xercesc::DOMImplementation::getImplementation();
xercesc::DOMLSParser* parser = ((xercesc::DOMImplementationLS*)impl)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);

// Wrapper4InputSource adapts an InputSource to DOMLSInput and adopts (deletes) it by default.
xercesc::Wrapper4InputSource source(new xercesc::MemBufInputSource((const XMLByte*)(myxml.c_str()), myxml.size(), "A name"));
xercesc::DOMDocument* doc = parser->parse(&source);
Revelry answered 5/4, 2013 at 8:39 Comment(3)
Thanks for the hint. This answer seems closer to the actual DOM Programming Guide. - Thinia
I tried to use this, but it seems to fail. When I replaced the double quotes with single quotes and added \n\ at the end of each line, parsing seemed OK. I did that based on the MemParse.cpp sample in Xerces. Do you know what the problem is? - Dumpling
Hi, this is an answer from 7 years ago, so I have no clue what I was doing at the time. If you can elaborate, maybe I can think it through with you. Where (on what line) did you change the quotes and add the \n? - Revelry
0

You can use MemBufInputSource, implemented in xercesc/framework/MemBufInputSource.cpp; its header, MemBufInputSource.hpp, contains extensive documentation. This is similar to the answers above:

#include <cstring>
#include <xercesc/framework/MemBufInputSource.hpp>

const char* myXMLBufString = "<root>hello xml</root>";
// Pass the byte count without the trailing '\0' (22 here, not 23).
xercesc::MemBufInputSource xmlBuf((const XMLByte*)myXMLBufString, std::strlen(myXMLBufString), "myXMLBufName", false);

But take note: this doesn't seem to work unless you first initialize the system, as below (taken from xerces-c-3.2.3/samples/src/SAX2Count/SAX2Count.cpp):

bool                         recognizeNEL = false;
char                         localeStr[64];
memset(localeStr, 0, sizeof localeStr);

// Initialize the XML4C2 system
try {
    if (strlen(localeStr)) {
        XMLPlatformUtils::Initialize(localeStr);
    } else {
        XMLPlatformUtils::Initialize();
    }
    if (recognizeNEL) {
        XMLPlatformUtils::recognizeNEL(recognizeNEL);
    }
} catch (const XMLException& toCatch) {
    XERCES_STD_QUALIFIER cerr << "Error during initialization! Message:\n"
            << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
    return 1;
}

Note that this initialization is needed before any Xerces call, including parsing from a file. It is easy to overlook because the samples that take a file path already do it at the top of main. So for those experiencing seg faults, a missing XMLPlatformUtils::Initialize() could be the answer.

Fennec answered 5/4, 2022 at 23:27 Comment(0)
