How to resolve an internally declared XML entity reference using NSXMLParser

I have an XML file that uses internally declared entities. For example:

<?xml version="1.0" encoding="UTF-8"?>

...

<!ENTITY my_symbol "my symbol value">

...

<my_element>
    <my_next_element>foo&my_symbol;bar</my_next_element>
</my_element>

...

Using the NSXMLParser class, how can I resolve the my_symbol entity reference?

Experimenting shows that the parser:foundInternalEntityDeclarationWithName:value: delegate method is called for the my_symbol entity declaration, with the value "my symbol value". Then, when the parser reaches the my_next_element element, NSXMLParser calls the parser:didStartElement:namespaceURI:qualifiedName:attributes: delegate method.

Before parser:didEndElement:namespaceURI:qualifiedName: is called for </my_next_element>, the parser:foundCharacters: delegate method is called twice, with the strings:

  1. "foo"
  2. "bar"

The my_symbol entity reference is ignored. What is needed for the entity reference to be resolved?

EDIT:

Removing the ENTITY declaration of my_symbol from the DTD results in an NSXMLParserUndeclaredEntityError. This suggests that when the declaration is present, the parser does notice the reference in <my_next_element>; for some reason, it just does not resolve it to the string it represents.

Also, if &amp; is used within an element, the parser correctly resolves it to "&" and passes that string to the parser:foundCharacters: delegate method.

Crissman asked 9/10, 2009 at 0:27. Comments:
Offhand: Can you find it with XPath?
Crissman: I performed an XPath query on 'my_element', and the &my_symbol; reference between "foo" and "bar" was not resolved either. Is it correct to reference internally declared entities the way I have?
Offhand: To be honest, I don't know whether it's correct, but it is certainly unconventional. I'm not entirely clear on why you don't just make two child elements, "foo" and "bar". Then you could use XPath.
Crissman: The XML above is an example I put together to test my problem. If "foo" and "bar" are separate child elements, XPath will find them. The problem is that "&my_symbol;" is not getting resolved, and XPath cannot find it.
Hiro: I'm running into this problem now. Did you solve it?

I reviewed NSXMLParser.h, which declares the following methods for delegates to implement:

@interface NSObject (NSXMLParserDelegateEventAdditions)
// Document handling methods
- (void)parserDidStartDocument:(NSXMLParser *)parser;
    // sent when the parser begins parsing of the document.
- (void)parserDidEndDocument:(NSXMLParser *)parser;
    // sent when the parser has completed parsing. If this is encountered, the parse was successful.

// DTD handling methods for various declarations.
- (void)parser:(NSXMLParser *)parser foundNotationDeclarationWithName:(NSString *)name publicID:(NSString *)publicID systemID:(NSString *)systemID;

- (void)parser:(NSXMLParser *)parser foundUnparsedEntityDeclarationWithName:(NSString *)name publicID:(NSString *)publicID systemID:(NSString *)systemID notationName:(NSString *)notationName;

- (void)parser:(NSXMLParser *)parser foundAttributeDeclarationWithName:(NSString *)attributeName forElement:(NSString *)elementName type:(NSString *)type defaultValue:(NSString *)defaultValue;

- (void)parser:(NSXMLParser *)parser foundElementDeclarationWithName:(NSString *)elementName model:(NSString *)model;

- (void)parser:(NSXMLParser *)parser foundInternalEntityDeclarationWithName:(NSString *)name value:(NSString *)value;

- (void)parser:(NSXMLParser *)parser foundExternalEntityDeclarationWithName:(NSString *)name publicID:(NSString *)publicID systemID:(NSString *)systemID;

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict;
    // sent when the parser finds an element start tag.
    // In the case of the cvslog tag, the following is what the delegate receives:
    //   elementName == cvslog, namespaceURI == http://xml.apple.com/cvslog, qualifiedName == cvslog
    // In the case of the radar tag, the following is what's passed in:
    //    elementName == radar, namespaceURI == http://xml.apple.com/radar, qualifiedName == radar:radar
    // If namespace processing >isn't< on, the xmlns:radar="http://xml.apple.com/radar" is returned as an attribute pair, the elementName is 'radar:radar' and there is no qualifiedName.

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName;
    // sent when an end tag is encountered. The various parameters are supplied as above.

- (void)parser:(NSXMLParser *)parser didStartMappingPrefix:(NSString *)prefix toURI:(NSString *)namespaceURI;
    // sent when the parser first sees a namespace attribute.
    // In the case of the cvslog tag, before the didStartElement:, you'd get one of these with prefix == @"" and namespaceURI == @"http://xml.apple.com/cvslog" (i.e. the default namespace)
    // In the case of the radar:radar tag, before the didStartElement: you'd get one of these with prefix == @"radar" and namespaceURI == @"http://xml.apple.com/radar"

- (void)parser:(NSXMLParser *)parser didEndMappingPrefix:(NSString *)prefix;
    // sent when the namespace prefix in question goes out of scope.

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string;
    // This returns the string of the characters encountered thus far. You may not necessarily get the longest character run. The parser reserves the right to hand these to the delegate as potentially many calls in a row to -parser:foundCharacters:

- (void)parser:(NSXMLParser *)parser foundIgnorableWhitespace:(NSString *)whitespaceString;
    // The parser reports ignorable whitespace in the same way as characters it's found.

- (void)parser:(NSXMLParser *)parser foundProcessingInstructionWithTarget:(NSString *)target data:(NSString *)data;
    // The parser reports a processing instruction to you using this method. In the case above, target == @"xml-stylesheet" and data == @"type='text/css' href='cvslog.css'"

- (void)parser:(NSXMLParser *)parser foundComment:(NSString *)comment;
    // A comment (Text in a <!-- --> block) is reported to the delegate as a single string

- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock;
    // this reports a CDATA block to the delegate as an NSData.

- (NSData *)parser:(NSXMLParser *)parser resolveExternalEntityName:(NSString *)name systemID:(NSString *)systemID;
    // this gives the delegate an opportunity to resolve an external entity itself and reply with the resulting data.

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError;
    // ...and this reports a fatal error to the delegate. The parser will stop parsing.

- (void)parser:(NSXMLParser *)parser validationErrorOccurred:(NSError *)validationError;
    // If validation is on, this will report a fatal validation error to the delegate. The parser will stop parsing.
@end

Based on the order of entries in the file, it looks like the found...Declaration methods are expected to be called before any elements are found (as you've discovered). I'd try implementing all of these methods to see whether any of them are called, but they all look designed for other purposes.

I wonder if there is a way to instrument all the unhandled messages sent to your delegate just in case the documentation/interface is incomplete.

Goethite answered 15/10, 2009 at 15:5. Comments:
Crissman: I implemented all the delegate methods, as you suggested, and re-ran the parse. Still, no delegate method is called when the parser reaches "&my_symbol;". As mentioned, removing the ENTITY declaration causes the "resolveExternalEntityName" method to be called, which suggests the parser is recognising it as an entity reference. For some reason, when the ENTITY declaration is present (and recognised), it just doesn't resolve the reference to the entity's value.
Goethite: Following up on my hope that it's just an undocumented method call, I found this page, which explains how to create a proxy object. Presumably you could put one in front of your NSXMLParser delegate and look for any messages that weren't handled: borkware.com/rants/agentm/elegant-delegation
Goethite: But even if it turns out to be an undocumented message you need to handle, it looks like you'll need to report a bug to Apple: either it needs to be documented, or it needs to be implemented and then documented.
Goethite: I ran nm and strings on /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS3.1.2.sdk/System/Library/Frameworks/Foundation.framework/Foundation and found nothing on the delegate that differed from the above, and no parse options (or settings) beyond those in the documentation. The only thing I found was that similar libraries have two calls, one that processes entities and one that skips them. I assume you have already tried setting all the setShould* values to YES, to no effect.
Crissman: I implemented the suggestion of using a proxy object as the delegate to verify that NSXMLParser was not sending any undocumented messages. Unfortunately, only the expected delegate methods are being called. I will now report this as a bug to Apple and see what response I get.
