NSXMLParser can't parse special characters (accents)

3

10

I'm using NSXMLParser to parse an XML document from a URL (my code is almost exactly the same as here).

Some of the elements contain special characters like "á", which causes a word like ándre to be split in two (á and ndre).

Here is my loadXMLByURL

-(id) loadXMLByURL:(NSString *)urlString
{
    tickets         = [[NSMutableArray alloc] init];
    NSURL *url      = [NSURL URLWithString:urlString];
    NSData *data    = [[NSData alloc] initWithContentsOfURL:url];
    parser          = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = self;
    [parser parse];
    return self;
}

I'm pretty sure it's because the encoding is not set (I think it needs to be NSUTF8StringEncoding) but I'm not sure where/how to apply it.

[UPDATE] Rest of my code...

- (void) parser:(NSXMLParser *)parser didStartElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
{    
    if ([elementname isEqualToString:@"ticket"]) 
    {
        currentTicket = [Ticket alloc];
    }

}

- (void) parser:(NSXMLParser *)parser didEndElement:(NSString *)elementname namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementname isEqualToString:@"name"]) 
    {
        currentTicket.name = currentNodeContent;
    }
    else if ([elementname isEqualToString:@"title"]) 
    {
        currentTicket.title = currentNodeContent;
    }
    else if ([elementname isEqualToString:@"status"]) 
    {
        currentTicket.status = currentNodeContent;
    }
    else if ([elementname isEqualToString:@"ticket"])
    {
        [tickets addObject:currentTicket];
        [currentTicket release];
        currentTicket = nil;
        [currentNodeContent release];
        currentNodeContent = nil;
    }

}

- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    currentNodeContent = (NSMutableString *) [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

[UPDATE 2] Sample xml...

<RB>
    <list>
        <ticket>
            <name>Andrew Ford</name>
            <title>3rd release</title>
            <status>1</status>
        </ticket>

        <ticket>
            <name>David Jenkins</name>
            <title>3rd release</title>
            <status>0</status>
        </ticket>

        <ticket>
            <name>Luis gomez ándre</name>
            <title>3rd release</title>
            <status>1</status>
        </ticket>
    </list>
</RB>
Lattermost answered 18/4, 2012 at 18:2 Comment(3)
I'm brand new to Obj-C, but it would seem to me that you could convert data to an NSString, encode that as UTF8 as shown here and pass that to NSXMLParser?Wrinkle
I use NSXMLParser a lot and haven't had that issue. Can you post the portion of the code that processes the XML text?Homogenous
Posted the rest of my code. I wonder if the issue now lies in foundCharacters?Lattermost
6

I would load the contents of the URL into an NSString and then convert it to NSData like this.

-(id) loadXMLByURL:(NSString *)urlString{

    tickets     = [[NSMutableArray alloc] init];
    NSURL *url      = [NSURL URLWithString:urlString];
    NSError *error = nil;
    // Decode the downloaded bytes as UTF-8, then re-encode them as UTF-8 data for the parser.
    NSString * dataString = [[NSString alloc] initWithContentsOfURL:url encoding:NSUTF8StringEncoding error:&error];
    NSData *data = [dataString dataUsingEncoding:NSUTF8StringEncoding];
    parser          = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = self;
    [parser parse];
    return self;

}

EDIT: Part of the problem may be that your parser:foundCharacters: method is assigning to your currentNodeContent instead of appending. See the Apple Doc at the following link.

http://developer.apple.com/library/ios/#documentation/cocoa/reference/NSXMLParserDelegate_Protocol/Reference/Reference.html

From the doc:

Because string may be only part of the total character content for the current element, you should append it to the current accumulation of characters until the element changes.

Zendavesta answered 21/4, 2012 at 0:22 Comment(7)
Thank you. That didn't quite solve my problem (even though I thought it would)! Doing an NSLog(@"%@", dataString); shows that the string does indeed include characters with accents etc which is great. However a word like ándre is still breaking. That makes me think the issue is in foundcharacters(). See my updated question.Lattermost
Is the word broken in the log you mentioned or only after going through the parser?Zendavesta
In the log the strings are displayed correctly. It's when adding the object to the tickets array, a name with a special character will be registered as two separate objects.Lattermost
Perhaps you could provide an example of the actual xml being parsed.Zendavesta
Updated original question with sample xml. Thanks.Lattermost
So, you say that you get two separate objects. Does that mean that you get two separate items in the tickets array? If so, what actually gets contained in these two objects, in addition to the name property?Zendavesta
@AndrewDavis did you find a solution to this issue? I'm running into the same problem, where the string splits at the accent. If you found a way to make it work, can you share it?Kanarese
4

Found the problem! It is indeed in parser:foundCharacters:. You should change your code to this:

- (void) parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    NSLog(@"found characters: %@", string);
    if (!currentNodeContent) {
        currentNodeContent = [[NSMutableString alloc] init];
    }
    [currentNodeContent appendString:string];
}

I was having the same problem before, and the above code has fixed it.
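
One thing to watch when switching to appending: the buffer has to be emptied between elements, otherwise the text of one element runs into the next. Below is a minimal sketch of the whole cycle (reset on start, append on characters, trim on end). It assumes ARC, unlike the manual release calls in the question, and the names NameCollector and CollectNames are made up for illustration; only the <name> element from the sample XML is handled.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects the text of every <name> element.
@interface NameCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *names;         // finished values
@property (nonatomic, strong) NSMutableString *currentText;  // per-element buffer
@end

@implementation NameCollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    // Start a fresh buffer so text from the previous element is not carried over.
    self.currentText = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    // The parser may deliver an element's text in several pieces, so always append.
    [self.currentText appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"name"]) {
        // Trim only after the complete text has been collected.
        NSString *trimmed = [self.currentText stringByTrimmingCharactersInSet:
                             [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        [self.names addObject:[trimmed copy]];
    }
    // Drop the buffer; whitespace between elements is then ignored (messages to nil do nothing).
    self.currentText = nil;
}

@end

// Hypothetical helper: parses the given XML data and returns the collected names.
static NSArray *CollectNames(NSData *xmlData)
{
    NameCollector *collector = [[NameCollector alloc] init];
    collector.names = [NSMutableArray array];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
    parser.delegate = collector;  // the local variable keeps the delegate alive during parsing
    [parser parse];
    return collector.names;
}
```

Trimming in didEndElement rather than in foundCharacters matters here: trimming each piece separately would also strip a space that happens to fall at the boundary between two pieces.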

Funiculate answered 16/12, 2014 at 0:41 Comment(3)
What does the if (!currentNodeContent) condition imply?Charmian
Does it check if the string is blank or not?Charmian
@Charmian It checks for nil. It's equivalent to if (currentNodeContent == nil) {Zendavesta
1

Use

NSData *data = [dataString dataUsingEncoding:NSUTF8StringEncoding];

and to get the string from it, do this:

NSString *theXML = [[NSString alloc] initWithBytes:[data bytes]
                                            length:[data length]
                                          encoding:NSUTF8StringEncoding];

Then you can parse the xml in your NSXMLParserDelegate methods.

Hope this helps.

Bielefeld answered 25/4, 2012 at 20:16 Comment(0)
