Convert Special Characters for RTF

Can someone please assist me with converting special characters to something that can be correctly represented in an RTF file?

I am taking text stored in a string on the iPad and outputting it as an RTF file using NSASCIIStringEncoding. So far so good. What I have not managed to do is account for special characters (e.g. tilde, umlaut, accent). Sorry, RoW!

The most universal RTF format seems to want 8-bit text encoding with code page escapes (a backslash and an apostrophe followed by two hexadecimal digits). So n with tilde (ñ) would be \'f1.
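
To make the escape format concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The function name `rtf_hex_escape` is hypothetical, and the sketch assumes the RTF header declares Windows-1252 (`\ansicpg1252`). It only handles U+00A0 through U+00FF, the range where the Unicode value and the Windows-1252 byte are the same number; anything else is rejected rather than guessed at.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the RTF hex escape \'hh for one UTF-16
 * code unit into out. Returns 1 on success, 0 if the unit is outside
 * U+00A0..U+00FF or the buffer is too small. Assumes the document
 * uses code page 1252. */
int rtf_hex_escape(uint16_t unit, char *out, size_t out_size)
{
    if (unit < 0xA0 || unit > 0xFF) {
        return 0;
    }
    int written = snprintf(out, out_size, "\\'%02x", (unsigned)unit);
    return written > 0 && (size_t)written < out_size;
}
```

For ñ (U+00F1) this produces the `\'f1` mentioned above.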

The only solution that occurs to me is to convert to NSUTF8StringEncoding and then use stringByReplacingOccurrencesOfString, but there are a lot of characters and it seems tedious to have to replace every one of them manually. Is there a more efficient way that is escaping me? (pun intended) :)

Thanks for any suggestions.

Pipage answered 17/9, 2010 at 18:11 Comment(3)
Just a thought from similar problems in web development... use Unicode throughout? – Multifid
I can't help you with the specifics of the RTF format, but it's worth pointing out that as of iOS 4.0 Apple have (finally) introduced regular expression support - see NSRegularExpression. Depending on whether you need to support legacy 3.x devices or not, I'd be tempted to use regexes to solve this particular problem, since it's precisely what they were designed for. – Clammy
Thanks for sharing that. It's an iPad app, so for the time being I have to support v3.2. – Pipage

Check the value of characterAtIndex: if it is > 127, it is not ASCII, so escape the character.

Something like the following:

- (NSString *)stringFormattedRTF:(NSString *)inputString
{
    NSMutableString *result = [NSMutableString string];

    for (NSUInteger index = 0; index < [inputString length]; index++) {
        unichar tempchar = [inputString characterAtIndex:index];

        if (tempchar > 127) {
            // Non-ASCII: emit a \'hh hex escape. This is only valid for
            // values up to 0xFF (two hex digits); larger values, such as
            // CJK characters, need a different escape.
            [result appendFormat:@"\\'%02x", tempchar];
        } else {
            // Plain ASCII passes through unchanged.
            [result appendFormat:@"%C", tempchar];
        }
    }
    return result;
}
Kerri answered 17/9, 2010 at 19:56 Comment(4)
This is not working as expected. Will update when a working solution is found. – Kerri
Thanks for the proposed answer and follow-up. – Pipage
That got it. Thanks very much falconcreek! – Pipage
The above is working great for "special characters" like umlauts and accents. Any suggestions on how to handle two-byte characters, such as Japanese and Chinese? It seems to me that the above should already be escaping those, but right now those characters are being converted to ????. Thanks. – Pipage
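
One detail that may matter for the two-byte case: `%02x` sets a minimum width, so a value such as 0x738B is printed as four hex digits, which no longer fits the two-digit `\'hh` form. RTF also has a `\uN` control word that takes a signed 16-bit decimal value, followed by a fallback character for readers that cannot display it. The sketch below shows that conversion in plain C; `rtf_unicode_escape` is a hypothetical name, and using `?` as the fallback is an assumption, not something the thread specifies.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes \uN? for one UTF-16 code unit, where N is
 * the unit reinterpreted as a signed 16-bit decimal number and '?' is
 * the single fallback character. Returns 1 on success, 0 if out is
 * too small. */
int rtf_unicode_escape(uint16_t unit, char *out, size_t out_size)
{
    /* Values above 32767 are written as negative numbers. */
    int value = unit > 32767 ? (int)unit - 65536 : (int)unit;
    int written = snprintf(out, out_size, "\\u%d?", value);
    return written > 0 && (size_t)written < out_size;
}
```

With this, 王 (U+738B) becomes `\u29579?` and 菲 (U+83F2) becomes `\u-31758?`.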

@falconcreek's answer saved me a lot of time when writing code to cope with a wider range of cases, including, say, Chinese characters (as requested by DenVog). In particular, it's important to check for "\", "{" and "}", as these are used by the RTF format itself. (See How to output unicode string to RTF (using C#), for example.) The following category on NSString copes with a string such as:

The quick \ slow {brown} fox “slurped” lazily on his π-latté, while Faye Wong (王菲) played in the background.

@interface NSString (TR)
- (NSString *)stringFormattedRTF;
@end

@implementation NSString (TR)

#define backslash 0x5C
#define openCurlyBrace 0x7B
#define closeCurlyBrace 0x7D

- (NSString *)stringFormattedRTF
{
    NSMutableString *result = [NSMutableString string];

    for (NSUInteger index = 0; index < [self length]; index++)
    {
        unichar unicodeCharacter = [self characterAtIndex:index];

        if (unicodeCharacter == backslash || unicodeCharacter == openCurlyBrace || unicodeCharacter == closeCurlyBrace)
        {
            // Characters reserved by RTF are escaped with a leading backslash.
            [result appendFormat:@"\\%c", unicodeCharacter];
        }
        else if (unicodeCharacter > 127)
        {
            // \uN takes a signed 16-bit value, so values above 32767 are
            // written as negative numbers. \uc0 means no fallback
            // characters follow the escape.
            int value = unicodeCharacter > 32767 ? unicodeCharacter - 65536 : unicodeCharacter;
            [result appendFormat:@"\\uc0\\u%d ", value];
        }
        else
        {
            // Plain ASCII passes through unchanged.
            [result appendFormat:@"%c", unicodeCharacter];
        }
    }
    return result;
}

@end
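
For checking the expected output outside Xcode, the same loop can be sketched in plain C (which also compiles inside an Objective-C file). This is not the category from this answer: `rtf_escape_utf16` is a hypothetical name, the input is assumed to be an array of UTF-16 code units, and values above 32767 are written as negative numbers because the RTF specification treats the `\uN` argument as a signed 16-bit value.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: escapes len UTF-16 code units into out as RTF
 * body text. Returns the number of chars written (excluding the NUL),
 * or -1 if out is too small. */
int rtf_escape_utf16(const uint16_t *units, size_t len,
                     char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        uint16_t u = units[i];
        size_t room = out_size - used;
        int n;

        if (u == '\\' || u == '{' || u == '}') {
            /* Reserved by RTF: prefix with a backslash. */
            n = snprintf(out + used, room, "\\%c", (char)u);
        } else if (u > 127) {
            /* Non-ASCII: \uc0 (no fallback chars) plus signed \uN. */
            int value = u > 32767 ? (int)u - 65536 : (int)u;
            n = snprintf(out + used, room, "\\uc0\\u%d ", value);
        } else {
            /* Plain ASCII passes through unchanged. */
            n = snprintf(out + used, room, "%c", (char)u);
        }

        if (n < 0 || (size_t)n >= room) {
            return -1;
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

A surrogate pair needs no special handling here, since each half is simply emitted as its own `\uN`.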

Side note: Microsoft provides the RTF 1.9.1 specification, which is really helpful if you want to output RTF. Wikipedia says (as of May 2012) that this is the most recent version. Google tends to turn up much older RTF specs.

Ripieno answered 8/5, 2012 at 22:27 Comment(1)
Thanks for sharing this. The braces came back to bite me, and this helped me sort them out. Thanks for following up on the thread. – Pipage