Number of characters in a string (not number of bytes)

In Objective-C, if I call something like [myStr length] on a string, it returns a number which is the number of bytes. What can I use to get the number of characters in a string instead?

For example:
a string containing the letter "a" returns a length of 1
a string containing a single emoji returns a length of 2 (or sometimes even 4)


This is because that is the number of bytes in the string. I just need the number of characters.
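For reference, a minimal sketch (assuming Foundation; the helper names UTF16UnitCount and UTF8ByteCount are hypothetical) that compares what -length reports with an actual byte count from -lengthOfBytesUsingEncoding:. It suggests the 2 seen for an emoji is a count of UTF-16 code units rather than bytes, which is the point the first comments below make.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: -length counts UTF-16 code units, not bytes.
static NSUInteger UTF16UnitCount(NSString *s) {
    return s.length;
}

// Hypothetical helper: number of bytes needed to encode the string
// as UTF-8 (no terminating NUL is included).
static NSUInteger UTF8ByteCount(NSString *s) {
    return [s lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
}
```

With these, @"a" gives 1 and 1; @"\u00E9" (é) gives a -length of 1 but 2 UTF-8 bytes; and @"\U0001F600" (an emoji outside the Basic Multilingual Plane) gives a -length of 2, one surrogate pair, but 4 UTF-8 bytes.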

Shaker asked 10/3, 2013 at 22:48 Comment(13)
@valentinas Please re-read my question; I don't think you grasped it. Thank you for the link, though! But that's not what I'm looking for. - Shaker
«which is the number of bytes» No, as the documentation states, it's the number of Unicode characters. - Lonnylonslesaunier
@JoshCaswell The documentation is wrong. - Shaker
By "character", do you mean a Unicode scalar value or a code point? - Omophagia
@MikeSamuel I'm not sure I know the difference; I just want any Unicode character from #0000 to #E01E0f to be counted as "1". - Shaker
I suspect the problem is in how you created the strings. You need to supply the data AND the encoding. If you do that, then [str length] should work properly. - Jez
@Jez This is all I've done. The string referenced by the %@ is just the text of a UITextView the user types in. Where should I add the encoding to account for emoji Unicode characters? userMessageCount = [[NSString stringWithFormat:@"%@", userMessageView.text] length]; - Shaker
Currently the only options I see are either to use a massive for loop to convert everything from bytes to a character count and keep track of it for each new thing typed or deleted, or to send the whole string into a UIWebView using loadHTML and then run some JavaScript in there to return the true string length. Neither would be fun. - Shaker
No, it's not. You seem to be looking for the number of glyphs in the rendering, the things that the user sees. Each glyph can be represented by several Unicode characters. See also: "Characters and Grapheme Clusters". - Lonnylonslesaunier
@JoshCaswell Okay, that makes more sense, +1. So how do I get the number of glyphs? I noticed the flag emojis are made up of two Unicode characters (but I'm assuming that is an exception when finding glyph length, because even Twitter counts a single flag as 2 characters, haha!) - Shaker
I'm trying to figure that out. Actually, glyphs might be the wrong direction; that's what you get when you render the string, but I'm not certain there's only one glyph when rendering, e.g., é. - Lonnylonslesaunier
@JoshCaswell Hm, interesting. I know doing it with JavaScript and an invisible UIWebView will work, but I have a feeling it will be very slow. I wonder if there's an easy way to see how Twitter's app does it. - Shaker
@AlbertRenshaw I didn't mean to suggest that they should be treated as two exclusive options. "Character" often means an octet, a UTF-16 code unit, or a Unicode scalar value. When input is messy, it sometimes means code point instead of scalar value. - Omophagia
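To make the code-unit vs. scalar-value distinction concrete, a minimal sketch (hypothetical helper name UnicodeScalarCount, assuming well-formed UTF-16 input) that counts Unicode scalar values by walking the UTF-16 code units and skipping the trailing half of each surrogate pair. An unpaired surrogate in malformed input would be miscounted.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts Unicode scalar values (code points) in a
// well-formed string. -length counts UTF-16 code units, and characters
// outside the Basic Multilingual Plane occupy two units (a surrogate pair),
// so each low (trailing) surrogate is skipped to count the pair once.
static NSUInteger UnicodeScalarCount(NSString *s) {
    NSUInteger count = 0;
    for (NSUInteger i = 0; i < s.length; i++) {
        unichar unit = [s characterAtIndex:i];
        BOOL isLowSurrogate = (unit >= 0xDC00 && unit <= 0xDFFF);
        if (!isLowSurrogate) {
            count++;
        }
    }
    return count;
}
```

A US flag (U+1F1FA U+1F1F8) gives 2 scalar values but a -length of 4, which is one way a single visible emoji can report a length of 4. Counting scalar values still does not match what the user sees for such sequences.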

I just whipped up this method. Add it to an NSString category.

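#import <Foundation/Foundation.h>

@interface NSString (CharacterCount)
- (NSUInteger)characterCount;
@end

@implementation NSString (CharacterCount)
// Counts composed character sequences (roughly what the user sees as one
// character) by jumping from the start of one sequence to the next.
- (NSUInteger)characterCount {
    NSUInteger cnt = 0;
    NSUInteger index = 0;
    while (index < self.length) {
        NSRange range = [self rangeOfComposedCharacterSequenceAtIndex:index];
        cnt++;
        index += range.length;
    }
    return cnt;
}
@end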

NSString *a = @"Hello";
NSLog(@"%@ length = %lu, chars = %lu", a, (unsigned long)a.length, (unsigned long)a.characterCount);
NSString *b = @"🏁 Emoji 📳";
NSLog(@"%@ length = %lu, chars = %lu", b, (unsigned long)b.length, (unsigned long)b.characterCount);

This yields:

Hello length = 5, chars = 5
🏁 Emoji 📳 length = 11, chars = 9
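An alternative sketch of the same count lets Foundation do the stepping via -enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByComposedCharacterSequences. NSStringEnumerationSubstringNotRequired avoids creating a substring on each step. The function name ComposedCharacterCount is hypothetical, and results follow the composed-sequence rules of the Foundation version you run on (the counts above were measured in 2013).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts composed character sequences by letting
// Foundation enumerate them, instead of advancing an index by hand.
static NSUInteger ComposedCharacterCount(NSString *s) {
    __block NSUInteger count = 0;
    NSStringEnumerationOptions opts =
        NSStringEnumerationByComposedCharacterSequences |
        NSStringEnumerationSubstringNotRequired;
    [s enumerateSubstringsInRange:NSMakeRange(0, s.length)
                          options:opts
                       usingBlock:^(NSString *substring, NSRange substringRange,
                                    NSRange enclosingRange, BOOL *stop) {
        // One callback per composed character sequence.
        count++;
    }];
    return count;
}
```

For example, "e" followed by U+0301 COMBINING ACUTE ACCENT is two UTF-16 units but one composed sequence, so it counts as 1.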

Shult answered 10/3, 2013 at 23:21 Comment(9)
Incredible! rmaddy, you always impress me! Haha! rangeOfComposedCharacterSequenceAtIndex: VERY COOL! - Shaker
(Interesting note: try this with the flag emojis and it won't work. That's because the flag emojis aren't actually single Unicode characters; each is two Unicode characters that iOS and OS X render as one character when placed side by side... 🇯🇵🇰🇷🇩🇪🇨🇳🇺🇸🇫🇷🇪🇸🇮🇹🇷🇺🇬🇧 ... For example, the US flag is "🇺*🇸" without the asterisk. Try copying and pasting "🇺*🇸" into a text area (like a comment on SO), then backspacing over the asterisk, and see what happens :o) - Shaker
Very interesting. Those flags are actually 4 Unicode symbols according to the Special Characters viewer on the Mac. My answer reports a single flag as having a character count of 2, while its NSString length is 4. I'll see if I can find a solution. - Shult
Interesting. Luckily for me, since I'm building a Twitter add-on app, I'm okay with having the flags count as 2, because even Twitter counts them as 2 :) - Shaker
My only problem right now is with the custom backspace on my keyboard: if I backspace using substringToIndex:, I get MASSIVE glitches when backspacing emojis and then trying to type again, as emojis take 2 backspaces to fully delete (flags take 4). - Shaker
Dealing with the backspace should work if you use the rangeOfComposedCharacterSequenceAtIndex: method to get the length of the character. Obviously, there will still be an issue with these flag characters. - Shult
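A minimal sketch of that suggestion, with a hypothetical helper name: find the composed sequence containing the last UTF-16 unit and cut the string at its start, rather than calling substringToIndex: with length - 1, which can split a surrogate pair and leave half an emoji behind.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns `text` minus its final composed character
// sequence, so a surrogate-pair emoji is removed with one backspace
// instead of leaving an orphaned half behind.
static NSString *StringByDeletingLastComposedCharacter(NSString *text) {
    if (text.length == 0) {
        return text;
    }
    NSRange last = [text rangeOfComposedCharacterSequenceAtIndex:text.length - 1];
    return [text substringToIndex:last.location];
}
```

Whether a flag is deleted in one step still depends on how the running Foundation version groups the two regional indicators, as the comments here note.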
Very nice! I had a suspicion that one of the rangeOfComposedCharacters... methods might be the key to a solution, but I got lost in typesetting documentation instead. - Lonnylonslesaunier
This is cool. The US flag symbol is composed of the boxed U and the boxed S characters. The German flag symbol is composed of the boxed D and boxed E characters. Clever. - Shult
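The pairing described above can be sketched as follows: each letter A to Z maps to a regional indicator symbol at U+1F1E6 plus the letter's offset from A, and two of them side by side render as a flag where the platform supports it. The function name FlagFromRegionCode is hypothetical; since each regional indicator lies outside the Basic Multilingual Plane, it is appended as a UTF-16 surrogate pair.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a flag from a two-letter uppercase region
// code such as @"US" by mapping each letter to its regional indicator
// symbol (U+1F1E6 for 'A' through U+1F1FF for 'Z'). Returns nil otherwise.
static NSString *FlagFromRegionCode(NSString *code) {
    if (code.length != 2) {
        return nil;
    }
    NSMutableString *flag = [NSMutableString string];
    for (NSUInteger i = 0; i < code.length; i++) {
        unichar letter = [code characterAtIndex:i];
        if (letter < 'A' || letter > 'Z') {
            return nil;
        }
        // Subtract 0x10000, then split the remaining 20 bits into a
        // high surrogate (top 10 bits) and a low surrogate (bottom 10 bits).
        NSUInteger offset = 0x1F1E6 + (NSUInteger)(letter - 'A') - 0x10000;
        unichar pair[2] = {
            (unichar)(0xD800 + (offset >> 10)),
            (unichar)(0xDC00 + (offset & 0x3FF))
        };
        [flag appendString:[NSString stringWithCharacters:pair length:2]];
    }
    return flag;
}
```

FlagFromRegionCode(@"US") yields the same two scalars as the US flag emoji (U+1F1FA U+1F1F8), with a -length of 4.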
I posted a question on the Apple iOS dev forums. - Shult
