iOS - Most efficient way to find word occurrence count in a string

Given a string, I need to obtain a count of each word that appears in it. To do so, I split the string into an array of words and searched the array, but I suspect that searching the string directly would be more efficient. Below is the code I originally wrote to solve the problem. I'm open to suggestions for better solutions, though.

NSMutableDictionary *sets = [[NSMutableDictionary alloc] init];

NSString *paragraph = [[NSString alloc] initWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"text" ofType:@"txt"] encoding:NSUTF8StringEncoding error:NULL];

NSMutableArray *words = [[[paragraph lowercaseString] componentsSeparatedByString:@" "] mutableCopy];

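// Repeatedly take the first remaining word, count every occurrence of it, then remove them all.
while (words.count) {
    NSMutableIndexSet *indexSet = [[NSMutableIndexSet alloc] init];
    NSString *search = [words objectAtIndex:0];
    for (NSUInteger i = 0; i < words.count; i++) {
        if ([[words objectAtIndex:i] isEqualToString:search]) {
            [indexSet addIndex:i];
        }
    }
    // indexSet.count is an NSUInteger, so box it with the matching constructor.
    [sets setObject:[NSNumber numberWithUnsignedInteger:indexSet.count] forKey:search];
    [words removeObjectsAtIndexes:indexSet];
}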

NSLog(@"%@", sets);

Example:

Starting string:
"This is a test. This is only a test."

Results:

  • "This" - 2
  • "is" - 2
  • "a" - 2
  • "test" - 2
  • "only" - 1
Feathercut answered 13/11, 2012 at 18:8 Comment(1)
Does your method work? Don't you have problems with the period? I'd expect you to get "test." rather than "test". - Diba
24

This is exactly what an NSCountedSet is for.

You need to break the string apart into words (iOS is nice enough to give us a method for this, so we don't have to worry about punctuation) and add each word to the counted set, which keeps track of the number of times each object appears in it:

NSString     *string     = @"This is a test. This is only a test.";
NSCountedSet *countedSet = [NSCountedSet new];

[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){

                            // This block is called once for each word in the string.
                            [countedSet addObject:substring];

                            // If you want to ignore case, so that "this" and "This" 
                            // are counted the same, use this line instead to convert
                            // each word to lowercase first:
                            // [countedSet addObject:[substring lowercaseString]];
                        }];

NSLog(@"%@", countedSet);

// Results:  2012-11-13 14:01:10.567 Testing App[35767:fb03] 
// <NSCountedSet: 0x885df70> (a [2], only [1], test [2], This [2], is [2])
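
To read individual counts back out of the set, `countForObject:` can be used. The following is only a minimal sketch of that idea, assuming case-insensitive counting; the helper names `CountWords` and `WordsByDescendingCount` are hypothetical and not part of the answer above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts words case-insensitively with NSCountedSet.
static NSCountedSet *CountWords(NSString *text) {
    NSCountedSet *counts = [NSCountedSet new];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Lowercase each word so "This" and "this" share one entry.
        [counts addObject:substring.lowercaseString];
    }];
    return counts;
}

// Hypothetical helper: words ordered by descending count, ties broken alphabetically.
static NSArray<NSString *> *WordsByDescendingCount(NSCountedSet *counts) {
    return [counts.allObjects sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        NSUInteger countA = [counts countForObject:a];
        NSUInteger countB = [counts countForObject:b];
        if (countA != countB) {
            return countA > countB ? NSOrderedAscending : NSOrderedDescending;
        }
        return [a compare:b];
    }];
}
```

For the example string, `[CountWords(string) countForObject:@"test"]` would give the count for a single word, and iterating over the sorted array gives a stable order for display.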
Fugate answered 13/11, 2012 at 19:5 Comment(2)
OMG! That's even easier than I could have imagined! Thank you! - Feathercut
@Inafzinger There is a slight problem with this, though: when I want to count the words in an NSString that contains some HTML tags, it skips them. But I want to count those as well. Any ideas? - Yasmeen
2

If I had to guess, I would use NSRegularExpression for that, like this:

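// `regex` was not defined in the original snippet; the pattern here is an assumed example (whole word "test").
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\btest\\b" options:NSRegularExpressionCaseInsensitive error:NULL];
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string
                                                    options:0
                                                      range:NSMakeRange(0, [string length])];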

That snippet was taken from here.


Edit 1.0:

Based on what Sir Till said:

NSString *string = @"This is a test, so it is a test";

NSMutableDictionary *dictionary = [NSMutableDictionary dictionary];
NSArray *arrayOfWords = [string componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
for (NSString *word in arrayOfWords)
{
    if ([dictionary objectForKey:word])
    {
        // Word seen before: increment its stored count.
        NSNumber *numberOfOccurrences = [dictionary objectForKey:word];
        NSNumber *increment = [NSNumber numberWithInt:(1 + [numberOfOccurrences intValue])];
        [dictionary setObject:increment forKey:word];
    }
    else
    {
        // First time this word appears.
        [dictionary setObject:[NSNumber numberWithInt:1] forKey:word];
    }
}

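You should be careful with:

  • Punctuation marks next to words: in the string above, "test," and "test" are counted as two different words.
  • Uppercase vs. lowercase: "This" and "this" are counted as two different words.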
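
One way to handle both caveats is to lowercase the text and split on punctuation as well as whitespace. This is only a sketch under those assumptions; the helper name `WordCounts` is hypothetical, and splitting on every punctuation character also breaks contractions such as "don't" into two pieces.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: dictionary-based word count that ignores case and punctuation.
static NSDictionary<NSString *, NSNumber *> *WordCounts(NSString *text) {
    // Treat whitespace, newlines, and punctuation all as separators.
    NSMutableCharacterSet *separators = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];

    NSMutableDictionary<NSString *, NSNumber *> *counts = [NSMutableDictionary dictionary];
    NSArray<NSString *> *tokens = [text.lowercaseString componentsSeparatedByCharactersInSet:separators];
    for (NSString *token in tokens) {
        // Adjacent separators (for example ", ") produce empty strings; skip them.
        if (token.length == 0) {
            continue;
        }
        // A missing key yields nil, and messaging nil returns 0, so this also covers the first occurrence.
        counts[token] = @(counts[token].unsignedIntegerValue + 1);
    }
    return counts;
}
```

With the string from this answer, `WordCounts(@"This is a test, so it is a test")[@"test"]` counts both occurrences of "test" under one key.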
Minnick answered 13/11, 2012 at 18:23 Comment(1)
I need a count of each word. So in the string "This is a test. This is only a test.", "test" would have a count of 2, "this" would have a count of 2, "only" would have a count of 1, etc. - Feathercut
1

I think it's a really bad idea to search for words in a long paragraph with a loop. You should use a regular expression for that! I know regular expressions aren't easy to learn at first, but they are really worth knowing. Take a look at this question: "Use regular expression to find/replace substring in NSString".
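
As a rough sketch of what a regular-expression approach to the original question might look like, the following matches every run of word characters with the pattern `\w+` and tallies the matches. The helper name `CountWordsWithRegex` and the choice of pattern are assumptions for illustration, not something taken from the linked question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: tallies every \w+ match, ignoring case.
static NSCountedSet *CountWordsWithRegex(NSString *text) {
    NSCountedSet *counts = [NSCountedSet new];
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\w+"
                                                                           options:0
                                                                             error:&error];
    if (regex == nil) {
        // The pattern failed to compile; return the empty set.
        return counts;
    }
    [regex enumerateMatchesInString:text
                            options:0
                              range:NSMakeRange(0, text.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        // Each match covers one word; lowercase it so case variants share a count.
        NSString *word = [text substringWithRange:match.range];
        [counts addObject:word.lowercaseString];
    }];
    return counts;
}
```

Note that `\w` also matches digits and underscores, so whether this counts as a "word" depends on the text being processed.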

Herwick answered 13/11, 2012 at 18:36 Comment(0)
