Word Stemming in iOS - Not working for single word

I am using NSLinguisticTagger for word stemming. I can get the stems of the words in a sentence, but not the stem of a single word on its own.

This is the code I am using:

    NSString *stmnt = @"i waited";
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames;

    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLemma] options:options];
    tagger.string = stmnt;
    [tagger enumerateTagsInRange:NSMakeRange(0, [stmnt length]) scheme:NSLinguisticTagSchemeLemma options:options usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        NSString *token = [stmnt substringWithRange:tokenRange];
        NSLog(@"%@: %@", token, tag);
    }];

This gives the correct output:

    i: i
    waited: wait

But the same code fails to return a stem when stmnt = @"waited".

Any help is greatly appreciated.

Schrimsher answered 25/6, 2014 at 7:15 Comment(0)

The following code worked for me:

    NSString *stmt = @"waited";
    NSRange stringRange = NSMakeRange(0, stmt.length);
    // Orthography hint: Latin script, English
    NSDictionary *languageMap = @{@"Latn" : @[@"en"]};
    [stmt enumerateLinguisticTagsInRange:stringRange
                                  scheme:NSLinguisticTagSchemeLemma
                                 options:NSLinguisticTaggerOmitWhitespace
                             orthography:[NSOrthography orthographyWithDominantScript:@"Latn" languageMap:languageMap]
                              usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
                                  // Log info to console for debugging purposes
                                  NSString *currentEntity = [stmt substringWithRange:tokenRange];
                                  // NSRange fields are NSUInteger, so cast and use %lu; printed as (location,length)
                                  NSLog(@"%@ is a %@, tokenRange (%lu,%lu)", currentEntity, tag, (unsigned long)tokenRange.location, (unsigned long)tokenRange.length);
                              }];
Schrimsher answered 25/6, 2014 at 7:54 Comment(4)
It gives bad access if the language does not contain that word. For example, just try with 'abcd'. I am trying it using Swift. Any idea? - Crosier
Gives me bad access too. Not sure what I'm doing wrong. - Hellish
#48769419 Any chance you can help here? Physical devices not working the same :\ - Sigler
Still gives bad access if the language does not contain that word. Swift 4.2 & iOS 12.0 :(. - Headword

The accepted answer converted to Swift for those who need it:

    // Declared as NSString so that length, the NSRange-based enumeration
    // and substringWithRange are all available
    let stmt: NSString = "waited"
    let options: NSLinguisticTaggerOptions = .OmitWhitespace
    let stringRange = NSMakeRange(0, stmt.length)
    let languageMap = ["Latn":["en"]]
    let orthography = NSOrthography(dominantScript: "Latn", languageMap: languageMap)

    stmt.enumerateLinguisticTagsInRange(
        stringRange,
        scheme: NSLinguisticTagSchemeLemma,
        options: options,
        orthography: orthography)
        { (tag, tokenRange, sentenceRange, _) -> () in
            let currentEntity = stmt.substringWithRange(tokenRange)
            println(">\(currentEntity):\(tag)")
    }
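
For more recent Swift versions, a rough equivalent might look like the sketch below. It assumes iOS 11 / macOS 10.13 or later for `enumerateTags(in:unit:scheme:options:using:)`, and the `lemmas(in:)` helper is a made-up name for illustration, not part of any API. The tag arrives as an optional in this signature, so a token without a lemma is carried through as nil instead of being assumed present.

```swift
import Foundation

// Hypothetical helper (name is an assumption): returns each token with its lemma, if any.
func lemmas(in text: String) -> [(token: String, lemma: String?)] {
    // Work with NSString so that NSRange values line up with the tagger's ranges.
    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)

    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = text

    // Same orthography hint as in the answer: Latin script, English.
    let orthography = NSOrthography(dominantScript: "Latn", languageMap: ["Latn": ["en"]])
    tagger.setOrthography(orthography, range: fullRange)

    var results: [(token: String, lemma: String?)] = []
    tagger.enumerateTags(in: fullRange, unit: .word, scheme: .lemma,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange, _ in
        // tag is optional: keep nil when the tagger has no lemma for this token.
        results.append((token: nsText.substring(with: tokenRange), lemma: tag?.rawValue))
    }
    return results
}

for (token, lemma) in lemmas(in: "waited") {
    print("\(token): \(lemma ?? "(no lemma)")")
}
```

Whether a lemma actually comes back for a lone word may still depend on the OS version and device, as the comments on this thread suggest.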
Flintlock answered 18/8, 2015 at 23:5 Comment(3)
I got some "NSRange not convertible to Range" errors, so I just converted the string to NSString first ("let nsstmt : NSString = stmt as NSString") and ran everything using nsstmt. Not sure if there is a better way. - Dibbell
I can confirm that stemming fails for a single word when using the String method, but works as expected (at least on the plurals I've tried) using the NSString equivalent. Bizarre! I am also getting errors with the block-based enumerateTags(in:scheme:options:using:), but the linguisticTags(in:) alternative is working as expected. - Sordello

It doesn't work for a single word because there isn't enough information to determine the word's role in a sentence.

In our case, when the user enters a single word into our natural language parser, we assume it is the name of a thing, and thus a noun.

So we just construct a sentence in which the entered word is implied to be a noun, like so:

    let str = "please show me \(word)"

Then just run it through NSLinguisticTagger as usual.
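
A minimal sketch of that idea, assuming iOS 11 / macOS 10.13 or later for `tag(at:unit:scheme:tokenRange:)`. The helper names `carrierSentence(for:)` and `lemmaAsNoun(_:)` are made up for illustration, and the carrier phrase is just the one from the line above.

```swift
import Foundation

// Hypothetical helper: builds the carrier sentence and the NSRange of the word inside it.
func carrierSentence(for word: String) -> (sentence: String, wordRange: NSRange) {
    let prefix = "please show me "
    let range = NSRange(location: (prefix as NSString).length,
                        length: (word as NSString).length)
    return (prefix + word, range)
}

// Hypothetical helper: the lemma the tagger reports for `word` in that context, or nil.
func lemmaAsNoun(_ word: String) -> String? {
    // An empty word would put the lookup index past the end of the sentence.
    guard !word.isEmpty else { return nil }

    let (sentence, wordRange) = carrierSentence(for: word)
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = sentence

    // Ask only about the token that starts where the entered word starts.
    return tagger.tag(at: wordRange.location, unit: .word, scheme: .lemma,
                      tokenRange: nil)?.rawValue
}

print(lemmaAsNoun("apples") ?? "(no lemma)")
```

Looking up the tag at the word's own offset keeps the carrier words ("please", "show", "me") out of the result, so only the entered word's lemma is returned.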

Hellish answered 28/10, 2016 at 8:15 Comment(1)