Detect Language of NSString
D

6

42

Somebody told me about a class for language recognition in Cocoa. Does anybody know which one it is?

This is not working:

NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Tag Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";
[spellChecker checkSpellingOfString:spellCheckText startingAt:0];
NSLog(@"%@", [spellChecker language]);

The result is 'en' but should be 'de'.

Dollfuss answered 12/6, 2011 at 22:19 Comment(4)
My German is not very good, but shouldn't it be "einen deutscher Text"? Excuse me if I'm wrong though. - Emblements
+1 Great question! With the amount of time I spend working with the frameworks, I'm surprised I've never heard of language detection like this before. - Interatomic
@WTP: Nope, it's "ein". You'd say "Ich lese einen deutschen Text" though. ;) German is a nasty bitch. Full of entropy. ;) #native - Hydrokinetic
This question/answer could be of help. - Tundra
D
14

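That's the result:

- (NSString *)languageForString:(NSString *)text {
    // CFStringTokenizerCopyBestStringLanguage returns a +1 CFStringRef (or NULL).
    // CFBridgingRelease transfers ownership so the result is not leaked.
    if (text.length < 100) {
        return CFBridgingRelease(CFStringTokenizerCopyBestStringLanguage((__bridge CFStringRef)text, CFRangeMake(0, text.length)));
    } else {
        // Only the first 100 characters are inspected for longer strings.
        return CFBridgingRelease(CFStringTokenizerCopyBestStringLanguage((__bridge CFStringRef)text, CFRangeMake(0, 100)));
    }
}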
Dollfuss answered 13/6, 2011 at 0:3 Comment(3)
return (NSString *)CFStringTokenizerCopyBestStringLanguage((CFStringRef)text, CFRangeMake(0, MIN(text.length, 100))); should be clearer and less prone to introducing bugs if you later modify the call to CFStringTokenizerCopyBestStringLanguage. - Mccombs
For a short word like "hello", it will return "it". Italian? - Filling
Well, it works fine for somewhat larger sentences, but for "ich bin Umair" the output was "nl" instead of "de". - Indication
A
44

Cocoa has an API for checking the language of a string, and it is generally preferable to use Foundation over Core Foundation where possible.

NSArray *tagschemes = [NSArray arrayWithObjects:NSLinguisticTagSchemeLanguage, nil];
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options:0];
[tagger setString:@"Das ist ein bisschen deutscher Text. Bitte löschen Sie diesen nicht."];
NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];

Alternatively, if you happen to have mixed language text, you can use the enumerateLinguisticTagsInRange API to get the language of each word in the text.
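For illustration, here is a minimal Swift sketch of the per-word enumeration mentioned above. It uses NSLinguisticTagger's enumerateTags(in:unit:scheme:options:using:) (iOS 11 / macOS 10.13 and later) rather than the NSString convenience method. The helper name wordLanguages(in:) is made up for this example. How finely the tagger actually distinguishes languages within one sentence is not something the answer specifies, so treat the per-word tags as a best effort.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): returns each word in `text`
// together with the language tag the tagger assigns to it, if any.
func wordLanguages(in text: String) -> [(word: String, language: String?)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
    tagger.string = text

    // NSLinguisticTagger works with NSRange, so measure the text as an NSString.
    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)

    var result: [(word: String, language: String?)] = []
    tagger.enumerateTags(in: fullRange,
                         unit: .word,
                         scheme: .language,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange, _ in
        // `tag` is nil when no language could be determined for this token.
        result.append((word: nsText.substring(with: tokenRange), language: tag?.rawValue))
    }
    return result
}

// Usage: print the tag found for every word of a mixed-language string.
for entry in wordLanguages(in: "Guten Tag. How are you?") {
    print(entry.word, entry.language ?? "unknown")
}
```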

Adscititious answered 9/12, 2012 at 1:35 Comment(1)
This works great! You can even create NSOrthography by simply calling [tagger orthographyAtIndex:0 effectiveRange:nil]. Definitely the best answer. Thank you. - Jen
B
8

You can use -requestCheckingOfString:… instead. NSTextCheckingTypeOrthography attempts to identify the language used in the string, and the completion handler receives an NSOrthography parameter that can be used to get information about the orthography in the string, including its dominant language.

The following example outputs dominant language = de:

NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";

[spellChecker requestCheckingOfString:spellCheckText
    range:(NSRange){0, [spellCheckText length]}
    types:NSTextCheckingTypeOrthography
    options:nil
    inSpellDocumentWithTag:0
    completionHandler:^(NSInteger sequenceNumber, NSArray *results, NSOrthography *orthography, NSInteger wordCount) {
        NSLog(@"dominant language = %@", orthography.dominantLanguage);
}];
Bawbee answered 12/6, 2011 at 22:44 Comment(1)
This code works perfectly. Does anybody know the Swift 2.1 equivalent? - Haley
W
6

A Swift String extension for Jennifer's answer (Swift 2 syntax):

extension String {
    func language() -> String? {
        let tagger = NSLinguisticTagger(tagSchemes: [NSLinguisticTagSchemeLanguage], options: 0)
        tagger.string = self
        return tagger.tagAtIndex(0, scheme: NSLinguisticTagSchemeLanguage, tokenRange: nil, sentenceRange: nil)
    }
}

Usage:

let language = "What language is this?".language()
Walls answered 26/4, 2016 at 19:38 Comment(3)
let language = "%".language() // ---> this is a crash! - Adsorbent
Didn't know this was supported in iOS - awesome. - Skricki
@Tim Friedland Then don't force-unwrap it. Just leave it optional; it will be nil when the language isn't recognized. - Devonadevondra
Y
4

With Swift 5, you can choose one of the following approaches in order to detect the language of a given string.


#1. Using NSLinguisticTagger's dominantLanguage property

Since iOS 11, NSLinguisticTagger has a property called dominantLanguage. dominantLanguage has the following declaration:

var dominantLanguage: String? { get }

Returns the dominant language of the string set for the linguistic tagger.

The Playground sample code below shows how to use dominantLanguage to find the dominant language of a string:

import Foundation

let text = "あなたはそれを行うべきではありません。"
let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
tagger.string = text
let language = tagger.dominantLanguage
print(language) // Optional("ja")

#2. Using NSLinguisticTagger's dominantLanguage(for:) method

As an alternative, NSLinguisticTagger has a convenience method called dominantLanguage(for:) for creating a new linguistic tagger, setting its string property and getting the dominantLanguage property. dominantLanguage(for:) has the following declaration:

class func dominantLanguage(for string: String) -> String?

Returns the dominant language for the specified string.

Usage:

import Foundation

let text = "Die Kleinen haben friedlich zusammen gespielt."
let language = NSLinguisticTagger.dominantLanguage(for: text)
print(language) // Optional("de")

#3. Using NLLanguageRecognizer's dominantLanguage property

Since iOS 12, NLLanguageRecognizer has a property called dominantLanguage. dominantLanguage has the following declaration:

var dominantLanguage: NLLanguage? { get }

The most likely language for the processed text.

Here’s how to use dominantLanguage to guess the dominant language of natural language text:

import NaturalLanguage

let string = "J'ai deux amours. Mon pays et Paris."
let recognizer = NLLanguageRecognizer()
recognizer.processString(string)
let language = recognizer.dominantLanguage
print(language?.rawValue) // Optional("fr")
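
Short inputs can be ambiguous (see the comments on the CFStringTokenizer answer about "hello" and "ich bin Umair"). One possible way to handle that, sketched here under assumptions rather than taken from the answer above, is to look at NLLanguageRecognizer's languageHypotheses(withMaximum:) and accept a guess only when its probability is high enough. The helper names confidentLanguage(from:threshold:) and detectLanguage(of:threshold:) and the 0.6 threshold are made up for this example.

```swift
import NaturalLanguage

// Hypothetical helper: picks the most probable language from a set of
// hypotheses, but only if its probability reaches `threshold`.
func confidentLanguage(from hypotheses: [NLLanguage: Double],
                       threshold: Double) -> NLLanguage? {
    guard let best = hypotheses.max(by: { $0.value < $1.value }),
          best.value >= threshold else {
        return nil
    }
    return best.key
}

// Hypothetical wrapper: asks the recognizer for up to three candidate
// languages and returns one only when it is confident enough.
// The default threshold of 0.6 is an arbitrary choice for illustration.
func detectLanguage(of text: String, threshold: Double = 0.6) -> NLLanguage? {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    let hypotheses = recognizer.languageHypotheses(withMaximum: 3)
    return confidentLanguage(from: hypotheses, threshold: threshold)
}

// Usage: nil means "not confident enough", not "no language".
if let language = detectLanguage(of: "J'ai deux amours. Mon pays et Paris.") {
    print(language.rawValue)
} else {
    print("undetermined")
}
```

If the set of plausible languages is known in advance, the recognizer's languageConstraints and languageHints properties can also narrow the candidates before processString is called.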
Ybarra answered 25/2, 2019 at 21:43 Comment(0)
S
1

As of iOS 11 you can use the dominantLanguage(for:)/dominantLanguageForString: class method of NSLinguisticTagger.

Swift:

extension String {
    var language: String? {
        return NSLinguisticTagger.dominantLanguage(for: self)
    }
}

print("Good morning".language)
print("Buenos días".language)

Objective-C:

@interface NSString (Tagger)

@property (nonatomic, readonly, nullable) NSString *language;
@end

@implementation NSString (Tagger)

- (NSString *)language {
    return [NSLinguisticTagger dominantLanguageForString:self];
}

@end

NSLog(@"%@", @"Good morning".language);
NSLog(@"%@", @"Buenos días".language);

Output (for both):

en
es

Schmooze answered 31/7, 2018 at 1:6 Comment(0)
