The accepted answer is compact and elegant, but very inefficient if compared to other solutions.
I'll now propose and discuss the implementation of a few variants of anagram checker. To measure performance, I'll use the different variants to find the anagrams of a given word out of an array of 50,000+ words.
// Variant 1: Sorting of Character
// Measured time: 30.46 s
func anagramCheck1(a: String, b: String) -> Bool {
return a.characters.sorted() == b.characters.sorted()
}
This is essentially the solution of the accepted answer, written in Swift 3 syntax. It's very slow because Swift's String, unlike NSString, is based on Character, which handles Unicode characters properly.
A more efficient solution exploits the NSCountedSet class, which allows us to represent a string as a set of characters, each with its own count. Two strings are anagrams if they map to the same NSCountedSet.
Note: checking string lengths as a precondition makes the implementation always more efficient.
// Variant 2: NSCountedSet of Character
// Measured time: 4.81 s
func anagramCheck2(a: String, b: String) -> Bool {
guard a.characters.count == b.characters.count else { return false }
let aSet = NSCountedSet()
let bSet = NSCountedSet()
for c in a.characters {
aSet.add(c)
}
for c in b.characters {
bSet.add(c)
}
return aSet == bSet
}
Better but not excellent. Here, one of the "culprits" is the use of the native Swift Character type (from Swift's String). Moving back to good old Objective-C types (NSString and unichar) makes things more efficient.
// Variant 3: NSCountedSet of unichar
// Measured time: 1.31 s
func anagramCheck3(a: String, b: String) -> Bool {
let aString = a as NSString
let bString = b as NSString
let length = aString.length
guard length == bString.length else { return false }
let aSet = NSCountedSet()
let bSet = NSCountedSet()
for i in 0..<length {
aSet.add(aString.character(at: i))
bSet.add(bString.character(at: i))
}
return aSet == bSet
}
Using NSCountedSet is fine, but before we compare two NSCountedSet objects, we fully populate them. A useful alternative is to fully populate the NSCountedSet for only one of the two strings, and then, while we populate the NSCountedSet for the other string, we fail early if the other string contains a character that is not found in the NSCountedSet of the first string.
// Variant 4: NSCountedSet of unichar and early exit
// Measured time: 1.07 s
func anagramCheck4(a: String, b: String) -> Bool {
let aString = a as NSString
let bString = b as NSString
let length = aString.length
guard length == bString.length else { return false }
let aSet = NSCountedSet()
let bSet = NSCountedSet()
for i in 0..<length {
aSet.add(aString.character(at: i))
}
for i in 0..<length {
let c = bString.character(at: i)
if bSet.count(for: c) >= aSet.count(for: c) {
return false
}
bSet.add(c)
}
return true
}
This is about the best timing we are going to get (with Swift). However, for completeness, let me discuss one more variant of this kind.
The next alternative exploits a Swift Dictionary of type [unichar: Int] to store the number of repetitions for each character instead of NSCountedSet. It's slightly slower than the previous two variants, but we can reuse it later to obtain a faster implementation.
// Variant 5: counting repetitions with [unichar:Int]
// Measured time: 1.36
func anagramCheck5(a: String, b: String) -> Bool {
let aString = a as NSString
let bString = b as NSString
let length = aString.length
guard length == bString.length else { return false }
var aDic = [unichar:Int]()
var bDic = [unichar:Int]()
for i in 0..<length {
let c = aString.character(at: i)
aDic[c] = (aDic[c] ?? 0) + 1
}
for i in 0..<length {
let c = bString.character(at: i)
let count = (bDic[c] ?? 0) + 1
if count > aDic[c] ?? 0 {
return false
}
bDic[c] = count
}
return true
}
Note that a vanilla Objective-C implementation using NSCountedSet, corresponding to Variant 3, is faster than all the previous versions by a rather large margin.
// Variant 6: Objective-C and NSCountedSet
// Measured time: 0.65 s
- (BOOL)anagramChecker:(NSString *)a with:(NSString *)b {
if (a.length != b.length) {
return NO;
}
NSCountedSet *aSet = [[NSCountedSet alloc] init];
NSCountedSet *bSet = [[NSCountedSet alloc] init];
for (int i = 0; i < a.length; i++) {
[aSet addObject:@([a characterAtIndex:i])];
[bSet addObject:@([b characterAtIndex:i])];
}
return [aSet isEqual:bSet];
}
Another way we can improve upon the previous attempts is to observe that, if we need to find the anagram of a given word, we might as well consider that word as fixed, and thus we could build the corresponding structure (NSCountedSet, Dictionary, ...) for that word only once.
// Finding all the anagrams of word in words
// Variant 7: counting repetitions with [unichar:Int]
// Measured time: 0.58 s
func anagrams(word: String, from words: [String]) -> [String] {
let anagrammedWord = word as NSString
let length = anagrammedWord.length
var aDic = [unichar:Int]()
for i in 0..<length {
let c = anagrammedWord.character(at: i)
aDic[c] = (aDic[c] ?? 0) + 1
}
let foundWords = words.filter {
let string = $0 as NSString
guard length == string.length else { return false }
var bDic = [unichar:Int]()
for i in 0..<length {
let c = string.character(at: i)
let count = (bDic[c] ?? 0) + 1
if count > aDic[c] ?? 0 {
return false
}
bDic[c] = count
}
return true
}
return foundWords
}
Now, in the previous variant we have counted with a [unichar:Int] Dictionary. This proves slightly more efficient than using an NSCountedSet of unichar, either with early exit (0.60 s) or without (0.87 s).