Closest-match string-array sorting in Swift

Using Swift 4, I would like to sort a string array by closest match to a given searchTerm. It is important to me that if the searchTerm occurs in the array as an exact match, the returned array lists it first.

Example: Given the Array = ["Hello world", "Hello Jamaica", "Hello", "Family", "Hel"]

And the searchTerm = "Hello", the algorithm should return:

["Hello", "Hello world", "Hello Jamaica", "Hel", "Family"].

Approach 1: I tried FuzzyMatching, and it partly worked: it sorted the inputArray according to the given searchTerm, with good ordering by substring matches and syntactic sorting. However, it did not put exact matches first in the returned array.

Approach 2: Then I tried my own algorithm (see code below). But if several strings in the array start with my searchTerm (i.e. have searchTerm as a prefix), my algorithm does not do a good job.

static func bestMatchFilterdStringArray(inputArray: [String], searchTerm: String) -> [String] {

    let matchingTerms = inputArray
        .filter { $0.range(of: searchTerm, options: .caseInsensitive) != nil }
        .sorted { ($0.hasPrefix(searchTerm) ? 0 : 1) < ($1.hasPrefix(searchTerm) ? 0 : 1) }
    return matchingTerms
}

How is closest-match string-array sorting done in Swift 4? In particular, how can exact matches be brought to the front of the returned array? Any help appreciated!

Floria answered 13/12, 2017 at 13:53 Comment(1)
Possible duplicate of "How to sort an array of string by similarity to specific key" - Failing

You can use a Levenshtein distance score to compare your search term with every string in the array. The string with the highest score becomes the first element of the result, and so on, so the result is the array of strings sorted in descending order of score.

The following String extension computes a Levenshtein distance score. The score is the edit distance normalized by the length of the longer string, so it lies between 0 and 1, and a higher value means a closer match (1 means the strings are equal).

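import Foundation  // needed for trimmingCharacters(in:)

extension String {
    /// Similarity to `string` in 0...1, based on Levenshtein distance (1 = equal).
    func levenshteinDistanceScore(to string: String, ignoreCase: Bool = true, trimWhiteSpacesAndNewLines: Bool = true) -> Double {

        var firstString = self
        var secondString = string

        if ignoreCase {
            firstString = firstString.lowercased()
            secondString = secondString.lowercased()
        }
        if trimWhiteSpacesAndNewLines {
            firstString = firstString.trimmingCharacters(in: .whitespacesAndNewlines)
            secondString = secondString.trimmingCharacters(in: .whitespacesAndNewlines)
        }

        // Length of the longer string, i.e. the largest possible edit distance.
        // `Swift.max` avoids a clash with the collection method `max()`.
        let maxLength = Swift.max(firstString.count, secondString.count)

        // Assumption: two empty strings count as equal. This also avoids 0 / 0.
        guard maxLength > 0 else { return 1.0 }

        let empty = [Int](repeating: 0, count: secondString.count)
        // Previous row of the distance table, starting with the row for the empty prefix.
        var last = [Int](0...secondString.count)

        for (i, tLett) in firstString.enumerated() {
            var cur = [i + 1] + empty
            for (j, sLett) in secondString.enumerated() {
                // Equal characters cost nothing; otherwise take the cheapest of
                // substitution, deletion, and insertion, plus one.
                cur[j + 1] = tLett == sLett ? last[j] : Swift.min(last[j], last[j + 1], cur[j]) + 1
            }
            last = cur
        }

        // The final cell of the last row is the edit distance.
        let distance = last[secondString.count]
        return 1 - (Double(distance) / Double(maxLength))
    }
}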
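
To make the sorting step concrete, here is a minimal self-contained sketch of "sort in descending order of score". The names `editDistance`, `similarity`, and `sortedBySimilarity` are hypothetical helpers written for this illustration, not part of any library; `similarity` is a compact stand-in for the extension above (case-insensitive, no trimming).

```swift
// Hypothetical helper: plain Levenshtein edit distance between two strings.
func editDistance(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    var last = Array(0...t.count)  // distances for the previous row
    for (i, sc) in s.enumerated() {
        var cur = [i + 1] + Array(repeating: 0, count: t.count)
        for (j, tc) in t.enumerated() {
            cur[j + 1] = sc == tc ? last[j] : min(last[j], last[j + 1], cur[j]) + 1
        }
        last = cur
    }
    return last[t.count]
}

// Hypothetical helper: case-insensitive similarity in 0...1 (1 = equal).
func similarity(_ a: String, _ b: String) -> Double {
    let x = a.lowercased(), y = b.lowercased()
    let longest = max(x.count, y.count)
    guard longest > 0 else { return 1 }  // assumption: two empty strings are equal
    return 1 - Double(editDistance(x, y)) / Double(longest)
}

// Sort candidates by descending similarity to the search term.
func sortedBySimilarity(_ items: [String], to term: String) -> [String] {
    return items
        .map { (item: $0, score: similarity($0, term)) }
        .sorted { $0.score > $1.score }
        .map { $0.item }
}

let input = ["Hello world", "Hello Jamaica", "Hello", "Family", "Hel"]
let result = sortedBySimilarity(input, to: "Hello")
print(result)
// ["Hello", "Hel", "Hello world", "Hello Jamaica", "Family"]
```

Note that the exact match comes first, but a pure score ranks "Hel" (distance 2 over length 5) above "Hello world" (distance 6 over length 11), so this order differs slightly from the one requested in the question.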
Attraction answered 12/2, 2019 at 13:27 Comment(2)
Thank you, Ankit. Since it's been a while since I asked that question, I will need to dig in first. Again, many thanks for your solution! - Floria
Also, if you truncate the first string to the length of the search term, the score becomes more accurate for prefix matches, e.g. var firstString = String(self.prefix(string.count)). Then, with let str = "yilmaz", str.levenshteinDistanceScore(to: "yil") also returns 1. - Periodicity
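
Building on the prefix idea in the comment above, one possible way to guarantee that exact matches and prefix matches come first is to rank by a match tier before falling back to a score. This is only a sketch under stated assumptions: `matchTier`, `rankedMatches`, and the tier numbering are hypothetical, and the scorer is passed in as a closure so the block stays self-contained. The `byLength` scorer is a stand-in for illustration only.

```swift
// Hypothetical tiers: lower values sort first.
func matchTier(_ candidate: String, _ term: String) -> Int {
    let c = candidate.lowercased(), t = term.lowercased()
    if c == t { return 0 }          // exact match
    if c.hasPrefix(t) { return 1 }  // candidate starts with the term
    if t.hasPrefix(c) { return 2 }  // candidate is a prefix of the term
    return 3                        // everything else
}

// Rank by tier first; within a tier, the higher score wins.
// `score` is an assumed closure so any similarity measure can be plugged in.
func rankedMatches(_ items: [String], for term: String,
                   score: (String, String) -> Double) -> [String] {
    return items
        .map { (item: $0, tier: matchTier($0, term), score: score($0, term)) }
        .sorted { a, b in
            a.tier != b.tier ? a.tier < b.tier : a.score > b.score
        }
        .map { $0.item }
}

// Stand-in scorer for this sketch only: strings closer in length score higher.
let byLength: (String, String) -> Double = { a, b in
    -Double(abs(a.count - b.count))
}

let input = ["Hello world", "Hello Jamaica", "Hello", "Family", "Hel"]
let ranked = rankedMatches(input, for: "Hello", score: byLength)
print(ranked)
// ["Hello", "Hello world", "Hello Jamaica", "Hel", "Family"]
```

In practice the answer's extension could be passed as the scorer instead, for example `{ $0.levenshteinDistanceScore(to: $1) }`; the tiers then decide the coarse order and the Levenshtein score only breaks ties within a tier.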
