Levenshtein algorithm with custom character mapping
Asked Answered
R

1

5

I want to use Levenshtein algorithm to search in a list of strings. I want to implement a custom character mapping in order to type latin characters and searching in items in greek.

mapping example:

a = α, ά
b = β
i = ι,ί,ΐ,ϊ
... (etc)
u = ου, ού

So searching using abu in a list with

  • αbu
  • abού
  • αού (all greek characters)

will result with all items in the list. (item order is not a problem)

How do I apply a mapping in the algorithm? (this is where I start)

Rafael answered 22/3, 2012 at 11:38 Comment(2)
The Levenshtein algorithm compares two strings based on an edit distance metric. It typically defines a substitution rule which would seem to encompass what you are talking about. Grab some sample code (sample code usually substitutes A-Z regardless of the character) and just replace this with your specific substitution rules.Mizzle
@Jon How do I apply a mapping in the algorithm?Rafael
H
8

I think the best way would be to preprocess your symbols to one definite form (e.g. all in latin) and then use Levenshtein as you would do normaly.

In pseudocode:

int func(String latinStr, String greekStr) {
   String mappedStr = convertToLatin(greekStr); // e.g. now αβ would be ab 
   return Levenstein(latinStr, mappedStr);
}

And in convertToLatin you may symbol-by-symbol ask Dictionary with mappings for a replace and construct new string

Hound answered 22/3, 2012 at 11:59 Comment(1)
The process you're referring to is called 'canonicalization'.Culpa

© 2022 - 2024 — McMap. All rights reserved.