Efficient algorithm to find all "character-equal" strings?

How can we write an efficient function that outputs "homoglyph equivalents" of an input string?

Example 1 (pseudo-code):

homoglyphs_list = [
                     ["o", "0"], // "o" and "0" are homoglyphs
                     ["i", "l", "1"] // "i" and "l" and "1" are homoglyphs
                  ]

input_string    = "someinput"
output          = [
                   "someinput", "s0meinput", "somelnput",
                   "s0melnput", "some1nput", "s0me1nput"
                  ]

Example 2:

homoglyphs_list = [
                     ["rn", "m", "nn"],
                  ]

input_string    = "rnn"
output          = ["rnn", "rm", "mn", "rrn", "nnn", "nm", "nrn"]

Example 3:

homoglyphs_list = [
                     ["d", "ci", "a"], // "o" and "0" are homoglyphs
                     ["i", "l", "1"] // "i" and "l" and "1" are homoglyphs
                  ]
                  /*
                     notice that with the two rules above,
                     we can infer "d" = "ci" = "a" = "cl" = "c1"
                  */

input_string    = "pacerier"
output          = [
                   "pacerier",  "pacerler",  "pacer1er",  "pcicerier",
                   "pcicerler", "pcicer1er", "pclcerier", "pc1cerier",
                   "pclcerler", "pc1cerler", "pclcer1er", "pc1cer1er",
                   "pdcerier",  "pdcerler",  "pdcer1er"
                  ]

Note: The order of the members in the output array isn't important, and we can assume that the given homoglyph mappings are assumed to be proper (inputs wouldn't give us an "infinite loop").

My current algorithm works, but it's using raw bruteforcing and performance is awful. E.g. an input of "mmmmm" with homoglyphs ["rn", "m", "nn"] takes 38 seconds to run:

// This is php code (non-pseudo just so we could test the running time), 
// but the question remains language-agnostic

public function Func($in, Array $mappings){
    $out_table = array();
    $out_table[$in] = null;
    while(true){
        $number_of_entries_so_far = count($out_table);
        foreach(array_keys($out_table) as $key){
            foreach($mappings as $mapping){
                foreach($mapping as $value){
                    for($a=0, $aa=strlen($key); $a<$aa; ++$a){
                        $pos = strpos($key, $value, $a);
                        if($pos === false){
                            continue;
                        }
                        foreach($mapping as $equal_value){
                            if($value === $equal_value){
                                continue;
                            }
                            $out_table[substr($key, 0, $pos) . $equal_value . substr($key, $pos + strlen($value))] = null;
                        }
                    }

                }
            }
        }
        if($number_of_entries_so_far === count($out_table)){
            // it means we have tried bruteforcing but added no additional entries,
            // so we can quit now
            break;
        }
    }
    return array_keys($out_table);
}

How can we implement an efficient (fast) homoglyph expansion algorithm?

cache = {} def homoglyph(input_string, mappings): output = [] character = input_string[0] if input_string in cache: return cache[input_string] elif not input_string: return [] for charset in mappings: if character in charset: temp = input_string sub_homoglyphs = homoglyph(input_string[1:], mappings) for char in charset: temp[0] = char output.append(temp) #adding the homoglyph character to the rest of the string for subhomoglyph in sub_homoglyphs: output.append(char+subhomoglyph) cache[input_string] = output return output

Recommended topics

Hot tags