I have written a simple function that determines whether str1 is a prefix of str2. It looks like this (in JS):
    function isPrefix(str1, str2) // determine if str1 is a prefix of the candidate string str2
    {
        if (str2.length < str1.length) // the candidate can't be shorter than the prefix
            return false;
        var i = 0;
        // advance while still inside str1 and the characters match
        while (i < str1.length && str1.charAt(i) === str2.charAt(i))
            i++;
        if (i < str1.length) // stopped early => mismatch before the end of str1
            return false;
        return true;
    }
As you can see, it loops over the entire length of the prefix string to decide whether it is a prefix of the candidate string. This means its complexity is O(N), where N is the length of the prefix string. That isn't bad on its own, but it becomes a problem when I have a huge data set to loop through to find which strings start with the prefix string. The overall complexity then becomes O(M*N), where M is the total number of strings in the data set. Not good.
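To make the O(M*N) scan concrete, here is a rough sketch of what I mean. The name `filterByPrefix` and the `sample` array are made up for illustration, and the built-in `String.prototype.startsWith` (ES2015) stands in for my own `isPrefix`, so treat that substitution as an assumption:

```javascript
// Hypothetical helper: return every word in `words` that starts with `prefix`.
// String.prototype.startsWith stands in for the hand-written isPrefix.
function filterByPrefix(prefix, words) {
    var matches = [];
    for (var i = 0; i < words.length; i++) {    // M iterations, one per word
        if (words[i].startsWith(prefix)) {      // up to N character comparisons each
            matches.push(words[i]);
        }
    }
    return matches;
}

var sample = ["random", "rapid", "romance", "romania", "rome", "rose"];
console.log(filterByPrefix("rom", sample)); // ["romance", "romania", "rome"]
```

Every word in the data set is visited, even the ones that differ from the prefix at the very first character.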
I explored the Internet a bit and concluded that the best answer would be a Patricia/radix trie, where strings are stored by their shared prefixes. Even then, when I attempt to insert or look up a string, there will be considerable overhead in string matching if I use the aforementioned prefix-checking function.
Say I had a prefix string 'rom' and a set of candidate words

    var dataset = ["random", "rapid", "romance", "romania", "rome", "rose"];

that would look like this in a radix trie:
                r
              /   \
             a     o
            / \   / \
         ndom pid se  m
                     / \
                    an  e
                   /  \
                  ia   ce
This means that at every node I will be using the prefix-match function to determine which child has a value that matches the prefix string at the current index. Somehow, this solution still seems arduous and does not sit well with me. Is there something better, or any way I can improve the core prefix-matching function?
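To show the per-node matching I have in mind, here is a minimal sketch of a lookup over the trie drawn above. The node shape (`label`, `isWord`, `children`) and the names `node`, `collect` and `wordsWithPrefix` are all hypothetical, the trie is built by hand rather than by an insert routine, and `startsWith` again stands in for my own prefix check:

```javascript
// Hypothetical node shape: { label: string, isWord: boolean, children: [node, ...] }
function node(label, isWord, children) {
    return { label: label, isWord: isWord, children: children || [] };
}

// The trie from the diagram, built by hand for illustration.
var root = node("", false, [
    node("r", false, [
        node("a", false, [node("ndom", true), node("pid", true)]),
        node("o", false, [
            node("se", true),
            node("m", false, [
                node("an", false, [node("ia", true), node("ce", true)]),
                node("e", true)
            ])
        ])
    ])
]);

// Collect every complete word stored at or below `n`; `path` is the text spelled so far.
function collect(n, path, out) {
    if (n.isWord) out.push(path);
    for (var i = 0; i < n.children.length; i++) {
        collect(n.children[i], path + n.children[i].label, out);
    }
    return out;
}

// Walk down from the root, consuming `prefix` one edge label at a time.
function wordsWithPrefix(root, prefix) {
    var current = root, path = "", rest = prefix;
    while (rest.length > 0) {
        var next = null;
        for (var i = 0; i < current.children.length; i++) {
            var label = current.children[i].label;
            // Match if the label begins the remaining prefix,
            // or the remaining prefix ends somewhere inside the label.
            if (rest.startsWith(label) || label.startsWith(rest)) {
                next = current.children[i];
                break;
            }
        }
        if (next === null) return [];          // no child continues the prefix
        path += next.label;
        rest = rest.slice(next.label.length);  // becomes "" if the label was longer
        current = next;
    }
    return collect(current, path, []);
}

console.log(wordsWithPrefix(root, "rom")); // ["romania", "romance", "rome"]
```

In this sketch the string comparison happens once per child at each level, and only against that child's short label, which is exactly the per-node matching cost I am worried about.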