Split string into words
Asked Answered
P

4

7

I am looking for the most efficient algorithm to form all possible combinations of words from a string. For example:

Input String: forevercarrot

Output:

forever carrot
forever car rot
for ever carrot
for ever car rot

(All words should be from a dictionary).

I can think of a brute force approach. (find all possible substrings and match) but what would be better ways?

Perissodactyl answered 21/1, 2011 at 3:41 Comment(1)
Your brute-force approach is right. Imagine you were given the same problem except for the request for words in a foreign language.Gesticulatory
L
6

Use a prefix tree for your list of known words. Probably libs like myspell already do so. Try using a ready-made one.

Once you found a match (e.g. 'car'), split your computation: one branch starts to look for a new word ('rot'), another continues to explore variants of current beginning ('carrot').

Effectively you maintain a queue of pairs (start_position, current_position) of offsets into your string every time you split the computation. Several threads can pop from this queue in parallel and try to continue a word that starts from start_position and is already known up to current_position of the pair, but does not end there. When a word is found, it is reported and another pair is popped from the queue. When it's impossible, no result is generated. When a split occurs, a new pair is added to the end of the queue. Initially the queue contains a (0,0).

Linnet answered 21/1, 2011 at 3:56 Comment(1)
Plus make sure you don't repeat the computation of splits of 'carrot' twice - once for 'forever' and once for 'for ever'. Cache partial resuts: Set(possible splits) for each [i..n].Perreault
P
1

See this question which has even better answers. It's a standard dynamic programming problem:

How to split a string into words. Ex: "stringintowords" -> "String Into Words"?

Perreault answered 6/12, 2012 at 23:0 Comment(0)
T
0

A psuedocode implementation, exploiting the fact that every part of the string needs to be a word, we can't skip anything. We work forward from the start of the string until the first bit is a word, and then generate all possible combinations of the rest of the string. Once we've done that, we keep going along until we find any other possibilities for the first word, and so on.

allPossibleWords(string s, int startPosition) {
    list ret
    for i in startPosition..s'length
        if isWord(s[startPosition, i])
            ret += s[startPostion, i] * allPossibleWords(s, i)
    return ret    
}

The bugbear in this code is that you'll end up repeating calculations - in your example, you'll end up having to calculate allPossibleWords("carrot") twice - once in ["forever", allPossibleWords["carrot"]] and once in ["for", "ever", allPossibleWords["carrot"]]. So memoizing this is something to consider.

Trula answered 21/1, 2011 at 3:51 Comment(0)
P
-1

Input String: forevercarrot

Output:

forever carrot forever car rot for ever carrot for ever car rot

program :

#include<iostream>
#include<string>
#include<vector>
#include<string.h>
void strsplit(std::string str)
{
   int len=0,i,x,y,j,k;
   len = str.size();
   std::string s1,s2,s3,s4,s5,s6,s7;
   char *c = new char[len+1]();
   char *b = new char[len+1]();
   char *d = new char[len+1]();
   for(i =0 ;i< len-1;i++)
   {
       std::cout<<"\n";
       for(j=0;j<=i;j++)
       {
          c[j] = str[j];
          b[j] = str[j];
          s3 += c[j];
          y = j+1;
       }
       for( int h=i+1;h<len;h++){
          s5 += str[h];
       }
       s6 = s3+" "+s5;
       std::cout<<" "<<s6<<"\n";
       s5 = "";
       for(k = y;k<len-1;k++)
       {
          d[k] = str[k];
          s1 += d[k];
          s1 +=  " ";
          for(int l = k+1;l<len;l++){
           b[l] = str[l];
           s2 += b[l];
         }
       s4 = s3+" "+s1+s2;
       s7 = s4;
       std::cout<<" "<<s4<<"\n";
       s3 = "";s4 = "";
       }
       s1 = "";s3 = "";
    }
}

int main(int argc, char* argv[])
{
    std::string str;
    if(argc < 2)
                std::cout<<"Usage: "<<argv[0]<<" <InputString> "<<"\n";
    else{
                str = argv[1];
                strsplit(str);
    }

return 0;
}
Pavis answered 9/12, 2013 at 7:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.