Algorithm to remove a character from a word such that the reduced word is still a word in dictionary
Asked Answered
L

6

10

Here is the scenario, Given a word remove a single character from a word in every step such that the reduced word is still a word in dictionary. Continue till no characters are left.

Here is the catch: You need to remove the right character, for eg. in a word there may be two possible characters which could be removed and both may cause the reduced word to be a valid word, but at a later stage one may get reduced to the end i.e. no characters left while the other may hang up.

Example:

  • planet
  • plant
  • pant
  • pan
  • an
  • a

OR

  • planet
  • plane
  • lane
  • not possible further, suppose lan is not a word. hope you got the idea.

Please see my code, im using recursion, but would like to know if there are better efficient solutions to do the same.

public class isMashable
{

  static void initiate(String s)
  {
    mash("", s);
  }

  static void mash(String prefix, String s)
  {
    int N = s.length();
    String subs = "";

    if (!((s.trim()).equals("")))
      System.out.println(s);

    for (int i = 0 ; i < N ; i++)
    {
      subs = s.substring(0, i) + s.substring(i+1, N);
      if (subs.equals("abc")||subs.equals("bc")||subs.equals("c")||subs.equals("a")) // check in dictionary here
        mash("" + s.charAt(i), subs);
    }
  }

  public static void main(String[] args)
  {
    String s = "abc";
    initiate(s);
  }
}
Lody answered 28/6, 2012 at 5:9 Comment(6)
You should have a dictionary (like a Map<String, Integer>) or some other way to check that the actual word is still a valid word. Also, you should try to sent the word letters combination instead of just sending substrings of your whole word. For example, if you send planet, by your algorithm you won´t be able to test pet combination.Faefaeces
Javascript example (warning: jsfiddle is a bit slow): jsfiddle.net/BA8PJAgnola
you might want to use a directed graph containing each word as node. you create an edge from node A to node B iff it's possible to pass from A to B removing only one letter. To simplify graph creation, you first test word's length before trying to eliminate a letter.Twitter
@LuiggiMendoza: Yup, thanks for finding out the bug have fixed it :)Lody
@NitishMD I would like to see your final solution to the problem, I'm really interested but don't have enough time to make one myself :(.Faefaeces
@LuiggiMendoza: Sure il post it.. fixing few things in it :)Lody
C
2

Run a BFS algorithm. If you have more than one characters that you can remove, remove them individually and put in a priority queue, if you want to retrace the path, keep the pointer to the parent(the original word from which you created this word by removing a character) of the word in the node itslef. And when you remove all the characters, terminate and retrace the path, or if there is no valid way, you will have an empty priority queue

Commutative answered 28/6, 2012 at 6:33 Comment(1)
In the case of only one path, DFS is quicker on average (the only difference between DFS and BFS being that they use a stack and a queue, respectively. The rest is identical). However, if the OP wants to check for all possible paths (not just return a single path) or if no path exists then they are equivilent.Cavalla
V
1

I have used Porter Stemming in a couple of projects - that will of course only help you trim off the end of the word.

The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.

A reprint occoured in M.F. Porter, 1980, An algorithm for suffix stripping, Program, 14(3) pp 130−137.

Martin even has a Java version available on his site.

Valuable answered 28/6, 2012 at 7:54 Comment(0)
F
1

Here you go. The mash-method will find a solution (list of dictionary words) for any given String using a dictionary passed to the constructor. If there's no solution (ending to a one letter word), the method will return null. If you are interested in all partial solutions (ending before getting to a one letter word), you should tweak the algorithm a bit.

The dictionary is assumed to be a set of uppercase Strings. You could of course use your own class/interface instead.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WordMash {

    private final Set<String> dictionary;

    public WordMash(Set<String> dictionary) {
        if (dictionary == null) throw new IllegalArgumentException("dictionary == null");
        this.dictionary = dictionary;
    }

    public List<String> mash(String word) {
        return recursiveMash(new ArrayList<String>(), word.toUpperCase());
    }

    private List<String> recursiveMash(ArrayList<String> wordStack, String proposedWord) {
        if (!dictionary.contains(proposedWord)) {
            return null;
        }
        wordStack.add(proposedWord);

        if (proposedWord.length() == 1) {
            return wordStack;
        }

        for (int i = 0; i < proposedWord.length(); i++) {
            String nextProposedWord = 
                proposedWord.substring(0, i) + proposedWord.substring(i + 1, proposedWord.length());    
            List<String> finalStack = recursiveMash(wordStack, nextProposedWord);
            if (finalStack != null) return finalStack;
        }

        return null;
    }

}

Example:

Set<String> dictionary = new HashSet<String>(Arrays.asList(
        "A", "AFRICA", "AN", "LANE", "PAN", "PANT", "PLANET", "PLANT"
));
WordMash mash = new WordMash(dictionary);

System.out.println(mash.mash("planet"));
System.out.println(mash.mash("pant"));


System.out.println(mash.mash("foo"));
System.out.println(mash.mash("lane"));
System.out.println(mash.mash("africa"));
Flivver answered 28/6, 2012 at 9:29 Comment(0)
W
1

Here is an algorithm that uses depth first search. Given a word, you check if its valid (in dictionary). If its valid, remove one character from the string at each index and recursively check the 'chopped' word is valid again. If the chopped word is invalid at any point, you are in the wrong path and go back to previous step.

import java.util.HashSet;
import java.util.Set;

public class RemoveOneCharacter {
    static Set<String> dict = new HashSet<String>();

    public static boolean remove(String word){
        if(word.length() == 1)
            return true;

        if(!dict.contains(word))
            return false;

        for(int i=0;i<word.length();i++){
            String choppedWord = removeCharAt(word,i);
            boolean result = remove(choppedWord);
            if(result)
                return true;
        }
        return false;
    }

    public static String removeCharAt(String str, Integer n) {
        String f = str.substring(0, n);
        String b = str.substring(n+1, str.length());
        return f + b;
    }

    public static void main(String args[]){
        dict.add("heat");
        dict.add("eat");
        dict.add("at");
        dict.add("a");

        dict.add("planets");
        dict.add("planet");
        dict.add("plant");
        dict.add("plane");
        dict.add("lane");
        dict.add("plants");
        dict.add("pant");
        dict.add("pants");
        dict.add("ant");
        dict.add("ants");
        dict.add("an");


        dict.add("clean");
        dict.add("lean");
        dict.add("clan");
        dict.add("can");

        dict.add("why");

        String input = "heat";
        System.out.println("result(heat) " + remove(input));
        input = "planet";
        System.out.println("result(planet) " + remove(input));
        input = "planets";
        System.out.println("result(planets) " + remove(input));
        input = "clean";
        System.out.println("result(clean) " + remove(input));
        input = "why";
        System.out.println("result(why) " + remove(input));
        input = "name";
        System.out.println("result(name) " + remove(input));


    }

}
Wanderjahr answered 16/10, 2016 at 2:7 Comment(0)
A
0

OK, it is not Java, just JavaScript, but probably you can transform it:

http://jsfiddle.net/BA8PJ/

function subWord( w, p, wrapL, wrapR ){
  return w.substr(0,p)
      + ( wrapL ? (wrapL + w.substr(p,1) + wrapR ):'')
      + w.substr(p+1);
}

// wa = word array:         ['apple','banana']
// wo = word object/lookup: {'apple':true,'banana':true}
function initLookup(){
  window.wo = {};
  for(var i=0; i < wa.length; i++) wo[ wa[i] ] = true;
}



function initialRandomWords(){
  // choose some random initial words
  var level0 = [];
  for(var i=0; i < 100; i++){
    var w = wa[ Math.floor(Math.random()*wa.length) ];
    level0.push({ word: w, parentIndex:null, pos:null, leaf:true });
  }
  return level0;
}



function generateLevels( levels ){
  while(true){
    var nl = genNextLevel( levels[ levels.length-1 ]);
    if( ! nl ) break;
    levels.push( nl );
  }
}

function genNextLevel( P ){ // P: prev/parent level
  var N = [];               // N: next level
  var len = 0;
  for( var pi = 0; pi < P.length; pi ++ ){
    pw = P[ pi ].word; // pw: parent word
    for( var cp = 0; cp < pw.length; cp++ ){ // cp: char pos
      var cw = subWord( pw, cp ); // cw: child word
      if( wo[cw] ){
        len++;
        P[ pi ].leaf = false;
        N.push({ word: cw, parentIndex:pi, pos:cp, leaf:true });
      }
    }
  }
  return len ? N : null;
}



function getWordTraces( levels ){
  var rows = [];
  for( var li = levels.length-1; li >= 0; li-- ){
    var level = levels[ li ];
    for( var i = 0; i < level.length; i++ ){
      if( ! level[ i ].leaf ) continue;
      var trace = traceWord( li, i );
      if( trace.length < 2 ) continue;
      rows.push( trace );
    }
  }
  return rows;
}

function traceWord( li, i ){
  var r = [];
  while(true){
    var o = levels[ li ][ i ];
    r.unshift( o );
    i = o.parentIndex;
    if( !i ) break;
    li--;
    if( li < 0 ) break;
  };
  return r;
}



function compareTraces( aa, bb ){
  var a = aa[0].word, b = bb[0].word;
  if( a == b ){
    if( aa.length < bb.length ) return -1;
    if( aa.length > bb.length ) return +1;
  }

  var len = Math.min( aa.length, bb.length )
  for( var i = 0; i < len; i++ ){
    var a = aa[i].word, b = bb[i].word;
    if( a < b ) return +1;
    if( a > b ) return -1;
  }

  if( aa.length < bb.length ) return -1;
  if( aa.length > bb.length ) return +1;

  return 0;
}


function prettyPrintTraces( rows ){
  var prevFirstWord = null;
  for( var ri = rows.length-1; ri >= 0; ri-- ){
    var row = rows[ ri ];

    if(  prevFirstWord != row[0].word  ){
      if( prevFirstWord ) $('body').append('<div class="sep"/>');
      prevFirstWord = row[0].word;
    }

    var $row = $('<div class="row"/>');
    for( var i = 0; i < row.length; i++ ){

      var w = row[i].word;
      var c = row[i+1];
      if( c )  w = subWord( w, c.pos, '<span class="cut">', '</span>');

      var $word = $('<div class="word"></div>').html( w ).toggleClass('last-word', w.length < 2 );
      $row.append( $word );
    }
    $('body').append( $row );
  }
};

function main(){
  initLookup();

  window.levels = [ initialRandomWords() ];

  generateLevels( levels );

  rows = getWordTraces( levels );

  rows.sort( compareTraces );

  prettyPrintTraces( rows );
}
Agnola answered 28/6, 2012 at 7:40 Comment(3)
Can you explain your approach and its complexity?Lody
I did it just for fun. The other solutions are indeed much shorter, so you can ignore this. level[0] contains the original words, level[1] has all words with 1 char removed, level[2]: 2 chars removed, plus they point to their parent word, so we can build all possible valid paths from the shortest words to the original word. I use multiple words initially, because I wasn't sure if the short wordlist I found somewhere will contain any path.Agnola
BTW this is the latest fiddle, I forgot to update answer: jsfiddle.net/BA8PJ/1Agnola
A
0

Make a trie (or suffix tree )with given characters in the word(no repetions allowed), and check each subtree of trie with dictionary. This should help you.

For reference visit

Almund answered 28/6, 2012 at 8:24 Comment(1)
But how is the trie helping? We are removing a character at a time. I did not get this soln. Can you explain?Lody

© 2022 - 2024 — McMap. All rights reserved.