Code Golf: Word Search Solver
Asked Answered
A

7

24

Note: This is my first Code Golf challenge/question, so I might not be using the correct format below. I'm not really sure how to tag this particular question, and should this be community wiki? Thanks!

This Code Golf challenge is about solving word searches!

A word search, as defined by Wikipedia, is:

A word search, word find, word seek, word sleuth or mystery word puzzle is a word game that is letters of a word in a grid, that usually has a rectangular or square shape. The objective of this puzzle is to find and mark all the words hidden inside the box. The words may be horizontally, vertically or diagonally. Often a list of the hidden words is provided, but more challenging puzzles may let the player figure them out. Many word search puzzles have a theme to which all the hidden words are related.

The word searches for this challenge will all be rectangular grids with a list of words to find provided. The words can be written vertically, horizontally, or diagonally.


Input/Output

The user inputs their word search and then inputs a word to be found in their grid. These two inputs are passed to the function that you will be writing. It is up to you how you want to declare and handle these objects.

Using a strategy described below or one of your own, the function finds the specific word in the search and outputs its starting coordinates (simply row number and column number) and ending coordinates. If you find two occurrences of the word, you must output both sets of coordinates. If the word is a palindrome, you may arbitrarily choose one end to be the "start" of the word.


Example

Input:

A I Y R J J Y T A S V Q T Z E 
X B X G R Z P W V T B K U F O 
E A F L V F J J I A G B A J K 
R E S U R E P U S C Y R S Y K 
F B B Q Y T K O I K H E W G N 
G L W Z F R F H L O R W A R E 
J A O S F U E H Q V L O A Z B 
J F B G I F Q X E E A L W A C 
F W K Z E U U R Z R T N P L D 
F L M P H D F W H F E C G W Z 
B J S V O A O Y D L M S T C R 
B E S J U V T C S O O X P F F 
R J T L C V W R N W L Q U F I 
B L T O O S Q V K R O W G N D 
B C D E J Y E L W X J D F X M 

Word to find: codegolf

Output:

row 12, column 8 --> row 5, column 1

Strategies

Here are a few strategies you might consider using. It is completely up to you to decide what strategy you want to use; it doesn't have to be in this list.

  • Looking for the first letter of the word; on each occurrence, looking at the eight surrounding letters to see whether the next letter of the word is there.
  • Same as above, except looking for a part of a word that has two of the same letter side-by-side.
  • Counting how often each letter of the alphabet is present in the whole grid, then selecting one of the least-occurring letters from the word you have to find and searching for the letter. On each occurrence of the letter, you look at its eight surrounding letters to see whether the next and previous letters of the word is there.
Authoritarian answered 2/4, 2010 at 17:25 Comment(13)
No explanation for the downvote?Authoritarian
Code golf questions must always be community wiki. meta.stackexchange.com/questions/24242/…Beguine
It's remarkable how quickly my eye caught the word in the grid without seeing the solution. Damn humans and their powerful pattern recognition...Buffon
Unhandled case: What if a word is a palindrome, e.g. "RACECAR"? Should you print it twice (because it can "start" at two different spots), or is it the same word (since it traverses the same set of grid locations)?Ambassadress
@John thanks for catching that. What do you think would be better? I'll include it in the question. I think that it's better to treat it as the same word, because word searches are about finding the words. By finding a starting and ending point, we're only outlining the word. If the word is a palindrome, it doesn't matter what order the starting & ending coordinates are in. I think we should only print it once, but I'm interested in hearing your opinion. Thanks!Authoritarian
What's up, 3 code golfs in 2 days?!Valvate
what do you mean by "It is up to you how you want to declare and handle these objects."?Cabezon
also: what's up with the strategies? are you searching for the most efficient solution, or for the most concise one?Cabezon
After banging my head against the wall for a good fifteen minutes, I noticed your example was wrong. It should be column 8, not 9.Lister
code-golf means Shortest Code you should add that to the question and make sure the answer with the shortest code is accepted meta.stackexchange.com/questions/24242Sasnett
Can you clarify the input and output specifications? For example, does the output have to be formatted as shown precisely? Or are we free to choose what is most convenient for each language?Salita
+1 Wow, this one brings back nostalgic childhood memories.Greco
I had this idea a little over a week ago, except my idea for the output was to display all the words found in their original places in the result. Maybe this is easier.Doubleedged
W
12

Python - 186 chars

def f(g,W):w=g.find("\n")+1;L=len(W);print" --> ".join("row %s, column %s"%(x/w+1
,x%w+1)for i in range(len(g))for j in(-w-1,-w,-w+1,-1,1,w-1,w,w+1)for x in(i,i+(L
-1)*j)if g[i::j][:L]==W)

testing code:

grid="""A I Y R J J Y T A S V Q T Z E 
X B X G R Z P W V T B K U F O 
E A F L V F J J I A G B A J K 
R E S U R E P U S C Y R S Y K 
F B B Q Y T K O I K H E W G N 
G L W Z F R F H L O R W A R E 
J A O S F U E H Q V L O A Z B 
J F B G I F Q X E E A L W A C 
F W K Z E U U R Z R T N P L D 
F L M P H D F W H F E C G W Z 
B J S V O A O Y D L M S T C R 
B E S J U V T C S O O X P F F 
R J T L C V W R N W L Q U F I 
B L T O O S Q V K R O W G N D 
B C D E J Y E L W X J D F X M """.lower().replace(" ","")
f(grid,"codegolf")
f(grid,"serverfault")
f(grid,"superuser")

This version is 196 chars and accepts the grid without doing any extra tweaking

def f(g,W):w=g.find("\n")+1;L=len(W);print" --> ".join("row %s, column %s"%(x/w+1,x%w/2+1)for i in range(len(g))for j in(-w-2,-w,-w+2,-2,2,w-2,w,w+2)for x in(i,i+(L-1)*j)if g[i::j][:L].lower()==W)

testing code:

grid="""A I Y R J J Y T A S V Q T Z E 
X B X G R Z P W V T B K U F O 
E A F L V F J J I A G B A J K 
R E S U R E P U S C Y R S Y K 
F B B Q Y T K O I K H E W G N 
G L W Z F R F H L O R W A R E 
J A O S F U E H Q V L O A Z B 
J F B G I F Q X E E A L W A C 
F W K Z E U U R Z R T N P L D 
F L M P H D F W H F E C G W Z 
B J S V O A O Y D L M S T C R 
B E S J U V T C S O O X P F F 
R J T L C V W R N W L Q U F I 
B L T O O S Q V K R O W G N D 
B C D E J Y E L W X J D F X M """
f(grid,"codegolf")
f(grid,"serverfault")
f(grid,"superuser")
Westsouthwest answered 2/4, 2010 at 17:25 Comment(1)
Congratulations for producing a shorter version that the perl mongers so far!Griseous
M
8

Perl, 226 char

sub h($){"row ",$%=1+($x=pop)/$W,", column ",1+$x%$W}
@S=map/[^ ]/g,<STDIN>;
B:for(@ARGV){
 for$d(map{$_,-$_}1,$W=@S/$.,$W-1,$W+1){
  A:for$s(0..$#S){
   $t=$s-$d;for(/./g){next A if$S[$t+=$d]ne$_||$t<0}
   print h$s,' --> ',h$t,$/;
   next B
}}}

Character count excludes newlines and indentation, which are included for readability. I assume the matrix of letters is specified with standard input and the target words are command-line arguments.

First line (after the sub h definition) maps all non-space characters into a single array @S. Then iterate over all the target words, over the possible directions that words may appear in ($W is the width of the puzzle), and over all of the letters in the current target word until a match is found.

$ perl wordsearch.pl CODEGOLF RACECAR BYKLHQU <<EOF
A I Y R J J Y T A S V Q T Z E 
X B X G R Z P W V T B K U F O 
E A F L V F J J I A G B A J K 
R E S U R E P U S C Y R S Y K 
F B B Q Y T K O I K H E W G N 
G L W Z F R F H L O R W A R E 
J A O S F U E H Q V L O A Z B 
J F B G I F Q X E E A L W A C 
F W K Z E U U R Z R T R A C E 
C A R P H D F W H F E C G W Z 
B J S V O A O Y D L M S T C R 
B E S J U V T C S O O X P F F 
R J T L C V W R N W L Q U F I 
B L R A C E C A R R O W G N D 
B C D E J Y E L W X J D F X M 
EOF
row 12, column 8 --> row 5, column 1
row 14, column 3 --> row 14, column 9
row 3, column 12 --> row 9, column 6
Macedonian answered 2/4, 2010 at 17:25 Comment(1)
You have a tendency of sneaking up on me. Good job. :)Cataract
C
6

Perl - 243

(Not including linebreaks, all of which are optional)

Mostly de-golfed code can be found here.

$/="";@w=split//,pop;($b=<>)=~y/ //d;$W=1+index$b,"\n";
for(1,2){for(0,$W-2..$W){$p=join"."x$_,@w;
push@m,[$-[0],$+[0]-1]while$b=~/$p/sg}
@w=reverse@w;@m=map[reverse@$_],@m}
$t="row %d, column %d";
printf"$t --> $t\n",map{1+$_/$W,1+$_%$W}@$_ for@m;

Usage: run as a script. Either wordsearch.pl boardfile searchword to read from a file, or wordsearch.pl searchword to read the board from STDIN (hey, pop is two characters shorter than shift!)

The idea is that we read the board into a string (throwing away the extra spaces between letters), and also make note of how wide it is (by counting characters until the first newline). Then we do a series of regex matches to find the word, using the s regex modifier to allow a match to span lines. Given a board that's 4 letters wide (so each line is 5 chars including newline), and the search word "CAR", we can make the patterns /CAR/ to match forwards, /C...A...R/ to match going southwest, /C....A....R/ to match going downwards, and /C.....A.....R/ to match going southeast. For each match, we record the starting and ending positions of the match.

Then we reverse the search word ("CAR" -> "RAC") and do it again, which lets us match the word going left, up, northwest, and northeast. Of course, we want to make sure that the start/end positions are also swapped. To save code, I swap the match positions twice; the matches for the forward word are swapped both times (so they come out un-swapped), while the matches for the backward word are only swapped once, so they will come out swapped.

Finally, it's just a matter of taking the match offsets into the board string and turning them into row/column pairs, and formatting the output how the problem wants it.

Cataract answered 2/4, 2010 at 17:25 Comment(8)
It's hard to see where the script deals with diagonal matches. Could you explain the logic?Terle
great explanation :) is it shorter to do the reverse than to generate extra regexes?Blur
Where's the regex to rotate the board ?!?Macedonian
That's really slick. I only understood the trick when I replaced the "." with a " ". Match anything on the board while treating it as a single string to deal with the diagonals and verticals.Terle
If only, mobrule. If only. :)Cataract
An interesting side-effect of using the /s modifier is that it does not match cases like ... C O D on one line and E G O L F ... on the next, which is what is wanted.Terle
Shave off a character by using printf instead of sprintf ?Terle
@Terle for a little while after I submitted it I was really worried that it would accidentally match across lines and be disqualified, but then I realized that no matter what, if it's going to wrap like that it's going to run into a newline in the middle of the word and break the match :)Cataract
E
4

AWK - 252 255

NF<2{l=split($1,w,"")}NF>1{for(i=1;i<=NF;)t[x=i++,y=NR]=$i}END{
for(i=1;i<=x;++i)for(j=0;++j<=y;)for(a=0;a<9;++a)if(t[I=i,J=j]==w[1])
for(k=1;k<l;){if(!(T=t[I+=int(a/3)-1,J+=a%3-1])||T!=w[++k])break;if(k==l)
print"row "j (B=", column ")i" --> row "J B I}}

The input should be in the form

....
B L T O O S Q V K R O W G N D 
B C D E J Y E L W X J D F X M 
CODEGOLF
Emelda answered 2/4, 2010 at 17:25 Comment(0)
B
3

Python, 318 strokes.

from itertools import*
f=lambda x,y,i,j,n:(x-i+1,y-j+1)*(not n)or all((2*i+j,x+1,y+1,x<c,y<d))and a[x][y*2]==n[0]and f(x+i,y+j,i,j,n[1:])
w=raw_input
a=list(iter(w,''))
c=len(a)
d=len(a[0])/2
r=range
z=r(-1,2)
s='row %d, column %d'
for x in product(r(c),r(d),z,z,[w()]):
 if f(*x):print s%(x[0]+1,x[1]+1),'-->',s%f(*x)


Sample input:
A I Y R J J Y T A S V Q T Z E 
X C O D E G O L F B K U F O Z
E A F L V F J J I A G B A J K 
R E S U R E P U S C Y R S Y K 
F B B Q Y T K O I K H E W G N 
G L W Z F R F H L O R W A R E 
J A O S F U E H Q V L O A Z B 
J F B G I F Q X E E A L W A C 
F W K Z E U U R Z R T N P L D 
F L M P H D F W H F E C G W Z 
B J S V O A O Y D L M S T C R 
B E S J U V T C S O O X P F F 
R J T L C V W R N W L Q U F I 
B L T O O S Q V K R O W G N D 
B C D E J Y E L W X J D F X M 

CODEGOLF

Output:

$ python wordsearch < wordsearchinput.txt
row 2, column 2 --> row 2, column 9
row 12, column 8 --> row 5, column 1

Released under the "edit-the-answer-instead-of-commenting-on-how-I-should-fix-this" license.

Blur answered 2/4, 2010 at 17:25 Comment(1)
I know it's nitpicking, but in the sample input there is a space between every character in the input (the raster)Music
L
3

C++ C, 411 400 388 367 characters

My first attempt ever at a code golf.

#include<stdio.h>
main(){char m[999],w[99];int r,c,l=-1,x=-1,y=-1,i=0,j,k;scanf("%s %d %d"
,w,&r,&c);for(;w[l+1];++l);for(;i<r*c;scanf("%s",&m[i++]));for(;y<2;++y)
for(;x<2;++x)if(x|y)for(i=(y<0)*l;i<r-(y>0)*l;++i)for(j=(x<0)*l;j<c-(x>0
)*l;++j)for(k=0;k<=l&&w[k++]==m[(i+k*y)*c+j+k*x];)k>l?printf(
"row %i, column %i --> row %i, column %i\n",i+1,j+1,i+y*l+1,j+x*l+1):0;}

It compiles without warnings on gcc with no additional flags.

Here's the version with whitespace:

#include<stdio.h>
main() {
    char m[999], w[99];
    int r, c, l = -1, x = -1, y = -1, i = 0, j, k;

    scanf("%s %d %d", w, &r, &c);
    for (; w[l+1]; ++l);
    for (; i < r*c; scanf("%s", &m[i++]));
    for (; y < 2; ++y)
        for (; x < 2; ++x)
            if (x | y)
                for (i = (y<0) * l; i < r - (y>0) * l; ++i)
                    for (j = (x<0) * l; j < c - (x>0) * l; ++j)
                        for (k = 0; k <= l && w[k++] == m[(i+k*y) * c + j+k*x];)
                            k > l ? printf("row %i, column %i --> row %i, column %i\n",
                                    i + 1, j + 1, i + y*l + 1, j + x*l + 1)
                                  : 0;
}

Expected input on stdin looks like:

CODEGOLF 15 15
A I Y R J J Y T A S V Q T Z E
X B X G R Z P W V T B K U F O
...

where 15 15 are the numbers of rows and columns, respectively.

Since this is my first round of golf, and I could find precious little common tricks on the web, I'll list some that I found:

  • Try to keep loop bodies to a single statement to save on moustaches (2 chars).
    Example: anything using the comma operator.

  • Use a conditional expression (?:) instead of an if when you can (1 char).
    Example: instead of if(x)printf("");, write x?printf(""):0;. This works because printf returns int.

  • Use the fact that booleans are 0 or 1 when used as an int (? chars).
    Example: instead of (y>0?r-l:r), write r-(y>0)*l. The savings here are in the parentheses, which were necessary in this situation.

  • while loops are never shorter than for loops, and longer most of the time (>= 0 chars).
    Example: while(x)y; is just as long as for(;x;y); but with for you get the loop body to play with as well, without paying for the extra ;.

  • Put your loop increments into the last expression that uses the loop counter (1 char).
    Example: instead of for(;x<b;++x)printf("%i",x);, write for(;x<b;)printf("%i",x++);.

  • Leave off main's return type (4 chars).

  • Leave off main's return statement (9 chars).

  • Use & and | instead of && and || whenever possible (1 char).
    Example: instead of if(x||y), write if(x|y). This only works if you don't depend on the short-circuit behaviour of && and ||.

  • Inline the strlen function and remove #include<string.h> (11 chars).

If you don't care about compiler warnings:

  • Do not include any headers (many chars).
Lister answered 2/4, 2010 at 17:25 Comment(2)
You could also use the #define macro if it's permitted to shorten the code. (e.g. #define f(x) for(; x) or something more sophisticated)Eliaeliades
@tomp: I considered that, but I couldn't find any rules about it. I found it felt too much like cheating.Lister
M
0

Python: 491 - 353 characters

Still quite some room for improvement I guess, all comments welcome.

Strategy: finding all occurences of the first letter, and then trying to fetch the word in all directions.

import sys;r=range(-1,2);k=''.join;q="row %s, column %s"
a=[l[:-1:2]for l in sys.stdin]
b=sys.argv[1];c=len(a[0])
for x,y in[(x/c,x%c)for x,h in enumerate(k(map(k,a)))]:
 for i,j in[(i,j)for i in r for j in r if i or j<>0]:
  if k([a[x+z*i][y+z*j]for z in range(len(b))if 0<=x+z*i<c and 0<=y+z*j<len(a)])==b:print q%(x+1,y+1)+" --> "+q%(x+z*i+1,y+z*j+1)

Example usage:

cat input.txt | python test.py CODEGOLF

Example output:

row 12, column 8 --> row 5, column 1

Music answered 2/4, 2010 at 17:25 Comment(1)
You can leave off the if h==b[0]. You're testing that again in the inner loop anyway.Lister

© 2022 - 2024 — McMap. All rights reserved.