Crosswords in Mathematica using Pattern Matching
Asked Answered
P

2

12

Suppose I select all 3 char words from the Mathematica dictionary:

all3 = Characters /@ Select[DictionaryLookup[], StringLength[#] == 3 &];  

and I want to form full scrabble-like sets, like:

A B E
R A Y
E R E  

Where the words can be read horizontally and vertically.

Clearly, the sets can be found with recursion and backtracking. But:

1) Is there a way to solve it using patterns?
2) For which dimensions are there valid solutions?

Edit

I wrote the question for DictionaryLookup[] just because it's a reasonable sized database of variable length records. My real problem is not related to Dictionary lookups but to a certain kind of loom patterns.

Plaster answered 1/2, 2011 at 20:39 Comment(0)
S
11

I am not sure if you would consider the following approach pattern based -- but it works, and it can conceivably be extended to many dimensions, although with the all3 dataset, it would probably konk out rather early...

The idea is to start with a blank crossword:

blankCW={{_,_,_},{_,_,_},{_,_,_}};

and then recursively do the following: For a given pattern, look at the rows in turn and (after filling out any with exactly one completion) expand the pattern on the row with the fewest number of matches:

(* Cache the number of matches for a given pattern *)
nmatch[patt_]:=nmatch[Verbatim@patt]=Length@Cases[all3,patt]

(* A helper to fill single matches if needed *)
fixone[ml_,nl_]:=If[FreeQ[ml[[nl]],Verbatim[_]],ml, 
  ReplacePart[ml, nl->First@Cases[all3,ml[[nl]]]]];

findCompletions[m_]:=Module[{nn,ur},
  (* Pattern w/ filled single matches -> ur, ordering by # of matches -> nn *)
  {ur,nn}=NestWhile[{fixone[#[[1]],First@#[[2]]], Rest@#[[2]]}&,
    {m,Ordering[nmatch/@m]},
    (Length[#[[2]]]>0&&nmatch@#[[1,#[[2,1]]]]==1)&];

  (* Expand on the word with the fewest number og matches *)
  If[Length[nn]==0,{ur},
    With[{n=First@nn},ReplacePart[ur,n-> #]&/@Cases[all3,ur[[n]]]]]];

For a given candidate pattern, try out the completion along both dimensions and keep the one that yield the fewest:

findCompletionsOriented[m_]:=Module[{osc},
  osc=findCompletions/@Union[{m,Transpose@m}];
  osc[[First@Ordering[Length/@osc,1]]]]

I do the recursion breadth first to be able to use Union, but depth first might be necessary for bigger problems. Performance is so-so: 8 laptop minutes to find the 116568 matches in the example problem:

Timing[crosswords=FixedPoint[Union[Join@@(findCompletionsOriented/@#)]&,{blankCW}];]
Length@crosswords
TableForm/@Take[crosswords,5]

Out[83]= {472.909,Null}
Out[84]= 116568
          aah aah aah aah aah
Out[86]={ ace ace ace ace ace }
          hem hen hep her hes

In principle, it should be possible to recurse this into higher dimensions, i.e. using the crosswords list instead of the wordlist for dimension 3. If the time to match a pattern against a list is linear in the list-length, this would be quite slow with a 100000+ sized wordlist...

Selfabsorbed answered 3/2, 2011 at 2:28 Comment(7)
@Selfabsorbed Nice! Is there an easy way to cut off the number of solutions to find? I want to do a test drive for higher dimensions, but I'd like to evaluate how long it may take first :)Plaster
@belisarius: Maybe the simplest is to truncate the wordlist as Yaroslav does? Also, watch out for the size of the nmatch cache: you will run out of space at some point :)Selfabsorbed
@Selfabsorbed The problem with that approach is that I'll not get easily the idea of what will happen with the full set. Will try, though. Thanks!Plaster
@belisarius: Thinking about it, I actually think that for higher dimensions, it is too expensive to count matches: Just fill one of the most constrained words, depth first. With a fast FindFirstMatchAfter implementation, this might actually work...Selfabsorbed
@belisarius. Another reduced version might be to truncate the result list after each step in the fixed point search. (Take[#,Min[Length@#,1000]]&)@Union[... instead of just Union in the call to FixedPoint.Selfabsorbed
@Selfabsorbed I guess you may try to answer this one #5087372 with a variation of your algorithmPlaster
@Janus, @belisarius -- matlab had a crossword contest recently along these lines -- mathworks.com/matlabcentral/contest/contests/32/rulesDannielledannon
D
8

An alternative approach is to use SatisfiabilityInstances with constraints specifying that every row and every column must be a valid word. Code below takes 40 seconds to get first 5 solutions using dictionary of 200 three-letter words. You could replace SatisfiabilityInstances with SatisfiabilityCount to get the number of such crosswords.

setupCrossword[wordStrings_] := (
   m = Length[chars];

   words = Characters /@ wordStrings;
   chars = Union@Flatten@words;

   wordMatch[vars_, word_] := And @@ (Thread[{vars, word}]);
   validWord[vars_] := Or @@ (wordMatch[vars, #] & /@ words);
   validCell[{i_, j_}] := 
    BooleanCountingFunction[{1}, {{i, j}, #} & /@ chars];

   row[i_] := {i, #} & /@ Range[n];
   col[i_] := {#, i} & /@ Range[n];
   cells = Flatten[row /@ Range[n], 1];

   rowCons = validWord[row[#]] & /@ Range[n];
   colCons = validWord[col[#]] & /@ Range[n];
   cellCons = validCell /@ cells;
   formula = And @@ (Join[rowCons, colCons, cellCons]);
   vars = 
    Table[{{i, j}, c}, {i, 1, n}, {j, 1, n}, {c, chars}] // 
     Flatten[#, 2] &;
   decodeInstance[instance_] := (
     choices = Extract[vars, Position[instance, True]];
     grid = Table[{i, j}, {i, 1, n}, {j, 1, n}] /. Rule @@@ choices
     )
   );

n = 3;
wordLimit = 200;
wordStrings = 
  Select[DictionaryLookup[], 
   StringLength[#] == n && LowerCaseQ[#] &];
setupCrossword[wordStrings[[;; wordLimit]]];

vals = SatisfiabilityInstances[formula, vars, 5];
Framed@TableForm@decodeInstance@# & /@ vals


(source: yaroslavvb.com)

This approach uses variables like {{i,j},"c"} to indicate the cell {i,j} gets letter "c". Each cell is constrained get exactly one letter with BooleanCountingFunction, every row and column is constrained to make a valid word. For instance, constraint that first row must be either "ace" or "bar" looks like this

{{1,1},"a"}&&{{1,2},"c"}&&{{1,3},"e"}||{{1,1},"b"}&&{{1,2},"a"}&&{{1,3},"r"}
Dannielledannon answered 1/2, 2011 at 22:41 Comment(4)
Thanks for your effort! I never used SatisfiabilityInstances before, although I saw you used it in those nice tetrahedron problems you used to post. I guess this one will take a me some time to chew :DPlaster
Nice idea! I think the pattern matching is a dead end: even with dispatch tables I couldn't check more than a million candidates per second -- which would mean more than an hour for the whole problem.Selfabsorbed
@Selfabsorbed @Yaro Using this one, for four chars words, the first solution takes 11 minutes. BTW it contains the word burpPlaster
The solution is nice and I learned a lot from it. However the time complexity of the algorithm seems too high to be practical for my purposes (dimension 8 to 10)Plaster

© 2022 - 2024 — McMap. All rights reserved.