Writing a program that solves crosswords efficiently [closed]
Asked Answered
F

5

13

I have a crossword puzzle and a list of words which can be used to solve it (words can be placed multiple times or not even once). There is always a solution for the given crossword and word list.

I searched for clues on how to solve this problem and found out that it is NP-Complete. My maximal crossword size is 250 by 250, the maximal length of the list (amount of words which can be used to solve it) is 200. My goal is to solve crosswords of this size by brute force/backtracking, which should be possible within a few seconds (this is a rough estimation by me, correct me if I am wrong).

For example:

A list of given words which can be used to solve the crossword:

  • can
  • music
  • tuna
  • hi

The given empty crossword (X are fields which cannot be filled out, the empty fields need to be filled):

An empty crossword which needs to be solved

The solution:

The solution of the problem above

Now my current approach is to represent the crossword as a 2-D array and search for empty spaces (2 iterations over the crossword). Then I match words to empty spaces depending on their length, then I try all combinations of words to empty spaces which have the same length. This approach got very messy very fast, I got lost trying to implement this, is there a more elegant solution?

Freckly answered 17/6, 2017 at 16:40 Comment(10)
A few seconds seems a bit optimistic for brute force for input data of that size. A back-tracking approach seems better than brute force, but even with back-tracking you are searching through a potentially enormous tree.Eskridge
You are right, a backtracking approach will be necessary and the tree/search space is large.Freckly
Backtracking might be a good start. The next more powerful approaches are probably SAT-solvers (ugly transformation needed) and Constraint-programming (easier to formulate). The latter can benefit from powerful global-constraints.Whippoorwill
@Whippoorwill what ugly transformation are you referring to?Freckly
@AnnaVopureta Think about how to model this problem in conjunctive normal-form. You will see, that there is some trouble (and a lot of decisions involved). Simple example (not necessarily needed in your problem): Model k>=5 of 100 variables (5 or more are true). This is non-trivial in CNF-form. In CP, it's a one-liner.Whippoorwill
@Whippoorwill Creating a CNF for my problem does seem messy. I'm not interested in "heavy" CP libraries to solve this problem which is AFAIK required in order to solve this with one line using java.Freckly
For brute force, you can possibly just copy the code from Recursive backtracking in Java for solving a crossword (there's a link to a presumably working complete program in the comments).Hath
@AnnaVopureta, the OP in that link is my question and it's not fast enough, I got time limit. Greatings from TUM :DYan
Possible duplicate of Recursive backtracking in Java for solving a crosswordYan
@Yan that does not solve my problem, the solution is too naive (bad runtime).Freckly
Z
5

The basic idea you have is pretty sensible:

  1. Identify slots on the board.
  2. Try each slot with each word that fits.
  3. If every slots can be filled without conflict, it is solved.

It's an excellent plan. The next step is to translate it into a design. For small program like this we can go straight to pseudo code. The gist of it, as explained by other answers, is recursion:

1  Draw a slot from the slot pool.
2     If slot pool is empty (all slots filled), stop solving.
3  For each word with correct length:
4     If part of the slot is filled, check conflict.
5        If the word does not fit, continue the loop to next word.
      // No conflict
6     Fill the slot with the word.
      // Try next slot (down a level)
7     Recur from step 1.
8     If the recur found no solution, revert (take the word back) and try next.
   // None of them works
9  If no words yield a solution, an upper level need to try another word.
   Revert (put the slot back) and go back.

Below is a short but complete example that I cooked up from your requirements.

There is more than one way to skin a cat. My code swapped step 1 and 2, and combines step 4 to 6 in one fill loop.

Key points:

  • Use a formatter to fit the code to your style.
  • The 2D board is stored in a linear character array in row-major order.
  • This allow the board to be save by clone() and restored by arraycopy.
  • On creation, the board is scanned for slots in two passes from two directions.
  • The two slot lists are solved by the same loop, differ mainly in how the slots are filled.
  • The recur process is displayed, so you can see how it works.
  • Many assumptions are made. No single letter slot, all words in same case, board is correct etc.
  • Be patient. Learn whatever is new and give yourself time to absorb it.

Source:

import java.awt.Point;
import java.util.*;
import java.util.function.BiFunction;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class Crossword {

   public static void main ( String[] args ) {
      new Crossword( Arrays.asList( "5 4 4\n#_#_#\n_____\n#_##_\n#_##_\ntuna\nmusic\ncan\nhi".split( "\n" ) ) );
      new Crossword( Arrays.asList( "6 6 4\n##_###\n#____#\n___#__\n#_##_#\n#____#\n##_###\nnice\npain\npal\nid".split( "\n" ) ) );
   }

   private final int height, width; // Board size
   private final char[] board; // Current board state.  _ is unfilled.  # is blocked.  other characters are filled.
   private final Set<String> words; // List of words
   private final Map<Point, Integer> vertical = new HashMap<>(), horizontal = new HashMap<>();  // Vertical and horizontal slots

   private String indent = ""; // For formatting log
   private void log ( String message, Object... args ) { System.out.println( indent + String.format( message, args ) ); }

   private Crossword ( List<String> lines ) {
      // Parse input data
      final int[] sizes = Stream.of( lines.get(0).split( "\\s+" ) ).mapToInt( Integer::parseInt ).toArray();
      width = sizes[0];  height = sizes[1];
      board = String.join( "", lines.subList( 1, height+1 ) ).toCharArray();
      words = new HashSet<>( lines.subList( height+1, lines.size() ) );
      // Find horizontal slots then vertical slots
      for ( int y = 0, size ; y < height ; y++ )
         for ( int x = 0 ; x < width-1 ; x++ )
            if ( isSpace( x, y ) && isSpace( x+1, y ) ) {
               for ( size = 2 ; x+size < width && isSpace( x+size, y ) ; size++ ); // Find slot size
               horizontal.put( new Point( x, y ), size );
               x += size; // Skip past this horizontal slot
            }
      for ( int x = 0, size ; x < width ; x++ )
         for ( int y = 0 ; y < height-1 ; y++ )
            if ( isSpace( x, y ) && isSpace( x, y+1 ) ) {
               for ( size = 2 ; y+size < height && isSpace( x, y+size ) ; size++ ); // Find slot size
               vertical.put( new Point( x, y ), size );
               y += size; // Skip past this vertical slot
            }
      log( "A " + width + "x" + height + " board, " + vertical.size() + " vertical, " + horizontal.size() + " horizontal." );
      // Solve the crossword, horizontal first then vertical
      final boolean solved = solveHorizontal();
      // Show board, either fully filled or totally empty.
      for ( int i = 0 ; i < board.length ; i++ ) {
         if ( i % width == 0 ) System.out.println();
         System.out.print( board[i] );
      }
      System.out.println( solved ? "\n" : "\nNo solution found\n" );
   }

   // Helper functions to check or set board cell
   private char get ( int x, int y ) { return board[ y * width + x ]; }
   private void set ( int x, int y, char character ) { board[ y * width + x ] = character; }
   private boolean isSpace ( int x, int y ) { return get( x, y ) == '_'; }

   // Fit all horizontal slots, when success move to solve vertical.
   private boolean solveHorizontal () {
      return solve( horizontal, this::fitHorizontal, "horizontally", this::solveVertical );
   }
   // Fit all vertical slots, report success when done
   private boolean solveVertical () {
      return solve( vertical, this::fitVertical, "vertically", () -> true );
   }

   // Recur each slot, try every word in a loop.  When all slots of this kind are filled successfully, run next stage.
   private boolean solve ( Map<Point, Integer> slot, BiFunction<Point, String, Boolean> fill, String dir, Supplier<Boolean> next ) {
      if ( slot.isEmpty() ) return next.get(); // If finished, move to next stage.
      final Point pos = slot.keySet().iterator().next();
      final int size = slot.remove( pos );
      final char[] state = board.clone();
      /* Try each word */                                                   indent += "  ";
      for ( String word : words ) {
         if ( word.length() != size ) continue;
         /* If the word fit, recur. If recur success, done! */              log( "Trying %s %s at %d,%d", word, dir, pos.x, pos.y );
         if ( fill.apply( pos, word ) && solve( slot, fill, dir, next ) )
            return true;
         /* Doesn't match. Restore board and try next word */               log( "%s failed %s at %d,%d", word, dir, pos.x, pos.y );
         System.arraycopy( state, 0, board, 0, board.length );
      }
      /* No match.  Restore slot and report failure */                      indent = indent.substring( 0, indent.length() - 2 );
      slot.put( pos, size );
      return false;
   }

   // Try fit a word to a slot.  Return false if there is a conflict.
   private boolean fitHorizontal ( Point pos, String word ) {
      final int x = pos.x, y = pos.y;
      for ( int i = 0 ; i < word.length() ; i++ ) {
         if ( ! isSpace( x+i, y ) && get( x+i, y ) != word.charAt( i ) ) return false; // Conflict
         set( x+i, y, word.charAt( i ) );
      }
      return true;
   }
   private boolean fitVertical ( Point pos, String word ) {
      final int x = pos.x, y = pos.y;
      for ( int i = 0 ; i < word.length() ; i++ ) {
         if ( ! isSpace( x, y+i ) && get( x, y+i ) != word.charAt( i ) ) return false; // Conflict
         set( x, y+i, word.charAt( i ) );
      }
      return true;
   }
}

Exercise: You can rewrite recursion to iteration; faster and can support bigger boards. Once that's done it can be converted to multi-thread and run even faster.

Zoroastrian answered 23/6, 2017 at 20:29 Comment(0)
S
1

You are right the problem is NP-complete. So your best chance is to solve it by brute-force (if you find a polynomial algorithm please tell me, we can both be rich =)).

What I suggest you is to take a look at backtracking. It will allow you to write an elegant (and yet slow given your input size) solution to the crossword problem.

If you need more inspirational material take a look at this solver that uses backtracking as a method to navigate the solution tree.

Note that there are algorithms out there that might in practice perform better than a pure brute-force (even though still of exponential complexity). Also, a quick search on scholar reveals a good number of papers on the topic that you might want to take a look at, such as the followings:

Sabu answered 17/6, 2017 at 16:54 Comment(11)
"You are right the problem is NP-complete. So your best chance is to solve it by bruteforce". That doesn't make a lot of sense. The travelling salesman problem is NP-complete but there are algorithms in existence which are much better than brute-force (even though asymptotically they are still exponential).Eskridge
Yes I know, I did not want to say that bruteforcing is the only way to solve the problem. By "your best chance" I meant that a simple backtracking would be correct and farily easy to implement (since she's asking for help on SO I assumed that would be a good starting point). I will update the answer to make it clearer=). Good point though, Thanks.Sabu
Well, they are only better than brute-force in a non-general setting with additional assumptions. I don't consider that sentence wrong (theory-wise).Whippoorwill
The approaches in the mentioned papers are overkill for my purposes, a "correct and fairly easy to implement" backtracking solution is fine.Freckly
@Whippoorwill I see what you mean, but "so" means "implies". Even if P != NP it isn't true that if a problem is NP complete then there isn't anything significantly better than brute force. Even theory-wise, it isn't a valid implication. The only thing that theory implies is that if P != NP then there isn't any polynomial-time algorithm, but the set of non-polynomial time algorithms isn't exhausted by brute-force.Eskridge
@Davide Spataro The implementation behind your first link assumes that I have knowledge about the starting point and orientation of the blanks in the crossword :/Freckly
@JohnColeman I would be careful with that implication. Of course it's valid when we look at practical cases and exploit additional structure, but for the general case (all instances) the no free lunch theorem should apply.Whippoorwill
@Anna I guess that it is not hard to search on Google for the code that solves your exactly your problem. What I have linked uses backtracking to solve a very similar problem and that is enough to point you to the way I would start tackling the problem.Sabu
I tried finding a solution which suits my needs but couldn't find anything. The link you provided is the first thing that comes up, a few others use the exact same approach (also require information about the starting point and orientation of the blanks). There is no solution to my problem as far as I know.Freckly
@DavideSpataro The problem seems much more difficult than I imagined it to be. No wonder there are no already available solutions to it, even tho it's pretty general "solve a given crossword with a given list of words". :/Freckly
Can you provide some sample input and output please?Sabu
S
1

A crossword puzzle is a Constraint satisfaction problem which is generally a NP-Complete, but there are many solvers that will apply the most efficient algorithms to a constraint problem that you specify. The Z3 SMT solver can solve these problems very easily and at scale. All you have to do is write a Java program that transforms the crossword puzzle into a SMT problem the solver can understand then gives it to the solver to solve it. Z3 has Java bindings so it should be pretty simple. I have written the Z3 code for solving the first example below. It should not be difficult for you to follow the pattern in your Java program to specify arbitrarily large crossroad puzzles.

; Declare each possible word as string literals
(define-const str1 String "tuna")
(define-const str2 String "music")
(define-const str3 String "can")
(define-const str4 String "hi")

; Define a function that returns true if the given String is equal to one of the possible words defined above.
(define-fun validString ((s String)) Bool 
    (or (= s str1) (or (= s str2) (or (= s str3) (= s str4)))))

; Declare the strings that need to be solved
(declare-const unknownStr1 String)
(declare-const unknownStr2 String)
(declare-const unknownStr3 String)
(declare-const unknownStr4 String)

; Assert the correct lengths for each of the unknown strings.
(assert (= (str.len unknownStr1) 4))
(assert (= (str.len unknownStr2) 5))
(assert (= (str.len unknownStr3) 3))
(assert (= (str.len unknownStr4) 2))

; Assert each of the unknown strings is one of the possible words.
(assert (validString unknownStr1))
(assert (validString unknownStr2))
(assert (validString unknownStr3))
(assert (validString unknownStr4))

; Where one word in the crossword puzzle intersects another assert that the characters at the intersection point are equal.
(assert (= (str.at unknownStr1 1) (str.at unknownStr2 1)))
(assert (= (str.at unknownStr2 3) (str.at unknownStr4 1)))
(assert (= (str.at unknownStr2 4) (str.at unknownStr3 0)))

; Solve the model
(check-sat)
(get-model)

I recommend the Z3 SMT solver, but there are plenty of other constraint solvers. There is no need for you to implement your own constraint solving algorithm any more than there is a need for you to implement your own sorting algorithm.

Substrate answered 22/6, 2017 at 18:23 Comment(0)
P
1

To make this problem easier to solve, I'll break this down into smaller, easier problems. Note that I am not including code/algorithms, as I believe that will not help here (If we wanted the best Code, there would be indexes and databases and black magic that makes your head explode just seeing it). Instead, this answer tries to answer the question by talking about methods of thought that will help the OP tackle this problem (and future ones) using the method that works best for the reader.

What you need to know

This answer assumes you know how to do the following

  • Create and use Objects that have properties and functions
  • Pick a data structure that works (not necessarily good) for what you want to do with its contents.

Modeling your space

So, it's easy enough to load your crossword into an n by m matrix (2D array, hereby 'grid'), but this is very heard to work with pragmatically. So lets start by parsing your crossword from a grid to a legitimate object.

As far as your program needs to know, each entry in the crossword has 4 properties.

  1. An X-Y coordinate in the grid for the first letter
  2. A direction (down or across)
  3. Word length
  4. Word value
  5. Map of bound indexes
    • Key: Index of word that is shared with another entry
    • Value: Entry that index is shared with
    • (You can make this a tuple and include the shared index from the other entry for easy refrence)

You can find these in the grid based on these rules while scanning.

  1. If Row_1_up is closed and Row_1_down is open, this is the start index of a down word. (scan down for for length. For bound indexes, left or right space will be open. scan left to get linked entry coord-id)
  2. Same as 1 but rotated for across words (You can do this at the same time as the scan for 1)

In your crossword object, you can store the entries using the coordinate+direction as the key for easy reference and easy conversion to/from text grid form.

Using your model

You should now have an object containing a collection of crossword entries, which contain their relevant index bindings. You now need to find a set of values that will satisfy all your entries.

Your entry objects should have helper methods like isValidEntry(str) that checks for the given value, and the current state of the crossword, can I put this word here? By making each object in your model responsible for its own level of logic, the code for the problem one thought layer up can just call the logic without worrying about it's implementation (in this example, Your solver doesn't have to worry about the logic of is a value valid, it can just ask isValidEntry for that)

If you have done the above right, solving the problem is then a simple matter of iterating over all words for all entries to find a solution.

List of sub problems

For reference, here is my list of sub problems that you need to write something to solve.

  • How can I ideally model my work-space that is easy for me to work with?
  • For each piece of my model, what does it need to know? What logic can it handle for me?
  • How can I transform my text input into a usable model object?
  • How do I solve my problem using my model objects? (For you, it is iterate all words/all entries to find a valid set. Maybe using recursion)
Parmenides answered 22/6, 2017 at 19:32 Comment(0)
P
0

I just implemented a code in Scala to solve such puzzles. I am just using recursion to solve the problem. In short, for each word, I find all possible slots, and pick a slot and fill it with the word, and try to solve the partial puzzle with recursion. If the puzzle cannot be filled with the rest of words, it tries another slot, etc. if not, the puzzle is solved.

Here is the link to my code: https://github.com/mysilver/AMP/blob/master/Crossword.scala

Papillote answered 5/6, 2020 at 0:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.