Empty crossword solver in Python
Asked Answered
U

1

9

I'm given a matrix containing a blueprint of a crossword puzzle - unfilled, of course. The goal is to fill the whole puzzle - it's a task from Checkio, and I've been struggling with this for quite some time now.

From what I understand of complexity, there's no perfect algorithm for this problem. Still, there has to be the best way to do this, right? I've tried some different things, and results were not that good with increasing number of words in the crossword and/or dictionary.

So, some of the things I've tried:

  • simple brute forcing. Did not work at all, as it has been ignoring and overwriting intersections.
  • brute forcing while keeping all the relevant data - worked as expected with a specific dictionary, turned to hell with a moderately big one even with word-length optimization. Figures.
  • blind intersection filling - the idea where I thought it would be better not to bother with the intersecting words, instead focusing on the letters. Like start with As and check if you can fill the whole crossword with these restrictions. If it did not work for some word, increment one of the letters and try the whole thing again. Results were abysmal as you can expect.
  • recursive exploring - worked perfectly on more simple blueprints, but fell flat with more complex ones. There was an issue with simple loops which was resolved simply enough, but I did not find a reasonable solution for the situation where the path splits and then rejoins several further splits later (so there's nothing left to solve for the second branch, but it doesn't know that).
  • minimizing intersections - haven't tested this yet, but it looks promising. The idea is that I find the shortest list of words containing all intersections... that also don't intersect with each other. Then I can just use a generator for each of those words, and then check if the depending words with those intersections exist. If they don't, I just grab the next word from a generator.

And this is where I'm currently at. I decided to ask about this here as it's already at that point where I think it took more time than it should have, and even then my latest idea may not even be the proper way to do it.

So, what is the proper way to do it?

Edit: Input is a list of strings representing the crossword and a list of strings representing the dictionary. Output is a list of strings representing the filled crossword.

And example of a crossword:

['...XXXXXX', 
 '.XXX.X...', 
 '.....X.XX', 
 'XXXX.X...', 
 'XX...X.XX', 
 'XX.XXX.X.', 
 'X......X.', 
 'XX.X.XXX.', 
 'XXXX.....']

the crossword

The output would be a similar list with filled letters instead of dots.

Note that the 'dictionary' is just that, a small English dictionary and not a list of words fitted as answers for this puzzle.

Unmanned answered 31/8, 2017 at 10:13 Comment(8)
Well thanks for the description, could you also add a puzzle example ? Moreover, I'm not sure I compltely got it. You have a crossword puzzle, and do you have a list of words that are filling in this puzzle, and are you looking to where you should place them ? Or am I completely wrong ? ^^Wintertide
"(...) there's no optimal solution (...) there has to be the best way to do this, right?", well, the "best way to do this" is literally the definition of "optimal solution". What may not exist is a "good algorithm", understood as an algorithm with polynomial worst-case complexity. In any case, you have not provided a proper description of your problem (inputs, expected output, an example...).Hammerlock
Thanks for the example, but I just can't see the words intersection on it. I still don't see a crossword there sry.Wintertide
Give me a minute, I'll make a visual representation of that input.Unmanned
To reduce the time consuming process, you could try to improve the way you choose word by looking into the repartition of letters in English. Which are the letters the more used ? Which are the most common first letter ? Last letter ? By improving the word selection, you'll have less mistakes, and then a quicker resolution.Wintertide
@Wintertide Yes, that would definitely be an improvement. However, since intersections do not appear at the set places all the time, I find it to be practically impossible to sort the dictionary beforehand for it to work on the whole puzzle. And dynamic implementation that would sort the thing depending on the needs right now would be probably just as complex as the main algorithm. So while definitely a right way to go for further improvement, I don't think it's something I need to look into right now.Unmanned
If you just try to solve it, Klas Lindback approach is correct and probably the best I can think of. The only flaw as you saw, is that a mistake on the last branches of the tree could make you come back to the beginning. That's why I say it could be really long. But if you manage to choose wisely the word you try out, you'll have less mistakes.Wintertide
The difficulty depends on whether (a) the grid is always a tree; (b) words are allowed to be used in multiple places. If both are true, there's an O(ds) algorithm where d is the number of words in the dictionary and s is the number of empty squares in the grid. You do one O(ds) post-order pass finding possibilities, so that for every intersection-square you know what letters and words could possibly go there; then you write in one word in the middle; and then you do one fast O(s) pre-order pass writing in the possibilities you already found.Caballero
G
4

So, what is the proper way to do it?

I don't know if it is optimal, but I would be using the principles of Floodfill.

Data structures:

Crossword words and their intersections. Sort them by the number of words in the dictionary for the corresponding word length. This will most likely mean that you will start with one of the longest words.

Dictionary accessible by word length.

If the dictionary is large it would be beneficial to be able to quickly find words of a certain length with a specific n:th letter, where n corresponds to an intersection position.

Note that for each crossword word, any two words that fit and have the same letters in all intersections are equivalent. Thus, it is possible to select a subset from the dictionary for each crossword word. The subset is the set of equivalence classes. So for each crossword word you can create a subset of the dictionary that contains at most [the number of letters in the alphabet] to the power of [the number of intersections]. This subset would constitute the equivalence classes that might fit a particular crossword word.

Algorithm:

  • Take the first/next unsolved crossword word. Assign it the first/next word that fits.

  • Take the first/next intersection. Assign the other crossword word the first word that fits.

  • If there are no more intersections to follow onwards, go back to the intersection you came from and continue with the next intersection.
  • If there is no word in the dictionary that fits, backtrack one intersection and search for the next word that fits.
Guayule answered 31/8, 2017 at 11:29 Comment(4)
Would be extremely long if the dictionnary is a a bit too big.Wintertide
@Wintertide It would. But note that the dictionary can be shrunk into equivalence classes. If you need an n letter word with one intersection, then there are only 26 equivalence classes, one for each possible letter in the intersection. I'll add that to the solution!Maxilliped
The issue that I see with this method is that its starting point is so vulnerable. For example, if initial word has 3 separate branches and you go filling them one by one, when you get to the third one, if there is no word available to fit the criteria, you have to change that initial word and redo everything again. This actually made me try the mentioned recursive exploring - its whole idea revolved about trying to make a straight line for as long as possible, so if any error in matching happened, you could fix it right away - it would not affect the previous matches.Unmanned
@Unmanned When your approach fails, you may have to backtrack a long bit. If you have made a selection that ultimately fails, then you want to fail as fast as possible. So you should start with a crossword word that has many ways to fail and few words that match..Maxilliped

© 2022 - 2024 — McMap. All rights reserved.