Algorithm to generate a crossword [closed]
Asked Answered
J

13

128

Given a list of words, how would you go about arranging them into a crossword grid?

It wouldn't have to be like a "proper" crossword puzzle which is symmetrical or anything like that: basically just output a starting position and direction for each word.

Jud answered 3/6, 2009 at 4:52 Comment(0)
J
67

I came up with a solution which probably isn't the most efficient, but it works well enough. Basically:

  1. Sort all the words by length, descending.
  2. Take the first word and place it on the board.
  3. Take the next word.
  4. Search through all the words that are already on the board and see if there are any possible intersections (any common letters) with this word.
  5. If there is a possible location for this word, loop through all the words that are on the board and check to see if the new word interferes.
  6. If this word doesn't break the board, then place it there and go to step 3, otherwise, continue searching for a place (step 4).
  7. Continue this loop until all the words are either placed or unable to be placed.

This makes a working, yet often quite poor crossword. There were a number of alterations I made to the basic recipe above to come up with a better result.

  • At the end of generating a crossword, give it a score based on how many of the words were placed (the more the better), how large the board is (the smaller the better), and the ratio between height and width (the closer to 1 the better). Generate a number of crosswords and then compare their scores and choose the best one.
    • Instead of running an arbitrary number of iterations, I've decided to create as many crosswords as possible in an arbitrary amount of time. If you only have a small word list, then you'll get dozens of possible crosswords in 5 seconds. A larger crossword might only be chosen from 5-6 possibilities.
  • When placing a new word, instead of placing it immediately upon finding an acceptable location, give that word location a score based on how much it increases the size of the grid and how many intersections there are (ideally you'd want each word to be crossed by 2-3 other words). Keep track of all the positions and their scores and then choose the best one.
Jud answered 20/6, 2009 at 15:2 Comment(4)
I happen to be writing this program as we speak, and this is the identical algorothm I chose also. For small numbers of words (10 or less), the program has no trouble calculated all possible solutions in milliseconds. The algoritm is exponential though. The easy part is writing the basic algorithm that brute-forces through all possible combinations. The hard part is the dozen or so 'short-cuts' you need to prevent the program from trying all the dead-end solutions.Thegn
"5. ... and check if the new word interferes" How do you account for situations where the new word is place alongside an existing word, which generates gibberish in the places where it has adjacent squares? E.g.: LEMON ERASE If "LE", "ER" and "MA" etc. are not words in your list, this is wrong. On the other hand, outright rejecting such adjacencies might throw away really good grids, like: W LEMON ERASE NEXUS T TAccusal
@Kaffeine, yep I know what you mean - I had to throw out these options because even though they could create really good grids, it's too hard to check (read: I couldn't be bothered), and chances are it's just interference anyway.Jud
Implemented essentially the same approach here: colab.research.google.com/drive/…Huge
J
24

I just recently wrote my own in Python. You can find it here: http://bryanhelmig.com/python-crossword-puzzle-generator/. It doesn't create the dense NYT style crosswords, but the style of crosswords you might find in a child's puzzle book.

Unlike a few algorithms I found out there that implemented a random brute-force method of placing words like a few have suggested, I tried to implement a slightly smarter brute-force approach at word placement. Here's my process:

  1. Create a grid of whatever size and a list of words.
  2. Shuffle the word list, and then sort the words by longest to shortest.
  3. Place the first and longest word at the upper left most position, 1,1 (vertical or horizontal).
  4. Move onto next word, loop over each letter in the word and each cell in the grid looking for letter to letter matches.
  5. When a match is found, simply add that position to a suggested coordinate list for that word.
  6. Loop over the suggested coordinate list and "score" the word placement based on how many other words it crosses. Scores of 0 indicate either bad placement (adjacent to existing words) or that there were no word crosses.
  7. Back to step #4 until word list is exhausted. Optional second pass.
  8. We should now have a crossword, but the quality can be hit or miss due to some of the random placements. So, we buffer this crossword and go back to step #2. If the next crossword has more words placed on the board, it replaces the crossword in the buffer. This is time limited (find the best crossword in x seconds).

By the end, you have a decent crossword puzzle or word search puzzle, since they are about the same. It tends to run rather well, but let me know if you have any suggestions on improvement. Bigger grids run exponentially slower; bigger word lists linearly. Bigger word lists also have a much higher chance at better word placement numbers.

Jarnagin answered 11/4, 2010 at 18:41 Comment(4)
@Neil N: Probably a better letter matching possibility for the other words. Would be maybe also an approach to sort by the count of distinct letters contained per word, which will mostly lead to the same result.Chevrotain
@NeilN Python's array.sort(key=f) is stable, which means (for example) that simply sorting an alphabetical word list by length would keep all the 8-letter words alphabetically sorted.Satirize
@Bryan, Your website link doesn't work for me, and the primary domain just redirects to Twitter. Do you have an updated link to your code?Tomfool
Here is (apparently) a clone of Bryan's generator: github.com/jeremy886/crossword_helmigDosimeter
C
20

I actually wrote a crossword generation program about ten years ago (it was cryptic but the same rules would apply for normal crosswords).

It had a list of words (and associated clues) stored in a file sorted by descending usage to date (so that lesser-used words were at the top of the file). A template, basically a bit-mask representing the black and free squares, was chosen randomly from a pool that was provided by the client.

Then, for each non-complete word in the puzzle (basically find the first blank square and see if the one to the right (across-word) or the one underneath (down-word) is also blank), a search was done of the file looking for the first word that fitted, taking into account the letters already in that word. If there was no word that could fit, you just marked the whole word as incomplete and moved on.

At the end would be some uncompleted words which the compiler would have to fill in (and add the word and a clue to the file if desired). If they couldn't come up with any ideas, they could edit the crossword manually to change constraints or just ask for a total re-generation.

Once the word/clue file got up to a certain size (and it was adding 50-100 clues a day for this client), there was rarely a case of more than two or three manual fix ups that had to be done for each crossword.

Competency answered 3/6, 2009 at 5:22 Comment(3)
This doesn't actually help me in my situation, since I'll have a list of only about 6-12 words. Mine is more like an learning exercise for the user than a word puzzle. +1 for the interesting algorithm anyway though!Jud
Nice description. I thought about this a few times in the past, but never tried it. Now for the magic question: how well did it work? Just for sparse puzzles, or also for dense ones (like in the paper)? And how many clues do you need for dense puzzles?Attractant
@dmckee, it was a while ago but from memory, even the dense puzzles were pretty good. Many were completed without intervention but you'd still get maybe a fifth requiring one or two extra words added. And we're talking about thousands of words in the file. No doubt backtracking could have helped but it was easier just for the client to reject one with (e.g.) 5 unfinished words than worry about trying to come up with extra clues. Five was about the outer limit I saw for unfinished words.Competency
B
17

This algorithm creates 50 dense 6x9 arrow crosswords in 60 seconds. It uses a word database (with word+tips) and a board database (with pre-configured boards).

1) Search for all starting cells (the ones with an arrow), store their size and directions
2) Loop through all starting cells
2.1) Search a word
2.1.1) Check if it was not already used
2.1.2) Check if it fits
2.2) Add the word to the board
3) Check if all cells were filled

A bigger word database decreases generation time considerably and some kind of boards are harder to fill! Bigger boards require more time to be filled correctly!


Example:

Pre-Configured 6x9 Board:

(# means one tip in one cell, % means two tips in one cell, arrows not shown)

# - # # - % # - # 
- - - - - - - - - 
# - - - - - # - - 
% - - # - # - - - 
% - - - - - % - - 
- - - - - - - - - 

Generated 6x9 Board:

# C # # P % # O # 
S A T E L L I T E 
# N I N E S # T A 
% A B # A # G A S 
% D E N S E % W E 
C A T H E D R A L 

Tips [line,column]:

[1,0] SATELLITE: Used for weather forecast
[5,0] CATHEDRAL: The principal church of a city
[0,1] CANADA: Country on USA's northern border
[0,4] PLEASE: A polite way to ask things
[0,7] OTTAWA: Canada's capital
[1,2] TIBET: Dalai Lama's region
[1,8] EASEL: A tripod used to put a painting
[2,1] NINES: Dressed up to (?)
[4,1] DENSE: Thick; impenetrable
[3,6] GAS: Type of fuel
[1,5] LS: Lori Singer, american actress
[2,7] TA: Teaching assistant (abbr.)
[3,1] AB: A blood type
[4,3] NH: New Hampshire (abbr.)
[4,5] ED: (?) Harris, american actor
[4,7] WE: The first person of plural (Grammar)
Benford answered 22/7, 2013 at 12:7 Comment(0)
W
11

Although this is an older question, will attempt an answer based on similar work i have done.

There are many approaches to solving constraint problems (which generallay are in NPC complexity class).

This is related to combinatorial optimization and constraint programming. In this case the constraints are the geometry of the grid and the requirement that words are unique etc..

Randomization/Annealing approaches can also work (although within the proper setting).

Efficient simplicity might just be the ultimate wisdom !

The requirements were for a more or less complete crossword compiler and (visual WYSIWYG) builder.

Leaving aside the WYSIWYG builder part, the compiler outline was this:

  1. Load the available wordlists (sorted by word length, ie 2,3,..,20)

  2. Find the wordslots (ie grid words) on the user-constructed grid (eg word at x,y with length L, horizontal or vertical) ( complexity O(N) )

  3. Compute the intersecting points of the grid words (that need to be filled) ( complexity O(N^2) )

  4. Compute the intersections of the words in the wordlists with the various letters of the alphabet used (this allows to search for matching words by using a template eg. Sik Cambon thesis as used by cwc ) ( complexity O(WL*AL) )

Steps .3 and .4 allow to do this task:

a. The intersections of the grid words with themselves enable to create a "template" for trying to find matches in the associated wordlist of available words for this grid word (by using the letters of other intersecting words with this word which are already filled at a certain step of the algorithm)

b. The intersections of the words in a wordlist with the alphabet enable to find matching (candidate) words that match a given "template" (eg 'A' in 1st place and 'B' in 3rd place etc..)

So with these data structures implemented the algorithm used was sth like this:

NOTE: if the grid and the database of words are constant the previous steps can just be done once.

  1. First step of the algorithm is select an empty wordslot (grid word) at random and fill it with a candidate word from its associated wordlist (randomization enables to produce different solutons in consecutive executions of the algorithm) ( complexity O(1) or O(N) )

  2. For each still empty word slots (that have intersections with already filled wordslots), compute a constraint ratio (this can vary, sth simple is the number of available solutions at that step) and sort the empty wordslots by this ratio ( complexity O(NlogN) or O(N) )

  3. Loop through the empty wordslots computed at previous step and for each one try a number of cancdidate solutions (making sure that "arc-consistency is retained", ie grid has a solution after this step if this word is used) and sort them according to maximum availability for next step (ie next step has a maximum possible solutions if this word is used at that time in that place, etc..) ( complexity O(N*MaxCandidatesUsed) )

  4. Fill that word (mark it as filled and go to step 2)

  5. If no word found that satisfies the criteria of step .3 try to backtrack to another candidate solution of some previous step (criteria can vary here) ( complexity O(N) )

  6. If backtrack found, use the alternative and optionally reset any already filled words that might need reset (mark them as unfilled again) ( complexity O(N) )

  7. If no backtrack found, the no solution can be found (at least with this configuration, initial seed etc..)

  8. Else when all wordlots are filled you have one solution

This algorithm does a random consistent walk of the solution tree of the problem. If at some point there is a dead end, it does a backtrack to a previous node and follow another route. Untill either a solution found or number of candidates for the various nodes are exhausted.

The consistency part makes sure that a solution found is indeed a solution and the random part enables to produce different solutions in different executions and also on the average have better performance.

PS. all this (and others) were implemented in pure JavaScript (with parallel processing and WYSIWYG) capability

PS2. The algorithm can be easily parallelized in order to produce more than one (different) solution at the same time

Hope this helps

Wei answered 2/5, 2014 at 19:33 Comment(5)
Is this for creating dense layouts (NY Times like) or sparse layouts?Personage
@Jim, this is mostly for dense layouts but can be adjustd for sparse also. The difference is in dense layouts (e.g classic, scandinavik etc..) one has the grid and searches for words, whereas for freeform layouts (sparse) on has the words and searches for a grid.Wei
Would you happen to have some sample source available somewhere that implements the steps above. For example, I am with you for most of it (and have already implemented most of it independently), but when it comes to "compute a constraint ratio...", I have to admit you lost me. Doing web searches for things like "STH Ratio" isn't much help to me either. The problem with my implementation is trying to find the words to fill in is very inefficient and taking way too long.Personage
@Jim, i have as this is already used, but this was done specificaly for a job i had, it is possible i will post a light-weight version on my open source projects, if you need more help contact me (ps indeed at some cases the algorithm i posted can take too long, but on average it doesnt)Wei
@Jim, take a look at this crosswords site (still in progress) istavrolexo.gr (in greek) with various (dense) crosswords (i.e scandinavik, classic, sudoku) which have been generated by a similar algorithm (a large scandinavik crossword)Wei
I
9

Why not just use a random probabilistic approach to start with. Start with a word, and then repeatedly pick a random word and try to fit it into the current state of the puzzle without breaking the constraints on the size etc.. If you fail, just start all over again.

You will be surprised how often a Monte Carlo approach like this works.

Izanagi answered 20/6, 2009 at 15:7 Comment(1)
yes, this is the approach I chose. You don't have to try and be very clever. Order words from longest to shortest. In a loop, pick a random cell (column and row coordinate) and place the word on the board testing to see if it runs off the end or interferes with another word (before you write the word to the grid check that each cell is either empty or if it has a letter, that letter matches the letter you are trying to write). There is some other logic to check bounds and stuff. I brute force generate a bunch of smaller and smaller grids, then rank the smallest based on intersecting words.Millwork
R
7

Here is some JavaScript code based on nickf's answer and Bryan's Python code. Just posting it in case someone else needs it in js.

function board(cols, rows) { //instantiator object for making gameboards
this.cols = cols;
this.rows = rows;
var activeWordList = []; //keeps array of words actually placed in board
var acrossCount = 0;
var downCount = 0;

var grid = new Array(cols); //create 2 dimensional array for letter grid
for (var i = 0; i < rows; i++) {
    grid[i] = new Array(rows);
}

for (var x = 0; x < cols; x++) {
    for (var y = 0; y < rows; y++) {
        grid[x][y] = {};
        grid[x][y].targetChar = EMPTYCHAR; //target character, hidden
        grid[x][y].indexDisplay = ''; //used to display index number of word start
        grid[x][y].value = '-'; //actual current letter shown on board
    }
}

function suggestCoords(word) { //search for potential cross placement locations
    var c = '';
    coordCount = [];
    coordCount = 0;
    for (i = 0; i < word.length; i++) { //cycle through each character of the word
        for (x = 0; x < GRID_HEIGHT; x++) {
            for (y = 0; y < GRID_WIDTH; y++) {
                c = word[i];
                if (grid[x][y].targetChar == c) { //check for letter match in cell
                    if (x - i + 1> 0 && x - i + word.length-1 < GRID_HEIGHT) { //would fit vertically?
                        coordList[coordCount] = {};
                        coordList[coordCount].x = x - i;
                        coordList[coordCount].y = y;
                        coordList[coordCount].score = 0;
                        coordList[coordCount].vertical = true;
                        coordCount++;
                    }

                    if (y - i + 1 > 0 && y - i + word.length-1 < GRID_WIDTH) { //would fit horizontally?
                        coordList[coordCount] = {};
                        coordList[coordCount].x = x;
                        coordList[coordCount].y = y - i;
                        coordList[coordCount].score = 0;
                        coordList[coordCount].vertical = false;
                        coordCount++;
                    }
                }
            }
        }
    }
}

function checkFitScore(word, x, y, vertical) {
    var fitScore = 1; //default is 1, 2+ has crosses, 0 is invalid due to collision

    if (vertical) { //vertical checking
        for (i = 0; i < word.length; i++) {
            if (i == 0 && x > 0) { //check for empty space preceeding first character of word if not on edge
                if (grid[x - 1][y].targetChar != EMPTYCHAR) { //adjacent letter collision
                    fitScore = 0;
                    break;
                }
            } else if (i == word.length && x < GRID_HEIGHT) { //check for empty space after last character of word if not on edge
                 if (grid[x+i+1][y].targetChar != EMPTYCHAR) { //adjacent letter collision
                    fitScore = 0;
                    break;
                }
            }
            if (x + i < GRID_HEIGHT) {
                if (grid[x + i][y].targetChar == word[i]) { //letter match - aka cross point
                    fitScore += 1;
                } else if (grid[x + i][y].targetChar != EMPTYCHAR) { //letter doesn't match and it isn't empty so there is a collision
                    fitScore = 0;
                    break;
                } else { //verify that there aren't letters on either side of placement if it isn't a crosspoint
                    if (y < GRID_WIDTH - 1) { //check right side if it isn't on the edge
                        if (grid[x + i][y + 1].targetChar != EMPTYCHAR) { //adjacent letter collision
                            fitScore = 0;
                            break;
                        }
                    }
                    if (y > 0) { //check left side if it isn't on the edge
                        if (grid[x + i][y - 1].targetChar != EMPTYCHAR) { //adjacent letter collision
                            fitScore = 0;
                            break;
                        }
                    }
                }
            }

        }

    } else { //horizontal checking
        for (i = 0; i < word.length; i++) {
            if (i == 0 && y > 0) { //check for empty space preceeding first character of word if not on edge
                if (grid[x][y-1].targetChar != EMPTYCHAR) { //adjacent letter collision
                    fitScore = 0;
                    break;
                }
            } else if (i == word.length - 1 && y + i < GRID_WIDTH -1) { //check for empty space after last character of word if not on edge
                if (grid[x][y + i + 1].targetChar != EMPTYCHAR) { //adjacent letter collision
                    fitScore = 0;
                    break;
                }
            }
            if (y + i < GRID_WIDTH) {
                if (grid[x][y + i].targetChar == word[i]) { //letter match - aka cross point
                    fitScore += 1;
                } else if (grid[x][y + i].targetChar != EMPTYCHAR) { //letter doesn't match and it isn't empty so there is a collision
                    fitScore = 0;
                    break;
                } else { //verify that there aren't letters on either side of placement if it isn't a crosspoint
                    if (x < GRID_HEIGHT) { //check top side if it isn't on the edge
                        if (grid[x + 1][y + i].targetChar != EMPTYCHAR) { //adjacent letter collision
                            fitScore = 0;
                            break;
                        }
                    }
                    if (x > 0) { //check bottom side if it isn't on the edge
                        if (grid[x - 1][y + i].targetChar != EMPTYCHAR) { //adjacent letter collision
                            fitScore = 0;
                            break;
                        }
                    }
                }
            }

        }
    }

    return fitScore;
}

function placeWord(word, clue, x, y, vertical) { //places a new active word on the board

    var wordPlaced = false;

    if (vertical) {
        if (word.length + x < GRID_HEIGHT) {
            for (i = 0; i < word.length; i++) {
                grid[x + i][y].targetChar = word[i];
            }
            wordPlaced = true;
        }
    } else {
        if (word.length + y < GRID_WIDTH) {
            for (i = 0; i < word.length; i++) {
                grid[x][y + i].targetChar = word[i];
            }
            wordPlaced = true;
        }
    }

    if (wordPlaced) {
        var currentIndex = activeWordList.length;
        activeWordList[currentIndex] = {};
        activeWordList[currentIndex].word = word;
        activeWordList[currentIndex].clue = clue;
        activeWordList[currentIndex].x = x;
        activeWordList[currentIndex].y = y;
        activeWordList[currentIndex].vertical = vertical;

        if (activeWordList[currentIndex].vertical) {
            downCount++;
            activeWordList[currentIndex].number = downCount;
        } else {
            acrossCount++;
            activeWordList[currentIndex].number = acrossCount;
        }
    }

}

function isActiveWord(word) {
    if (activeWordList.length > 0) {
        for (var w = 0; w < activeWordList.length; w++) {
            if (word == activeWordList[w].word) {
                //console.log(word + ' in activeWordList');
                return true;
            }
        }
    }
    return false;
}

this.displayGrid = function displayGrid() {

    var rowStr = "";
    for (var x = 0; x < cols; x++) {

        for (var y = 0; y < rows; y++) {
            rowStr += "<td>" + grid[x][y].targetChar + "</td>";
        }
        $('#tempTable').append("<tr>" + rowStr + "</tr>");
        rowStr = "";

    }
    console.log('across ' + acrossCount);
    console.log('down ' + downCount);
}

//for each word in the source array we test where it can fit on the board and then test those locations for validity against other already placed words
this.generateBoard = function generateBoard(seed = 0) {

    var bestScoreIndex = 0;
    var top = 0;
    var fitScore = 0;
    var startTime;

    //manually place the longest word horizontally at 0,0, try others if the generated board is too weak
    placeWord(wordArray[seed].word, wordArray[seed].displayWord, wordArray[seed].clue, 0, 0, false);

    //attempt to fill the rest of the board 
    for (var iy = 0; iy < FIT_ATTEMPTS; iy++) { //usually 2 times is enough for max fill potential
        for (var ix = 1; ix < wordArray.length; ix++) {
            if (!isActiveWord(wordArray[ix].word)) { //only add if not already in the active word list
                topScore = 0;
                bestScoreIndex = 0;

                suggestCoords(wordArray[ix].word); //fills coordList and coordCount
                coordList = shuffleArray(coordList); //adds some randomization

                if (coordList[0]) {
                    for (c = 0; c < coordList.length; c++) { //get the best fit score from the list of possible valid coordinates
                        fitScore = checkFitScore(wordArray[ix].word, coordList[c].x, coordList[c].y, coordList[c].vertical);
                        if (fitScore > topScore) {
                            topScore = fitScore;
                            bestScoreIndex = c;
                        }
                    }
                }

                if (topScore > 1) { //only place a word if it has a fitscore of 2 or higher

                    placeWord(wordArray[ix].word, wordArray[ix].clue, coordList[bestScoreIndex].x, coordList[bestScoreIndex].y, coordList[bestScoreIndex].vertical);
                }
            }

        }
    }
    if(activeWordList.length < wordArray.length/2) { //regenerate board if if less than half the words were placed
        seed++;
        generateBoard(seed);
    }
}
}
function seedBoard() {
    gameboard = new board(GRID_WIDTH, GRID_HEIGHT);
    gameboard.generateBoard();
    gameboard.displayGrid();
}
Renettarenew answered 7/3, 2014 at 17:4 Comment(4)
word object schema is missing, please provide wordArrayMuzhik
Literally just an array of words like ['apple', 'orange', 'pear']Renettarenew
Hi, FYI my edit didn't change a lot of code, it just formatted it. I know it looks so messy when viewing it 'inline', but if you want to see the real changes in code, click 'side-by-side-markdown'. Well... I should have written "Formatted code" in the edit description, but meh.Despairing
How does this work? Can you provide a html file incorporating this javascript?Kriemhild
A
5

I'd generate two numbers: Length and Scrabble score. Assume that a low Scrabble score means it's easier to join on (low scores = lots of common letters). Sort the list by length descending and Scrabble score ascending.

Next, just go down the list. If the word doesn't cross with an existing word (check against each word by their length and Scrabble score, respectively), then put it into the queue, and check the next word.

Rinse and repeat, and this should generate a crossword.

Of course, I'm pretty sure that this is O(n!) and it's not guaranteed to complete the crossword for you, but perhaps somebody can improve it.

Aylward answered 3/6, 2009 at 5:26 Comment(0)
O
3

I've been thinking about this problem. My sense is that to create a truly dense crossword, you cannot hope that your limited word list is going to be enough. Therefore, you might want to take a dictionary and place it into a "trie" data structure. This will allow you to easily find words that fill the left over spaces. In a trie, it is fairly efficient to implement a traversal that, say, gives you all words of the form "c?t".

So, my general thinking is: create some sort of relatively brute force approach as some described here to create a low-density cross, and fill in the blanks with dictionary words.

If anyone else has taken this approach, please let me know.

Olsen answered 22/9, 2010 at 17:34 Comment(0)
M
3

I was playing around crosswords generator engine, and I found this the most important :

0.!/usr/bin/python

  1. a. allwords.sort(key=len, reverse=True)

    b. make some item/object like cursor which will walk around matrix for easy orientation unless you want to iterate by random choice later on.

  2. the first, pick up first pair and place them across and down from 0,0 ; store the first one as our current crossword 'leader'.

  3. move cursor by order diagonal or random with greater diagonal probability to next empty cell

  4. iterate over the words like and use free space length to define max word length : temp=[] for w_size in range( len( w_space ), 2, -1 ) : # t for w in [ word for word in allwords if len(word) == w_size ] : # if w not in temp and putTheWord( w, w_space ) : # temp.append( w )

  5. to compare word against free space I used i.e. :

    w_space=['c','.','a','.','.','.'] # whereas dots are blank cells
    
    # CONVERT MULTIPLE '.' INTO '.*' FOR REGEX
    
    pattern = r''.join( [ x.letter for x in w_space ] )
    pattern = pattern.strip('.') +'.*' if pattern[-1] == '.' else pattern
    
    prog = re.compile( pattern, re.U | re.I )
    
    if prog.match( w ) :
        #
        if prog.match( w ).group() == w :
            #
            return True
    
  6. after each successfully used word, change direction. Loop while all cells are filled OR you run out of words OR by limit of iterations then :

# CHANGE ALL WORDS LIST inexOf1stWord = allwords.index( leading_w ) allwords = allwords[:inexOf1stWord+1][:] + allwords[inexOf1stWord+1:][:]

...and iterate again new crossword.

  1. Make the scoring system by easiness of filling, and some estimation calcs. Give score for the current crossword and narrow later choice by append it into list of made crosswords if the score is satisfied by your scoring system.

  2. After first iteration session iterate again from the list of made crosswords to finish the job.

By using more parameters speed can be improved by a huge factor.

Multitude answered 5/12, 2014 at 8:47 Comment(0)
C
3

This one appears as a project in the AI CS50 course from Harvard. The idea is to formulate the crossword generation problem as a constraint satisfaction problem and solve it with backtracking with different heuristics to reduce the search space.

To start with we need couple of input files:

  1. The structure of the crossword puzzle (which looks like the following one, e.g., where the '#' represents the characters not to be filled with and '_' represents the characters to be filled with)

`

###_####_#
____####_#
_##_#_____
_##_#_##_#
______####
#_###_####
#_##______
#_###_##_#
_____###_#
#_######_#
##_______#    

`

  1. An input vocabulary (word list / dictionary) from which the candidate words will be chosen (like the one shown follows).

    a abandon ability able abortion about above abroad absence absolute absolutely ...

Now the CSP is defined and to be solved as follows:

  1. Variables are defined to have values (i.e., their domains) from the list of words (vocabulary) provided as input.
  2. Each variable gets represented by a 3 tuple: (grid_coordinate, direction, length) where the coordinate represents the start of the corresponding word, direction can be either of horizontal or vertical and the length is defined as the length of the word the variable will be assigned to.
  3. The constraints are defined by the structure input provided: for example, if a horizontal and a vertical variable has a common character, it will represented as an overlap (arc) constraint.
  4. Now, the node consistency and AC3 arc consistency algorithms can be used to reduce the domains.
  5. Then backtracking to obtain a solution (if one exists) to the CSP with MRV (minimum remaining value), degree etc. heuristics can be used to select the next unassigned variable and heuristics like LCV (least constraining value) can be used for domain-ordering, to make the search algorithm faster.

The following shows the output that was obtained using an implementation of the CSP solving algorithm:

`
███S████D█
MUCH████E█
E██A█AGENT
S██R█N██Y█
SUPPLY████
█N███O████
█I██INSIDE
█Q███E██A█
SUGAR███N█
█E██████C█
██OFFENSE█

`

The following animation shows the backtracking steps:

enter image description here

Here is another one with a Bangla (Bengali) language word-list:

enter image description here

Competitor answered 20/4, 2020 at 7:16 Comment(2)
+1 for a really interesting explanation. However, the context for the problem I was trying to solve here was that there was a small set of words that all must be used, and I was trying to find an optimal layout for the crossword, rather than starting with a layout and find words which fit.Jud
Link for AI CS50 course is dead but a great answer nonethelessMistletoe
D
2

I would get an index of each letter used by each word to know possible crosses. Then I would choose the largest word and use it as base. Select the next large one and cross it. Rinse and repeat. It's probably an NP problem.

Another idea is creating a genetic algorithm where the strength metric is how many words you can put in the grid.

The hard part I find is when to know a certain list cannot possibly be crossed.

Dryclean answered 3/6, 2009 at 5:15 Comment(1)
I was also thinking of a genetic algorithm. The fitness function could be how tightly the words are packed into the grid.Lobby
A
2

jQuery Crossword Puzzle Generator and Game

I have coded up a JavaScript/jQuery solution to this problem :

Sample Demo: http://www.earthfluent.com/crossword-puzzle-demo.html

Source Code: https://github.com/HoldOffHunger/jquery-crossword-puzzle-generator

The intent of the algorithm I used:

  1. Minimize the number of the unusable squares in the grid as much as possible.
  2. Have as many inter-mixed words as possible.
  3. Compute in an extremely fast time.

Demonstration of a generated crossword puzzle.

I will describe the algorithm I used:

  1. Group the words together according to those that share a common letter.

  2. From these groups, build sets of a new data structure ("word blocks"), which is a primary word (that runs through all other words) and then the other words (that run through the primary word).

  3. Start out the crossword puzzle with the very first of these word blocks in the very top-left position of the crossword puzzle.

  4. For the rest of the word blocks, starting from the right-bottom most position of the crossword puzzle, move upward and leftward, until there are no more available slots to fill. If there are more empty columns upwards than leftwards, move upwards, and vice versa.

Allograph answered 17/4, 2018 at 18:54 Comment(3)
@holdoffhunger Do you have a method to show the crossword key? Boxes with filled in letters?Hanaper
@Jon Glazer: Typically, you send the crossword keys to the function itself, but you can log the crossword as a 2d-array of chars, right after var crosswords = generateCrosswordBlockSources(puzzlewords);. Just console log this value. Don't forget, there's a "cheat-mode" in the game, where you can just click "Reveal Answer", to get the value immediately.Allograph
This generates puzzles with gibberish "across" words in places with adjacent "down" boxes, and vice versa. Standard crossword puzzles don't work like this, though it does maximize the density.Currin

© 2022 - 2024 — McMap. All rights reserved.