Generating minimal/irreducible Sudokus

Asked 3/1, 2013 at 21:43 Answered 12/1, 2013 at 17:23

A Sudoku puzzle is minimal (also called irreducible) if it has a unique solution, but removing any digit would yield a puzzle with multiple solutions. In other words, every digit is necessary to determine the solution.

I have a basic algorithm to generate minimal Sudokus:

Generate a completed puzzle.
Visit each cell in a random order. For each visited cell:
- Tentatively remove its digit
- Solve the puzzle twice using a recursive backtracking algorithm. One solver tries the digits 1-9 in forward order, the other in reverse order. In a sense, the solvers are traversing a search tree containing all possible configurations, but from opposite ends. This means that the two solutions will match iff the puzzle has a unique solution.
- If the puzzle has a unique solution, remove the digit permanently; otherwise, put it back in.

This method is guaranteed to produce a minimal puzzle, but it's quite slow (100 ms on my computer, several seconds on a smartphone). I would like to reduce the number of solves, but all the obvious ways I can think of are incorrect. For example:

Adding digits instead of removing them. The advantage of this is that since minimal puzzles require at least 17 filled digits, the first 17 digits are guaranteed to not have a unique solution, reducing the amount of solving. Unfortunately, because the cells are visited in a random order, many unnecessary digits will be added before the one important digit that "locks down" a unique solution. For instance, if the first 9 cells added are all in the same column, there's a great deal of redundant information there.
If no other digit can replace the current one, keep it in and do not solve the puzzle. Because checking if a placement is legal is thousands of times faster than solving the puzzle twice, this could be a huge time-saver. However, just because there's no other legal digit now doesn't mean there won't be later, once we remove other digits.
Since we generated the original solution, solve only once for each cell and see if it matches the original. This doesn't work because the original solution could be anywhere within the search tree of possible solutions. For example, if the original solution is near the "left" side of the tree, and we start searching from the left, we will miss solutions on the right side of the tree.

I would also like to optimize the solving algorithm itself. The hard part is determining if a solution is unique. I can make micro-optimizations like creating a bitmask of legal placements for each cell, as described in this wonderful post. However, more advanced algorithms like Dancing Links or simulated annealing are not designed to determine uniqueness, but just to find any solution.

How can I optimize my minimal Sudoku generator?

Norvil answered 3/1, 2013 at 21:43 Comment(9)

Generate a completed puzzle. Please define "complete" – Papillose 3/1, 2013 at 21:46

I think you could improve your original solution by using one backtracking solver, rather than two. Once it finds a solution, don't stop - continue searching until you find another solution. Once you find the second solution, stop. – Lugger 3/1, 2013 at 21:49

wildplasser, completed means fully filled-in. mbeckish, that's a great idea! – Geer 3/1, 2013 at 21:54

Also, I think you only have to have one starting puzzle. After you've created an irreducible puzzle from it, you can reorder rows and columns within their cluster, reorder rows and columns of clusters, and "recolor" the puzzle. I'm not sure if this results in every possible irreducible puzzle, however. – Triaxial 3/1, 2013 at 21:57

Definitely not, since irreducible puzzles can have different numbers of elements. The idea is to generate puzzles on-demand for a smartphone app, though, not to enumerate every possible puzzle. – Geer 3/1, 2013 at 22:5

Checking against the original solution would be a quick way of ruling out a removal. Probably save a lot of time heuristically. If it did match, you would still have to do a second solve (or as @Lugger notes, continue the first backtrack), but if it did not match, then you can immediately continue. – Farrah 3/1, 2013 at 22:17

One easy thing to try would be memoizing all your solver's intermediate states across runs - i.e., every time your solver looks at a puzzle with a particular configuration, it checks a table to see if a solution has already been found. If so, return that immediately; if not, solve the puzzle and save the result in the table. If you do this with every board configuration (every step of every solver) then it seems like you'll quickly get to the point where your solver is almost always hitting the table after a couple steps. – Heliotrope 3/1, 2013 at 22:51

This might work well - but it's not as sure a bet as the other optimizations suggested. I may try it if those aren't good enough on their own. – Geer 3/1, 2013 at 23:1

Dancing Links can certainly be used to check for multiple solutions. See garethrees.org/2007/06/10/zendoku-generation/#section-4.4 for example. – Ocreate 3/1, 2013 at 23:35

Here are the main optimizations I implemented with (highly approximate) percentage increases in speed:

Using bitmasks to keep track of which constraints (row, column, box) are satisfied in each cell. This makes it much faster to look up whether a placement is legal, but slower to make a placement. A complicating factor in generating puzzles with bitmasks, rather than just solving them, is that digits may have to be removed, which means you need to keep track of the three types of constraints as distinct bits. A small further optimization is to save the masks for each digit and each constraint in arrays. 40%
Timing out the generation and restarting if it takes too long. See here. The optimal strategy is to increase the timeout period after each failed generation, to reduce the chance that it goes on indefinitely. 30%, mainly from reducing the worst-case runtimes.
mbeckish and user295691's suggestions (see the comments to the original post). 25%

Norvil answered 12/1, 2013 at 17:23 Comment(0)

I have an idea on the 2nd option your had suggested will be better for that provided you add 3 extra checks for the 1st 17 numbers

find a list of 17 random numbers between 1-9
add each item at random location provided
1. these new number added dont fail the 3 basic criteria of sudoku
  - there is no same number in same row
  - there is no same number in same column
  - there is no same number in same 3x3 matrix
2. if condition 1 fails move to the next column or row and check for the 3 basic criteria again.
3. if there is no next row (ie at 9th column or 9th row) add to the 1st column once the 17 characters are filled, run you solver logic on this and look for your unique solution.

Irate answered 4/1, 2013 at 5:52 Comment(1)

How does this avoid possibly having to add too many digits before getting a unique solution, and then having to take some away to get a minimal puzzle? – Geer 4/1, 2013 at 15:58

Here are the main optimizations I implemented with (highly approximate) percentage increases in speed:

Using bitmasks to keep track of which constraints (row, column, box) are satisfied in each cell. This makes it much faster to look up whether a placement is legal, but slower to make a placement. A complicating factor in generating puzzles with bitmasks, rather than just solving them, is that digits may have to be removed, which means you need to keep track of the three types of constraints as distinct bits. A small further optimization is to save the masks for each digit and each constraint in arrays. 40%
Timing out the generation and restarting if it takes too long. See here. The optimal strategy is to increase the timeout period after each failed generation, to reduce the chance that it goes on indefinitely. 30%, mainly from reducing the worst-case runtimes.
mbeckish and user295691's suggestions (see the comments to the original post). 25%

Norvil answered 12/1, 2013 at 17:23 Comment(0)

Recommended topics

Hot tags