What's the worst-case valid sudoku puzzle for simple backtracking brute force algorithm?
Asked Answered
F

1

7

The "simple/naive backtracking brute force algorithm", "Straightforward Depth-First Search" for sudoku is commonly known and implemented.

and no different implementation seems to exist. (when i first wrote this question.. i wanted to mean we could completely standardize it, but the wording is bad..)

This guy has described the algorithm well i think: https://mcmap.net/q/1629958/-on-sudoku-solving

Edit: So let me have it more specified with pseudo code...

var field[9][9]
set the givens in 'field'
if brute (first empty grid) = true then
    output solution
else
    output no solution
end if

function brute (cx, cy)
    for n = 1 to 9
        if (n doesn't present in row cy) and (n doesn't present in column cx) and (n doesn't present in block (cx div 3, cy div 3)) then
            let field[cx][cy] = n
            if (cx, cy) this is the last empty grid then
                return true
            elseif brute (next empty grid) = true then
                return true
            end if
            let field[cx][cy] = empty
        end if
    next n
end function

I want to find the puzzle that requires most time. We may call it "hardest" for this particular "standardized" algorithm, but this one is not like those questions asking for "Hardest sudoku".

In fact, a "hard" puzzle under this definition may turn super easy when simply rotated or flipped.

According to the rule "for each grid try number 1 to 9", it tries from 1 on, so we may somehow let it try more by using proper number, by the way there won't be permutation problem.

The sudoku puzzle must be valid, i.e. it should have exactly 1 solution. Some guy got a puzzle requiring 1439 seconds, but it's not valid because of having no solution.

I define the time required (or say time complexity) equivalent to how many times the recursive function is entered. (in my implementation, it's slightly different from the pseudo code above, because of the last entrance, and ensuring unique solution, etc.)

Is there any good way to construct it, or we have to use approximate ones like heuristic algorithms to find inexact solutions?


I've implemented a backtracking with both naive strategy (that I referred to as "simple" above, it's unique) and Peter Norvig's "Least Candidates First" strategy (my implementation is deterministic, but not unique. As Peter has also mentioned, the order of python dict changes the result a lot, in case of a tie on the number of candidates).

https://github.com/farteryhr/labs/blob/master/sudoku.c

The no-solution one:

.....5.8....6.1.43..........1.5........1.6...3.......553.....61........4.........

takes 60 seconds on my laptop to get the no-solution conclusion, entering the recursion function 2549798781 times (called "cycles" later). With my implementation of LCF, 78308087 cycles in 30 seconds to conclude. It's because finding the grid with least candidates needs more operations, a single cycle of LCF strategy uses about 16x more time.

The topmost one on the Hardest list:

4.....8.5.3..........7......2.....6.....8.4......1.......6.3.7.5..2.....1.4......

takes 3.0s, found the solution at cycle 9727397, and 142738236 cycles for ensuring unique solution. (my LCF: 981/7216 in 0.004s)

Many in the "hard" list are still easy for naive, though a larger portion of them needs 10^7 to 10^9 cycles.

On Wikipedia: Sudoku solving algorithms (Original) it's stated that such puzzles against backtracking algorithm can be constructed, by making as many empty grids at the beginning as possible and the permutation of the top row 987654321.

Well the test..

..............3.85..1.2.......5.7.....4...1...9.......5......73..2.1........4...9

takes 1.4s, 69175317 cycles for finding solution, 69207227 cycles ensuring unique solution. Not as good as the hard one provided by Peter, but OK, and it's almost right after finding the solution, the search ends. That's probably how the first row works by being lexicographically large. (my LCF: 29206/46160 in 0.023s)

Yes these are obvious, I'm just asking for better ways...


There are also other ways of measuring the difficulty of Sudoku (through solving)


Sudoku Analyst will get stuck with the multiple-solution puzzle given by Peter (naive 419195/419256, LCF 2529478/2529482, yes, there are some puzzles that make LCF do worse):

.....6....59.....82....8....45........3........6..3.54...325..6..................

This one is easy for both naive backtracking (10008/76703) and LCF backtracking (313/1144), but also gets Sudoku Analyst stuck.

..53.....8......2..7..1.5..4....53...1..7...6..32...8..6.5....9..4....3......97..


Another update:

The most difficult Sudoku puzzles are quickly solved by a straightforward depth-first search algorithm

Ha, finally someone also looking for it, and a super tough one is given! The following valid puzzle:

9..8...........5............2..1...3.1.....6....4...7.7.86.........3.1..4.....2..

In this paper, the algorithm is named SDFS, Straightforward Depth-First Search. The number of cycles stated by the author is 1553023932/1884424814, and with my implementation, it's 1305263522/1584688020. Yes, there will be some difference on precisely where to pop the counter, but the basic behavior matches. On repl.it 's server, it took 97s to find the answer and 119s to finish the search.

Fearfully answered 10/7, 2014 at 16:41 Comment(7)
Some guy, say, Peter Norvig.Zechariah
"no different implementation seems to exist"--that's not true. As the answer you linked to says, "It's one of the simplest (and slowest) ways to solve a Sudoku."Jori
i just meant to use the brute force algorithm he described.....Fearfully
That 'some guy' is head of Google research! But his essay is fantastic: do study his code until you understand every bit of it, I learned a lot from it. But note that his code is far from brute-force, it uses constraint propagation, which reduces the problem space by many orders of magnitude, and only starts doing guesses when he runs out of constraints, which solves most problems in a fraction of a second. A naive, brute-force solution might take something like (9!)^9 steps, the sun probably burns out before that is finished!Hegyera
@BasSwinckels i know a bit about manual sudoku solving by playing around "sudoku explorer" and "hodoku", which analyses the logic between numbers and gives each step, from basic rules to advanced chain methods like forcing net, and there are still some ones proven to have only one solution, but the logic behind some of the steps are still unclear, which the solver still have to use brute force. thanks and i'm aware of those, but i'm just trying to think about this specified problem. i've already defined the brute force algo in the question, and i hope it's understandable.Fearfully
@BasSwinckels btw i've tried most of those "hard" puzzles given by the super guy, on hodoku, and hodoku gives the solution (with path) or "no solution" or "multiple solution" all in just seconds on my slow laptop.Fearfully
The question is updated with my implementation and some test results.Fearfully
M
0

You can easily generate the worst case by recording the time taken / no. of operations taken by your code to solve hard sudoku puzzles. You can either use a random generator that generates valid sudoku puzzles (or) you can take hard sudoku puzzles from the internet and run your code against it to measure the time/number of operations. Once you run your code against 10000 such cases the slowest 5 (and the unsolved ones) would be the worst cases for your solution.

Edit after farter's comment

I realized u wanted to improve the efficiency of algorithm. 97 seconds for a su-do-ku puzzle considering CPU power is quite high! You could reduce the time by

  1. Do not use backtracking, fill only the squares for which you know the solution. Write a greedy algo that finds answers for squares which you'll never change.
  2. Parallelising your search (finding solutions for multiple squares at the same time)
  3. Your search should involve multiple greedy approaches.. For. eg. pick the lines/3x3 squares/3x9/9x3 rectangles with most numbers revealed and you are more likely to reveal numbers in that region which will reduce the runtime of overall puzzle. Also find the frequency of each number in a specific region and this will help in finding numbers quickly. For example if two 3s are revealed in a 3x9 square you are more likely to reveal the third one since the third 3 in the region should be in one of the 3 cells.
  4. For hard puzzles, add memoisation.. For eg. there will be scenarios where you know that a certain pair of no.s fits in a pair of cells. But you don't know which one goes to which of those cell. In those cases you know for sure that the other 7 numbers do not come in these cells. Memoize this negation criteria and use it for your advantage to minimize your search time.

Let me know if this helps.

Magdamagdaia answered 16/2, 2023 at 10:18 Comment(5)
this is basically what we (all the guys mentioned by links in the question, and myself) have done. probably millions of randomly generated puzzle have been tested by each of us, consuming hours to months. yes it finds relatively hard ones, but i'm asking for (ways to approach) THE hardest one (specific to this algorithm), as efficiently (less time to find harder puzzle) as possible. your answer is right but trivial.Fearfully
i've read through your edit and it's appreciated. however the purpose of this question is not to accelerate the solving, but to find the "evil" puzzle for this particular (basic) algorithm, in other words, to specifically "hack" this common baseline implementation with a hard-to-find bad case, making it eat CPU and demonstrate it's not always fast. the case might (in fact almost surely) be just a piece of cake for other algorithms such as more complicated logic-based algos, SAT solvers, dancing links, etc.). the purpose is to "find the case" more efficiently, "find the most evil case".Fearfully
in this case your input is a source code right.. I don't think we have algorithms to read source code to find the worst case of the source code except trying out random test cases.. We need either a human to read the source code to identify the flaws in the code to generate a case manually or a complex AI which can understand the code and then identify the loopholes in the code to build a complex test case. I don't think there is an algorithm that understands any given algorithm except for humans/AI.Magdamagdaia
no, i don't mean to be that general... what i mean is to just stick to this specific algorithm. you may consider this problem "has no input data", just find puzzles that are harder, and post them here (for hacking the specific algorithm harder, for fun), or even terminate this problem once some specific puzzle could be proven THE hardest (for this specific algorithm).Fearfully
all the valid puzzles that start with least revealed numbers are the hardest ones.. You can download any sudoku game and choose the hardest level and get the puzzles from there.. 17/81 squares would be revealed in the extremely hard levels.. you can start any number of new games and each time it'll generate a different puzzle.. You can also write your own algorithm to generate valid puzzles that have exactly 17 revealed squares.. You can also try online generators like opensky.ca/~jdhildeb/software/sudokugen or sudokutodo.com/generatorMagdamagdaia

© 2022 - 2024 — McMap. All rights reserved.