What's the worst-case valid sudoku puzzle for simple backtracking brute force algorithm?

The "simple/naive backtracking brute force algorithm", "Straightforward Depth-First Search" for sudoku is commonly known and implemented.

and no different implementation seems to exist. (when i first wrote this question.. i wanted to mean we could completely standardize it, but the wording is bad..)

This guy has described the algorithm well i think: https://mcmap.net/q/1629958/-on-sudoku-solving

Edit: So let me have it more specified with pseudo code...

var field[9][9]
set the givens in 'field'
if brute (first empty grid) = true then
    output solution
else
    output no solution
end if

function brute (cx, cy)
    for n = 1 to 9
        if (n doesn't present in row cy) and (n doesn't present in column cx) and (n doesn't present in block (cx div 3, cy div 3)) then
            let field[cx][cy] = n
            if (cx, cy) this is the last empty grid then
                return true
            elseif brute (next empty grid) = true then
                return true
            end if
            let field[cx][cy] = empty
        end if
    next n
end function

I want to find the puzzle that requires most time. We may call it "hardest" for this particular "standardized" algorithm, but this one is not like those questions asking for "Hardest sudoku".

In fact, a "hard" puzzle under this definition may turn super easy when simply rotated or flipped.

According to the rule "for each grid try number 1 to 9", it tries from 1 on, so we may somehow let it try more by using proper number, by the way there won't be permutation problem.

The sudoku puzzle must be valid, i.e. it should have exactly 1 solution. Some guy got a puzzle requiring 1439 seconds, but it's not valid because of having no solution.

I define the time required (or say time complexity) equivalent to how many times the recursive function is entered. (in my implementation, it's slightly different from the pseudo code above, because of the last entrance, and ensuring unique solution, etc.)

Is there any good way to construct it, or we have to use approximate ones like heuristic algorithms to find inexact solutions?

I've implemented a backtracking with both naive strategy (that I referred to as "simple" above, it's unique) and Peter Norvig's "Least Candidates First" strategy (my implementation is deterministic, but not unique. As Peter has also mentioned, the order of python dict changes the result a lot, in case of a tie on the number of candidates).

https://github.com/farteryhr/labs/blob/master/sudoku.c

The no-solution one:

.....5.8....6.1.43..........1.5........1.6...3.......553.....61........4.........

takes 60 seconds on my laptop to get the no-solution conclusion, entering the recursion function 2549798781 times (called "cycles" later). With my implementation of LCF, 78308087 cycles in 30 seconds to conclude. It's because finding the grid with least candidates needs more operations, a single cycle of LCF strategy uses about 16x more time.

The topmost one on the Hardest list:

4.....8.5.3..........7......2.....6.....8.4......1.......6.3.7.5..2.....1.4......

takes 3.0s, found the solution at cycle 9727397, and 142738236 cycles for ensuring unique solution. (my LCF: 981/7216 in 0.004s)

Many in the "hard" list are still easy for naive, though a larger portion of them needs 10^7 to 10^9 cycles.

On Wikipedia: Sudoku solving algorithms (Original) it's stated that such puzzles against backtracking algorithm can be constructed, by making as many empty grids at the beginning as possible and the permutation of the top row 987654321.

Well the test..

..............3.85..1.2.......5.7.....4...1...9.......5......73..2.1........4...9

takes 1.4s, 69175317 cycles for finding solution, 69207227 cycles ensuring unique solution. Not as good as the hard one provided by Peter, but OK, and it's almost right after finding the solution, the search ends. That's probably how the first row works by being lexicographically large. (my LCF: 29206/46160 in 0.023s)

Yes these are obvious, I'm just asking for better ways...

There are also other ways of measuring the difficulty of Sudoku (through solving)

Sudoku Analyst will get stuck with the multiple-solution puzzle given by Peter (naive 419195/419256, LCF 2529478/2529482, yes, there are some puzzles that make LCF do worse):

.....6....59.....82....8....45........3........6..3.54...325..6..................

This one is easy for both naive backtracking (10008/76703) and LCF backtracking (313/1144), but also gets Sudoku Analyst stuck.

..53.....8......2..7..1.5..4....53...1..7...6..32...8..6.5....9..4....3......97..

Another update:

The most difficult Sudoku puzzles are quickly solved by a straightforward depth-first search algorithm

Ha, finally someone also looking for it, and a super tough one is given! The following valid puzzle:

9..8...........5............2..1...3.1.....6....4...7.7.86.........3.1..4.....2..

In this paper, the algorithm is named SDFS, Straightforward Depth-First Search. The number of cycles stated by the author is 1553023932/1884424814, and with my implementation, it's 1305263522/1584688020. Yes, there will be some difference on precisely where to pop the counter, but the basic behavior matches. On repl.it 's server, it took 97s to find the answer and 119s to finish the search.

You can easily generate the worst case by recording the time taken / no. of operations taken by your code to solve hard sudoku puzzles. You can either use a random generator that generates valid sudoku puzzles (or) you can take hard sudoku puzzles from the internet and run your code against it to measure the time/number of operations. Once you run your code against 10000 such cases the slowest 5 (and the unsolved ones) would be the worst cases for your solution.

Edit after farter's comment

I realized u wanted to improve the efficiency of algorithm. 97 seconds for a su-do-ku puzzle considering CPU power is quite high! You could reduce the time by

Do not use backtracking, fill only the squares for which you know the solution. Write a greedy algo that finds answers for squares which you'll never change.
Parallelising your search (finding solutions for multiple squares at the same time)
Your search should involve multiple greedy approaches.. For. eg. pick the lines/3x3 squares/3x9/9x3 rectangles with most numbers revealed and you are more likely to reveal numbers in that region which will reduce the runtime of overall puzzle. Also find the frequency of each number in a specific region and this will help in finding numbers quickly. For example if two 3s are revealed in a 3x9 square you are more likely to reveal the third one since the third 3 in the region should be in one of the 3 cells.
For hard puzzles, add memoisation.. For eg. there will be scenarios where you know that a certain pair of no.s fits in a pair of cells. But you don't know which one goes to which of those cell. In those cases you know for sure that the other 7 numbers do not come in these cells. Memoize this negation criteria and use it for your advantage to minimize your search time.

Let me know if this helps.

Recommended topics

Hot tags