I've taken on the task of creating a sudoku solver using a genetic algorithm.
Initialization: Store the given values in each chromosome, and then randomly generate values such that each row is a valid permutation of the values 1 through 9.
Fitness: Determined by the number of "out of place" values in each row, column, and square grid, added together.
Fitness Function: Typical roulette wheel selection
Selection: Random, but weighted using the roulette wheel.
Crossover: Randomly choose various rows from two parents, which creates one child. (I've also implemented a crossover that randomly chooses 3 rows at a time from the two parents - in an effort to preserve good mini-grids). The following are two example children, one from each crossover method:
Parent 1 row 1
Parent 2 row 2
Parent 1 row 3
Parent 2 row 4
Parent 1 row 5
Parent 2 row 6
Parent 2 row 7
Parent 1 row 8
Parent 1 row 9
Parent 1 row 1
Parent 1 row 2
Parent 1 row 3
Parent 2 row 4
Parent 2 row 5
Parent 2 row 6
Parent 1 row 7
Parent 1 row 8
Parent 1 row 9
Mutation: Initially I just swapped the values at two random locations, but this actually made the algorithm much worse because it introduced duplications in rows which had been valid permutations. So I altered the mutation (which seems to perform best when the chance of mutation is in the 25% - 50% range) to randomly choose a row, and then randomize the ordering of that row (leaving the given values in their correct locations).
I also tried a mutation where it chose a random row and then chose two random (non-given) positions in the row and swapped them, but this made the performance much worse as well. (Unlike the swapping of the two random locations, I don't understand why this mutation would make the performance so much worse, yet a mutation to randomize the entire row makes the performance better)
My algorithm has yet to solve a puzzle of any difficulty. It will often times get close (in the range of only 6 to 10 conflicts), but it can never get to a solution (it's presumably finding local minima and getting stuck).
In an effort to improve the algorithm, I added the ability to replace a large portion of the population (worst performing chromosomes) with completely randomized boards, but this seems to have a minimal effect.
What are weak points in my approach? How can I improve the performance?
It seems like Sudoku might lend itself to better performance from mutation, as opposed to crossover.