I am attempting to create a genetic algorithm that trains a neural network to play the game Snake.
The problem I am having is that the fitness of the generations isn't improving: it either sits at the fitness one would expect from giving the game no input at all, or only gets worse after the first generation. I suspect it is an issue with the neural network, but I am at a loss as to what it is.
Neural Network setup
- 24 input nodes
- 2 hidden layers
- 8 nodes per hidden layer
- 4 output nodes (one for each direction the snake can take)
The input is an array of every direction the snake can see. For each of 8 directions it checks the distance to a wall, a fruit, and itself. The end result is an array with a length of 3*8 = 24.
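To make the input concrete, here is a sketch of how such a vision array might be built. It assumes a square grid, (x, y) tuples, an inverted-distance encoding (closer objects give stronger signals), and a particular direction ordering; the real game's data structures and encoding may differ.

```python
import numpy as np

# The 8 directions the snake looks in, as (dx, dy) steps: an assumed ordering
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def look_in_direction(head, step, grid_size, fruit, body):
    """Signals for wall, fruit and self along one direction, inverted so
    that closer objects produce larger values (fruit is a binary flag)."""
    x, y = head
    distance = 0
    dist_wall = dist_fruit = dist_self = 0.0
    while 0 <= x < grid_size and 0 <= y < grid_size:
        x += step[0]
        y += step[1]
        distance += 1
        if (x, y) == fruit and dist_fruit == 0.0:
            dist_fruit = 1.0
        if (x, y) in body and dist_self == 0.0:
            dist_self = 1.0 / distance
    dist_wall = 1.0 / distance  # distance counts steps until leaving the grid
    return [dist_wall, dist_fruit, dist_self]

def build_input(head, grid_size, fruit, body):
    """Concatenate the 3 signals for all 8 directions -> length 3*8 = 24."""
    return np.array([value
                     for step in DIRECTIONS
                     for value in look_in_direction(head, step, grid_size,
                                                    fruit, body)])
```

With this encoding a snake that sees nothing nearby gets small values everywhere, so the network's inputs are never all zero, which matters for the bias-ordering trick in `get_output` below.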
The weights and biases are random floats between -1 and 1, generated when the network is created.
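A minimal sketch of that initialization, assuming a fully connected 24-8-8-4 layout; the attribute names follow `get_output` below, but the exact shapes are an assumption based on how they are used there:

```python
import numpy as np

class NeuralNet:
    """Sketch of the 24-8-8-4 network's random initialization.
    Attribute names match the question's get_output; shapes are inferred."""

    def __init__(self):
        # Uniform floats in [-1, 1), generated when the network is created
        self.biases_input_hidden1 = np.random.uniform(-1, 1, (1, 24))
        self.weights_input_hidden1 = np.random.uniform(-1, 1, (24, 8))
        self.biases_hidden1_hidden2 = np.random.uniform(-1, 1, (1, 8))
        self.weights_hidden1_hidden2 = np.random.uniform(-1, 1, (8, 8))
        self.biases_hidden2_output = np.random.uniform(-1, 1, (1, 4))
        self.weights_hidden2_output = np.random.uniform(-1, 1, (8, 4))
```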
Genetic Algorithm setup
- Population size: 50000
- Parents chosen per generation: 1000
- Keep top per generation: 25000 (new variable, seeing better results)
- Mutation chance per child: 5%
(I have tried many different size ratios, although I'm still not sure what a typical ratio is.)
I am using single point crossover. Every array of weights and biases is crossed over between the parents, and passed on to the children (one child for each "version" of the crossover).
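As a sketch, single-point crossover on one flattened weight array might look like the following (flattening and reassembling the full set of weight and bias arrays is omitted for brevity):

```python
import numpy as np

def single_point_crossover(a, b):
    """Cross two equal-length weight arrays at one random point,
    returning both complementary children."""
    point = np.random.randint(1, len(a))  # crossover point, never at the ends
    child1 = np.concatenate([a[:point], b[point:]])
    child2 = np.concatenate([b[:point], a[point:]])
    return child1, child2
```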
I am using what I think is roulette selection to select parents, I will post the exact method below.
The fitness of a snake is calculated with: `age * 2**score` (not anymore, more info in the update), where age is how many turns the snake survived, and score is the number of fruits it collected.
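In code, that fitness is simply:

```python
def fitness(age, score):
    # Survival time, scaled exponentially by the number of fruits eaten
    return age * 2 ** score
```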
Details
Here is some pseudo-code to try to summarise how my genetic algorithm (should) work:
```python
pop = Population(size=1000)

while True:  # Have yet to implement a 'converged' check
    pop.calc_fitness()
    new_pop = []
    for i in range(n_parents):
        parent1 = pop.fitness_based_selection()
        parent2 = pop.fitness_based_selection()
        child_snake1, child_snake2 = parent1.crossover(parent2)
        # Each child gets its own mutation roll
        for child_snake in (child_snake1, child_snake2):
            if rand() <= mutate_chance:
                child_snake.mutate()
        new_pop.extend([child_snake1, child_snake2])
    pop.population = new_pop
    print(generation_statistics)
    gen += 1
```
Here is the method I use to select a parent:
```python
def fitness_based_selection(self):
    """
    A selection process that chooses a snake, where a snake with a higher
    fitness has a higher chance of being selected
    :return: The chosen snake's brain
    """
    sum_fitnesses = sum(snake[1] for snake in self.population)
    # A random cutoff. randint's upper bound is inclusive, so subtract 1
    # to guarantee the loop below always returns a snake.
    r = randint(0, sum_fitnesses - 1)
    current_sum = 0
    for snake in self.population:
        current_sum += snake[1]
        if current_sum > r:
            # Return brain of chosen snake
            return snake[0]
```
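For comparison, the same roulette wheel can be written with `np.random.choice`, which sidesteps the cutoff bookkeeping entirely. The standalone-function form here (taking the population as an argument) is for illustration only:

```python
import numpy as np

def fitness_based_selection(population):
    """Roulette-wheel selection over a list of [brain, fitness] pairs:
    selection probability is proportional to fitness."""
    fitnesses = np.array([snake[1] for snake in population], dtype=float)
    probabilities = fitnesses / fitnesses.sum()
    index = np.random.choice(len(population), p=probabilities)
    return population[index][0]
```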
It is worth noting that `self.population` is a list of snakes, where each snake is a list containing the NeuralNet controlling it and the fitness that network achieved.
And here is the method for getting an output from a network from a game output, as I suspect there may be something I'm doing wrong here:
```python
def get_output(self, input_array: np.ndarray):
    """
    Get output from input by feed forwarding it through the network
    :param input_array: The input to get an output from, should be an array of the inputs
    :return: an output array with 4 values of the shape 1x4
    """
    # Add biases then multiply by weights, input => h_layer_1;
    # this is done in the opposite order because the input can be zero
    h_layer_1_b = input_array + self.biases_input_hidden1
    h_layer_1_w = np.dot(h_layer_1_b, self.weights_input_hidden1)
    h_layer_1 = self.sigmoid(h_layer_1_w)  # Run the output through a sigmoid function

    # Multiply by weights then add biases, h_layer_1 => h_layer_2
    h_layer_2_w = np.dot(h_layer_1, self.weights_hidden1_hidden2)
    h_layer_2_b = h_layer_2_w + self.biases_hidden1_hidden2
    h_layer_2 = self.sigmoid(h_layer_2_b)

    # Multiply by weights then add biases, h_layer_2 => output
    output_w = np.dot(h_layer_2, self.weights_hidden2_output)
    output_b = output_w + self.biases_hidden2_output
    output = self.sigmoid(output_b)

    return output
```
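One thing worth checking downstream of `get_output` is how the 1x4 output is turned into a move. A typical choice (an assumption here, since that step isn't shown) is to take the index of the strongest activation:

```python
import numpy as np

def choose_direction(output):
    """Pick the direction whose output node fired strongest.
    The mapping of indices 0-3 to directions is an assumption."""
    return int(np.argmax(output))
```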
When running the neural network manually, with a graphical version of the game enabled, it is clear the network almost never changes direction more than once. This confuses me, as I was under the impression that with randomly generated weights and biases the input would be processed randomly and give a random output; instead the output seems to change once on the first turn of the game, then never significantly change again.
When running the genetic algorithm, the highest fitness of each generation barely ever exceeds the fitness one would expect from a snake without input (in this case 16), which I suppose is correlated with the issue in the neural network. When it does exceed that, the next generations revert back to 16 again.
Any help with this issue would be appreciated immensely; I'm still new to this area, and I'm finding it really interesting. I'll gladly provide more details if need be. My complete code can be found here, if anybody dares delve into that.
Update:
I changed a couple of things:
- Fixed the weight/bias generation; previously they were only generated between 0 and 1.
- Edited my crossover method to return two children per set of parents instead of just one.
- Changed the fitness function to only be equal to the snake's age (For testing purposes)
- Changed the population variables
Now the algorithm performs better: the first generation usually finds a snake with a fitness of 14-16, meaning that the snake does make turns to avoid death. However, it almost always only goes downhill from there. The first snake will actually have achieved a tactic of turning when near the east and north/south edges, but never the west edge. After the first generation the fitness tends to only get worse, eventually going back down to the lowest possible fitness. I'm at a loss as to what is going wrong, but I have a feeling it might be something large that I've overlooked.
Update #2:
Figured I might as well mention some things I tried that didn't work:
- Changed the nodes per hidden layer from 8 to 16, this made the snakes perform significantly worse.
- Allowed the snake to turn back into itself, this also made the snakes perform worse.
- Large (I think they're large, not sure what a standard pop size is.) population sizes of ~1 000 000, with ~1000 parents, no positive change.
- 16 or 32 nodes per hidden layer, had seemingly little to no impact.
- Fixed the mutate function to properly assign values between -1 and 1, no noticeable impact.
Update #3:
I changed a couple of things and started seeing better results. Firstly I stopped fruits from spawning to simplify the learning process, and instead gave the snakes a fitness that was equal to their age (how many turns/frames they survived), and after turning off normalization of the input array I got a snake with a fitness of 300! 300 is the maximum age a snake can have before dying of old age.
However the problem still exists in that after the first couple of generations the fitness will plummet, the first 1-5 generations may have a fitness of 300 (sometimes they don't, and have a low fitness instead, but I assume this is down to population size.), but after that the fitness of the generations will plummet down to ~20-30 and stay there.
Also, if I turn fruits back on, the snakes get abysmal fitnesses again. Sometimes the first generation will achieve a snake capable of moving in loops and therefore getting a fitness of 300 without picking up any fruits, but this is almost never transferred to the next generation.