I am attempting to create a genetic algorithm that trains a neural network to play the game Snake.
The problem I am having is that the fitness of the generations isn't improving: it either sits at the fitness one would expect from giving the game no input at all, or only gets worse after the first generation. I suspect it is an issue with the neural network, but I am at a loss as to what it is.
Neural Network setup
- 24 input nodes
- 2 hidden layers
- 8 nodes per hidden layer
- 4 output nodes (one for each direction the snake can take)
The input is an array of every direction the snake can see. For each of 8 directions it checks the distance to a wall, a fruit, and itself. The end result is an array with a length of 3*8 = 24.
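To make the input concrete, here is a sketch of how such a vision array might be built. It assumes a square grid, (x, y) tuples, an inverted-distance encoding (closer objects give stronger signals), and a particular direction ordering; the real game's data structures and encoding may differ.

```python
import numpy as np

# The 8 directions the snake looks in, as (dx, dy) steps: an assumed ordering
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def look_in_direction(head, step, grid_size, fruit, body):
    """Signals for wall, fruit and self along one direction, inverted so
    that closer objects produce larger values (fruit is a binary flag)."""
    x, y = head
    distance = 0
    dist_wall = dist_fruit = dist_self = 0.0
    while 0 <= x < grid_size and 0 <= y < grid_size:
        x += step[0]
        y += step[1]
        distance += 1
        if (x, y) == fruit and dist_fruit == 0.0:
            dist_fruit = 1.0
        if (x, y) in body and dist_self == 0.0:
            dist_self = 1.0 / distance
    dist_wall = 1.0 / distance  # distance counts steps until leaving the grid
    return [dist_wall, dist_fruit, dist_self]

def build_input(head, grid_size, fruit, body):
    """Concatenate the 3 signals for all 8 directions -> length 3*8 = 24."""
    return np.array([value
                     for step in DIRECTIONS
                     for value in look_in_direction(head, step, grid_size,
                                                    fruit, body)])
```

With this encoding a snake that sees nothing nearby gets small values everywhere, so the network's inputs are never all zero, which matters for the bias-ordering trick in `get_output` below.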
The weights and biases are random floats between -1 and 1, generated when the network is created.
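A minimal sketch of that initialization, assuming a fully connected 24-8-8-4 layout; the attribute names follow `get_output` below, but the exact shapes are an assumption based on how they are used there:

```python
import numpy as np

class NeuralNet:
    """Sketch of the 24-8-8-4 network's random initialization.
    Attribute names match the question's get_output; shapes are inferred."""

    def __init__(self):
        # Uniform floats in [-1, 1), generated when the network is created
        self.biases_input_hidden1 = np.random.uniform(-1, 1, (1, 24))
        self.weights_input_hidden1 = np.random.uniform(-1, 1, (24, 8))
        self.biases_hidden1_hidden2 = np.random.uniform(-1, 1, (1, 8))
        self.weights_hidden1_hidden2 = np.random.uniform(-1, 1, (8, 8))
        self.biases_hidden2_output = np.random.uniform(-1, 1, (1, 4))
        self.weights_hidden2_output = np.random.uniform(-1, 1, (8, 4))
```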
Genetic Algorithm setup
- Population size: 50000
- Parents chosen per generation: 1000
- Keep top per generation: 25000 (new variable, seeing better results)
- Mutation chance per child: 5%
(I have tried many different size ratios, although I'm still not sure what a typical ratio is.)
I am using single point crossover. Every array of weights and biases is crossed over between the parents, and passed on to the children (one child for each "version" of the crossover).
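As a sketch, single-point crossover on one flattened weight array might look like the following (flattening and reassembling the full set of weight and bias arrays is omitted for brevity):

```python
import numpy as np

def single_point_crossover(a, b):
    """Cross two equal-length weight arrays at one random point,
    returning both complementary children."""
    point = np.random.randint(1, len(a))  # crossover point, never at the ends
    child1 = np.concatenate([a[:point], b[point:]])
    child2 = np.concatenate([b[:point], a[point:]])
    return child1, child2
```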
I am using what I think is roulette selection to select parents, I will post the exact method below.
The fitness of a snake is calculated with: `age * 2**score` (not anymore, more info in the update), where age is how many turns the snake survived, and score is the number of fruits it collected.
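In code, that fitness is simply:

```python
def fitness(age, score):
    # Survival time, scaled exponentially by the number of fruits eaten
    return age * 2 ** score
```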
Details
Here is some pseudo-code to try to summarise how my genetic algorithm (should) work:
```python
pop = Population(size=1000)

while True:  # Have yet to implement a 'converged' check
    pop.calc_fitness()
    new_pop = []
    for i in range(n_parents):
        parent1 = pop.fitness_based_selection()
        parent2 = pop.fitness_based_selection()
        child_snake1, child_snake2 = parent1.crossover(parent2)
        # Each child gets its own mutation roll
        for child_snake in (child_snake1, child_snake2):
            if rand() <= mutate_chance:
                child_snake.mutate()
        new_pop.extend([child_snake1, child_snake2])
    pop.population = new_pop
    print(generation_statistics)
    gen += 1
```
Here is the method I use to select a parent:
```python
def fitness_based_selection(self):
    """
    A selection process that chooses a snake, where a snake with a higher
    fitness has a higher chance of being selected
    :return: The chosen snake's brain
    """
    sum_fitnesses = sum(snake[1] for snake in self.population)
    # A random cutoff. randint's upper bound is inclusive, so subtract 1
    # to guarantee the loop below always returns a snake.
    r = randint(0, sum_fitnesses - 1)
    current_sum = 0
    for snake in self.population:
        current_sum += snake[1]
        if current_sum > r:
            # Return brain of chosen snake
            return snake[0]
```
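For comparison, the same roulette wheel can be written with `np.random.choice`, which sidesteps the cutoff bookkeeping entirely. The standalone-function form here (taking the population as an argument) is for illustration only:

```python
import numpy as np

def fitness_based_selection(population):
    """Roulette-wheel selection over a list of [brain, fitness] pairs:
    selection probability is proportional to fitness."""
    fitnesses = np.array([snake[1] for snake in population], dtype=float)
    probabilities = fitnesses / fitnesses.sum()
    index = np.random.choice(len(population), p=probabilities)
    return population[index][0]
```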
It is worth noting that `self.population` is a list of snakes, where each snake is a list containing the NeuralNet controlling it and the fitness that network achieved.
And here is the method for getting an output from a network from a game output, as I suspect there may be something I'm doing wrong here:
```python
def get_output(self, input_array: np.ndarray):
    """
    Get output from input by feed forwarding it through the network
    :param input_array: The input to get an output from, should be an array of the inputs
    :return: an output array with 4 values of the shape 1x4
    """
    # Add biases then multiply by weights, input => h_layer_1;
    # this is done in the opposite order because the input can be zero
    h_layer_1_b = input_array + self.biases_input_hidden1
    h_layer_1_w = np.dot(h_layer_1_b, self.weights_input_hidden1)
    h_layer_1 = self.sigmoid(h_layer_1_w)  # Run the output through a sigmoid function

    # Multiply by weights then add biases, h_layer_1 => h_layer_2
    h_layer_2_w = np.dot(h_layer_1, self.weights_hidden1_hidden2)
    h_layer_2_b = h_layer_2_w + self.biases_hidden1_hidden2
    h_layer_2 = self.sigmoid(h_layer_2_b)

    # Multiply by weights then add biases, h_layer_2 => output
    output_w = np.dot(h_layer_2, self.weights_hidden2_output)
    output_b = output_w + self.biases_hidden2_output
    output = self.sigmoid(output_b)

    return output
```
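One thing worth checking downstream of `get_output` is how the 1x4 output is turned into a move. A typical choice (an assumption here, since that step isn't shown) is to take the index of the strongest activation:

```python
import numpy as np

def choose_direction(output):
    """Pick the direction whose output node fired strongest.
    The mapping of indices 0-3 to directions is an assumption."""
    return int(np.argmax(output))
```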
When running the neural network manually, with a graphical version of the game enabled, it is clear the network almost never changes direction more than once. This confuses me, as I was under the impression that with randomly generated weights and biases the input would be processed randomly and give a random output; instead the output seems to change once on the first turn of the game, then never significantly change again.
When running the genetic algorithm, the highest fitness of each generation barely ever exceeds the fitness one would expect from a snake without input (in this case 16), which I suppose is correlated with the issue in the neural network. When it does exceed that, the next generations revert back to 16 again.
Any help with this issue would be appreciated immensely; I'm still new to this area, and I'm finding it really interesting. I'll gladly provide more details if need be. My complete code can be found here, if anybody dares delve into that.
Update:
I changed a couple of things:
- Fixed the weight/bias generation; previously they were only generated between 0 and 1.
- Edited my crossover method to return two children per set of parents instead of just one.
- Changed the fitness function to only be equal to the snake's age (For testing purposes)
- Changed the population variables
Now the algorithm performs better: the first generation usually finds a snake with a fitness of 14-16, meaning that the snake does make turns to avoid death. However, it almost always only goes downhill from there. The first snake will actually have achieved a tactic of turning when near the east and north/south edges, but never the west edge. After the first generation the fitness tends to only get worse, eventually going back down to the lowest possible fitness. I'm at a loss as to what is going wrong, but I have a feeling it might be something large that I've overlooked.
Update #2:
Figured I might as well mention some things I tried that didn't work:
- Changed the nodes per hidden layer from 8 to 16, this made the snakes perform significantly worse.
- Allowed the snake to turn back into itself, this also made the snakes perform worse.
- Large (I think they're large, not sure what a standard pop size is.) population sizes of ~1 000 000, with ~1000 parents, no positive change.
- 16 or 32 nodes per hidden layer, had seemingly little to no impact.
- Fixed the mutate function to properly assign values between -1 and 1, no noticeable impact.
Update #3:
I changed a couple of things and started seeing better results. Firstly I stopped fruits from spawning to simplify the learning process, and instead gave the snakes a fitness that was equal to their age (how many turns/frames they survived), and after turning off normalization of the input array I got a snake with a fitness of 300! 300 is the maximum age a snake can have before dying of old age.
However the problem still exists in that after the first couple of generations the fitness will plummet, the first 1-5 generations may have a fitness of 300 (sometimes they don't, and have a low fitness instead, but I assume this is down to population size.), but after that the fitness of the generations will plummet down to ~20-30 and stay there.
Also, if I turn fruits back on, the snakes get abysmal fitnesses again. Sometimes the first generation will achieve a snake capable of moving in loops and therefore getting a fitness of 300 without picking up any fruits, but this is almost never transferred to the next generation.