I have some experience with training a fixed-topology NN using a genetic algorithm (What the paper refers to as the "traditional NE approach"). There are several different mutation and reproduction operators we used for this and we selected those randomly.
Given two parents, our reproduction operators (could also call these crossover operators) included:
Swap either single weights or all weights for a given neuron in the network. So for example, given two parents selected for reproduction either choose a particular weight in the network and swap the value (for our swaps we produced two offspring and then chose the one with the best fitness to survive in the next generation of the population), or choose a particular neuron in the network and swap all the weights for that neuron to produce two offspring.
swap an entire layer's weights. So given parents A and B, choose a particular layer (the same layer in both) and swap all the weights between them to produce two offsping. This is a large move so we set it up so that this operation would be selected less often than the others. Also, this may not make sense if your network only has a few layers.
Our mutation operators operated on a single network and would select a random weight and either:
- completely replace it with a new random value
- change the weight by some percentage. (multiply the weight by some random number between 0 and 2 - practically speaking we would tend to constrain that a bit and multiply it by a random number between 0.5 and 1.5. This has the effect of scaling the weight so that it doesn't change as radically. You could also do this kind of operation by scaling all the weights of a particular neuron.
- add or subtract a random number between 0 and 1 to/from the weight.
- Change the sign of a weight.
- swap weights on a single neuron.
You can certainly get creative with mutation operators, you may discover something that works better for your particular problem.
IIRC, we would choose two parents from the population based on random proportional selection, then ran mutation operations on each of them and then ran these mutated parents through the reproduction operation and ran the two offspring through the fitness function to select the fittest one to go into the next generation population.
Of course, in your case since you're also evolving the topology some of these reproduction operations above won't make much sense because two selected parents could have completely different topologies. In NEAT (as I understand it) you can have connections between non-contiguous layers of the network, so for example you can have a layer 1 neuron feed another in layer 4, instead of feeding directly to layer 2. That makes swapping operations involving all the weights of a neuron more difficult - you could try to choose two neurons in the network that have the same number of weights, or just stick to swapping single weights in the network.
I know that while training a NE, usually the backpropagation algorithm is used to correct the weights
Actually, in NE backprop isn't used. It's the mutations performed by the GA that are training the network as an alternative to backprop. In our case backprop was problematic due to some "unorthodox" additions to the network which I won't go into. However, if backprop had been possible, I would have gone with that. The genetic approach to training NNs definitely seems to proceed much more slowly than backprop probably would have. Also, when using an evolutionary method for adjusting weights of the network, you start needing to tweak various parameters of the GA like crossover and mutation rates.