When and why is crossover beneficial in differential evolution?

Asked 19/1, 2014 at 18:43 Answered 7/2, 2014 at 0:32

r optimization evolutionary-algorithm differential-evolution

I implemented a differential evolution algorithm for a side project I was doing. Because the crossover step seemed to involve a lot of parameter choices (e.g. crossover probabilities), I decided to skip it and just use mutation. The method seemed to work ok, but I am unsure whether I would get better performance if I introduced crossover.

Main Question: What is the motivation behind introducing crossover to differential evolution? Can you provide a toy example where introducing crossover out-performs pure mutation?

My intuition is that crossover will produce something like the following in 2-dimensions. Say we have two parent vectors (red). Uniform crossover could produce a new trial vector at one of the blue points.

I am not sure why this kind of exploration would be expected to be beneficial. In fact, it seems like this could make performance worse if high-fitness solutions follow some linear trend. In the figure below, lets say the red points are the current population, and the optimal solution is towards the lower right corner. The population is traveling down a valley such that the upper right and lower left corners produce bad solutions. The upper left corner produces "okay" but suboptimal solutions. Notice how uniform crossover produces trials (in blue) that are orthogonal to the direction of improvement. I've used a cross-over probability of 1 and neglected mutation to illustrate my point (see code). I imagine this situation could arise quite frequently in optimization problems, but could be misunderstanding something.

Note: In the above example, I am implicitly assuming that the population was randomly initialized (uniformly) across this space, and has begun to converge to the correct solution down the central valley (top left to bottom right).

This toy example is convex, and thus differential evolution wouldn't even be the appropriate technique. However, if this motif was embedded in a multi-modal fitness landscape, it seems like crossover might be detrimental. While crossover does support exploration, which could be beneficial, I am not sure why one would choose to explore in this particular direction.

R code for the example above:

N = 50

x1 <- rnorm(N,mean=2,sd=0.5)
x2 <- -x1+4+rnorm(N,mean=0,sd=0.1)
plot(x1,x2,pch=21,col='red',bg='red',ylim=c(0,4),xlim=c(0,4))

x1_cx = list(rep(0, 50))
x2_cx = list(rep(0, 50))
for (i in 0:N) {
  x1_cx[i] <- x1[i]
  x2_cx[i] <- x2[sample(1:N,1)]
}

points(x1_cx,x2_cx,pch=4,col='blue',lwd=4)

Follow-up Question: If crossover is beneficial in certain situations, is there a sensible approach to a) determining if your specific problem would benefit from crossover, and b) how to tune the crossover parameters to optimize the algorithm?

A related stackoverflow question (I am looking for something more specific, with a toy example for instance): what is the importance of crossing over in Differential Evolution Algorithm?

A similar question, but not specific to differential evolution: Efficiency of crossover in genetic algorithms

Gilli answered 19/1, 2014 at 18:43 Comment(3)

If they follow a linear trend, there might be an easier solution than an evolutionary algorithm. What if there was a much deeper side-valley in the upper right / lower left corner? – Perreault 22/1, 2014 at 17:2

@Perreault -- Thanks for commenting. Is your argument that, without crossover, the algorithm is too greedy and doesn't explore enough? Why is the mutation step not enough? Aren't there enough parameters there to play around with? I'm trying to keep my routine with as few free parameters as possible. – Gilli 23/1, 2014 at 13:14

This may not be directly applicable to your situation, but as a general intuition for crossover, the Hill-Robertson effect (en.wikipedia.org/wiki/Hill%E2%80%93Robertson_effect) makes a lot of sense – Hypocotyl 3/8, 2015 at 6:28

I am not particularly familiar with the specifics of the DE algorithm but in general the point of crossover is that if you have two very different individuals with high fitness it will produce an offspring that is intermediate between them without being particularly similar to either. Mutation only explores the local neighbourhood of each individual without taking the rest of the population into account. If you think of genomes as points in some high dimensional vector space, then a mutation is shift in a random direction. Therefore mutation needs to take small steps since if your are starting from a significantly better than random position, a long step in a random direction is almost certain to make things worse because it is essentially just introducing entropy into an evolved genome. You can think of a cross over as a step from one parent towards the other. Since the other parent is also better than random, it is more promising to take a longer step in that direction. This allows for faster exploration of the promising parts of the fitness landscape.

In real biological organisms the genome is often organized in such a way that genes that depend on each other are close together on the same chromosome. This means that crossover is unlikely to break synergetic gene combinations. Real evolution actually moves genes around to achieve this (though this is much slower than the evolution of individual genes) and sometimes the higher order structure of the genome (the 3 dimensional shape of the DNA) evolves to prevent cross-overs in particularly sensitive areas. These mechanisms are rarely modeled in evolutionary algorithms, but you will get more out of crossovers if you order your genome in a way that puts genes that are likely to interact close to each other.

Desmonddesmoulins answered 26/1, 2014 at 6:31 Comment(5)

When studying at least one work on genetic algorithms, I recall the author studying improvements in successive generations in the context of favorable traits, which were substrings of the total gene. So proofs were framed like "favorable substrings spread exponentially" etc. - hence crossover. This should also point out where GA may shine vs. other algorithms - the fact that substrings are preferred means GA is more suitable to things like programs or other complex structures rather than just simple vectors. Of course, doing crossovers and ending up with valid solutions is still quite tricky. – Hulky 28/1, 2014 at 3:42

This is a useful answer for classical genetic algorithms. But I don't think it fully applies to DE, since the mutation step explicitly combines/mixes parameter sets between successful individuals. – Gilli 29/1, 2014 at 3:36

The DE mutation update rule is X += k*(Y-Z). So you step in the Y-Z direction from X. If X, Y & Z are chosen randomly and independently then Y-Z is essentially random when considered as a direction starting from X. It can easily move X away from both Y and Z. In a very high dimensional space it is very likely to move X in a direction that is very close to orthogonal to the direction to either Y or Z. So I think the mutation steps still need to be small to avoid reversion to the mean and the answer applies to DE as well, but as I said I am not an expert on DE – Desmonddesmoulins 29/1, 2014 at 4:57

Hmm what if we pick Y and Z such that Y has higher fitness than Z? Would you expect this to improve the algorithm? Thanks for your responses thus far. – Gilli 29/1, 2014 at 17:6

It may help if the problem is very simple. In a degenerate case where the fitness gradient is constant, yoursuggestion is guaranteed to help, but in such a case there in no maximum you do not need DE. In general Y-Z can still point away from both Y & Z so you could still end up moving downhill even for a simple convex problem. Consider a 1D convex problem with Z & Y both on one side of the optimal point and X on the other. Your suggestion would pick a vector that points to the solution on the Y,Z side, but that points away on the X side. So I think that mutaions still need to take small steps. – Desmonddesmoulins 29/1, 2014 at 17:45

As Daniel says, cross over is a way to take larger steps across the problem landscape, allowing you to escape local maxima that a single mutation would be unable to do so.

Whether it is appropriate or not will depend on the complexity of the problem space, how the genotype -> phenotype expression works (will related genes be close together), etc.

More formally this is the concept of 'Connectivity' in Local Search algorithms, providing strong enough operators that the local search neighbourhood is sufficentally large to escape local minima.

Lapith answered 29/1, 2014 at 13:20 Comment(0)

No. Crossover is not useful. There I said it. :P

I've never found a need for crossover. People seem to think it does some kind of magic. But it doesn't (and can't) do anything more useful than simple mutation. Large mutations can be used to explore the entire problem space and small mutations can be used to exploit niches.

And all the explanations I've read are (to put it mildly) unsatisfactory. Crossover only complicates your algorithms. Drop it asap. Your life will be simpler. .... IMHO.

Accomplished answered 7/2, 2014 at 0:32 Comment(7)

I imagine that's why evolution never bothers with it in the real world ... :D – Lapith 25/2, 2014 at 13:58

I knew someone would ask :\. Apparently the reason natural evolution uses crossover(sex) is to ward off parasites (see the Red Queen Hypothesis). Crossover has little to do with search efficiency. Mutation is simpler and does just as well or better. – Accomplished 24/3, 2014 at 8:23

This answer is outdated and plain wrong! Needs to be downvoted. Kenneth Stanley proved in 2002 that crossover can be effective for certain evolutionary algorithms. Read up on the NEAT algorithm (citeseerx.ist.psu.edu/viewdoc/…). It's for neuroevolution, but it demonstrates that crossover isn't a silly concept. If your crossover algorithm is intelligent it can make your algorithm much more effective (in the case of NEAT it was about 1/3 more efficient on a pole balancing problem when crossover was used). – Hypocotyl 2/8, 2015 at 13:19

Also read up on this: en.wikipedia.org/wiki/Hill%E2%80%93Robertson_effect - crossover has a lot to do with search efficiency. – Hypocotyl 3/8, 2015 at 6:26

@JoeRocc Give it a try. Take out the crossover operator on your EA. Measure the performance. You may be surprised. I've tried many published theories on a variety of problems. IMHO, they don't live up to the hype, and are very problem-specific. – Accomplished 4/8, 2015 at 20:25

@JoeRocc You should be able to measure the Robertson effect too! That would be VERY cool. – Accomplished 4/8, 2015 at 20:33

I have used DE a lot for difficult non-toy problems and have reached the same conclusion: do not use cross-over if you want reliable convergence to the global minimum of the cost function. Instead, increase the population size to ensure that its dimensionality stays as large as the size of your parameter vector. – Saito 6/6, 2019 at 5:29

Recommended topics

Hot tags