What are the differences between simulated annealing and genetic algorithms?

Asked 4/11, 2010 at 0:1 Answered 4/11, 2010 at 20:42

Solved artificial-intelligence genetic-algorithm simulated-annealing constraint-satisfaction

What are the relevant differences, in terms of performance and use cases, between simulated annealing (with beam search) and genetic algorithms?

I know that SA can be thought of as a GA where the population size is only one, but I don't know the key differences between the two.

Also, I am trying to think of a situation where SA will outperform GA or GA will outperform SA. One simple example that helps me understand would be enough.

Mike answered 4/11, 2010 at 0:1 Comment(1)

"SA can be thought as GA where the population size is only one" - Not exactly. Though simulated annealing maintains only 1 solution from one trial to the next, its acceptance of worse-performing candidates is much more integral to its function than the same thing would be in a genetic algorithm. In fact, some GAs only ever accept improving candidates. Also, realize that one "generation" for SA is typically much faster in real time than the same for a GA, so more generations may be tried in the same time - an important practical consideration. – Jaguar 1/2, 2011 at 15:14

Well, strictly speaking, these two things--simulated annealing (SA) and genetic algorithms (GA)--are not algorithms, nor is their purpose 'data mining'.

Both are meta-heuristics--a couple of levels above 'algorithm' on the abstraction scale. In other words, both terms refer to high-level metaphors--one borrowed from metallurgy and the other from evolutionary biology. In the meta-heuristic taxonomy, SA is a single-state method and GA is a population method (in a sub-class along with PSO, ACO, et al, usually referred to as biologically-inspired meta-heuristics).

These two meta-heuristics are used to solve optimization problems, particularly (though not exclusively) in combinatorial optimization (aka constraint-satisfaction programming). Combinatorial optimization refers to optimization by selecting from among a set of discrete items--in other words, there is no continuous function to minimize. The knapsack problem, traveling salesman problem, cutting stock problem--are all combinatorial optimization problems.

The connection to data mining is that the core of many (most?) supervised Machine Learning (ML) algorithms is the solution of an optimization problem--(Multi-Layer Perceptron and Support Vector Machines for instance).

Any solution technique to solve c/o (combinatorial optimization) problems, regardless of the algorithm, will consist essentially of these steps (which are typically coded as a single block within a recursive loop):

(i) encode the domain-specific details in a cost function (it's the step-wise minimization of the value returned from this function that constitutes a 'solution' to the c/o problem);
(ii) evaluate the cost function passing in an initial 'guess' (to begin iteration);
(iii) based on the value returned from the cost function, generate a subsequent candidate solution (or more than one, depending on the meta-heuristic) to the cost function;
(iv) evaluate each candidate solution by passing it in an argument set, to the cost function;
(v) repeat steps (iii) and (iv) until either some convergence criterion is satisfied or a maximum number of iterations is reached.

Meta-heuristics are directed to step (iii) above; hence, SA and GA differ in how they generate candidate solutions for evaluation by the cost function. In other words, that's the place to look to understand how these two meta-heuristics differ.
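As a rough sketch of that generic loop (not code from any particular library), the steps can be written as one function in which the candidate generator and the acceptance rule are plugged in. The names `optimize`, `propose`, `accept`, `toy_cost`, `step` and `greedy_accept` are all made up for illustration, and this single-state form is an assumption: a population method such as GA would track many candidates at once.

```python
import random

def optimize(cost, initial, propose, accept, max_iters=1000, seed=0):
    """Single-state version of the generic loop; all names are illustrative."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)       # step (ii): evaluate the initial guess
    best, best_cost = current, current_cost
    for iteration in range(max_iters):                   # step (v): repeat until the iteration cap
        candidate = propose(current, rng)                # step (iii): generate a candidate
        candidate_cost = cost(candidate)                 # step (iv): evaluate it
        if accept(current_cost, candidate_cost, iteration, rng):
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:                     # remember the best solution seen so far
            best, best_cost = current, current_cost
    return best, best_cost

# Toy problem: minimize (x - 3)^2 over the integers, stepping +/-1 at a time.
def toy_cost(x):
    return (x - 3) ** 2

def step(x, rng):
    return x + rng.choice((-1, 1))

def greedy_accept(current_cost, candidate_cost, iteration, rng):
    return candidate_cost < current_cost                 # only ever accept improvements

best, best_cost = optimize(toy_cost, initial=20, propose=step, accept=greedy_accept)
```

Swapping in a different `propose` or `accept` is where one meta-heuristic would differ from another in this sketch.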

Informally, the essence of an algorithm directed to solution of combinatorial optimization is how it handles a candidate solution whose value returned from the cost function is worse than the current best candidate solution (the one that returns the lowest value from the cost function). The simplest way for an optimization algorithm to handle such a candidate solution is to reject it outright--that's what the hill climbing algorithm does. But by doing this, simple hill climbing will always miss a better solution separated from the current solution by a hill. Put another way, a sophisticated optimization algorithm has to include a technique for (temporarily) accepting a candidate solution worse than (i.e., uphill from) the current best solution because an even better solution than the current one might lie along a path through that worse solution.
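To make the "rejects every worse candidate" behavior concrete, here is a minimal sketch of strict hill climbing (descent, since lower cost is better) on a made-up one-dimensional landscape. The function name `hill_climb` and the cost values are assumptions chosen only so that a worse position sits between the start and the global minimum.

```python
def hill_climb(costs, start):
    """Strict descent over a list of costs: move to the better neighbour, stop when none improves."""
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < len(costs)]
        best = min(neighbours, key=lambda i: costs[i])
        if costs[best] >= costs[current]:    # no neighbour is strictly better: stop here
            return current
        current = best

# Index 1 (cost 3) is a local minimum; index 5 (cost 0) is the global minimum.
landscape = [5, 3, 4, 6, 2, 0, 1]

stuck_at = hill_climb(landscape, start=0)    # stops at index 1, never crossing index 2 or 3
lucky_at = hill_climb(landscape, start=3)    # a start beyond the barrier reaches index 5
```

Starting at index 0 the search halts at the local minimum because both of its neighbours cost more, which is exactly the case that an accept-worse rule is meant to handle.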

So how do SA and GA generate candidate solutions?

The essence of SA is usually expressed in terms of the probability that a higher-cost candidate solution will be accepted (everything inside the outer parentheses is the exponent):

p = e^(-(highCost - lowCost) / temperature)

Or in python:

import math
p = math.exp(-(hiCost - loCost) / T)

The 'temperature' term is a variable whose value decays during progress of the optimization--and therefore, the probability that SA will accept a worse solution decreases as iteration number increases.

Put another way, when the algorithm begins iterating, T is very large, which, as you can see, causes the algorithm to move to nearly every newly created candidate solution, whether better or worse than the current best solution--i.e., it is doing a random walk in the solution space. As iteration number increases (i.e., as the temperature cools) the algorithm's search of the solution space becomes less permissive, until at T = 0, the behavior is identical to a simple hill-climbing algorithm (i.e., only solutions better than the current best solution are accepted).
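A minimal sketch of how this might look in code, under some assumptions the answer does not specify: geometric cooling (`temperature *= cooling`), a fixed number of steps, and the hypothetical names `acceptance_probability`, `simulated_annealing` and `neighbour`. Better candidates are always accepted; worse ones are accepted with the probability given above.

```python
import math
import random

def acceptance_probability(current_cost, candidate_cost, temperature):
    """Probability of moving to the candidate; improvements are always accepted."""
    if candidate_cost <= current_cost:
        return 1.0
    if temperature <= 0:
        return 0.0                            # at T = 0 this is plain hill climbing
    return math.exp(-(candidate_cost - current_cost) / temperature)

def simulated_annealing(cost, initial, neighbour,
                        t_start=10.0, cooling=0.95, steps=2000, seed=0):
    """SA sketch; the geometric cooling schedule is an assumed choice."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t_start
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        if rng.random() < acceptance_probability(current_cost, candidate_cost, temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:      # track the best solution visited
                best, best_cost = current, current_cost
        temperature *= cooling                # cool down: worse moves become rarer
    return best, best_cost

# Toy usage: positions on a 1-D landscape, moving one index left or right.
landscape = [5, 3, 4, 6, 2, 0, 1]

def neighbour(i, rng):
    return min(len(landscape) - 1, max(0, i + rng.choice((-1, 1))))

best, best_cost = simulated_annealing(lambda i: landscape[i], 0, neighbour)
```

The starting temperature and cooling rate usually need tuning per problem; the values here are placeholders.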

Genetic Algorithms are very different. For one thing--and this is a big thing--a GA generates not a single candidate solution but an entire 'population' of them. It works like this: GA calls the cost function on each member (candidate solution) of the population. It then ranks them, from best to worst, ordered by the value returned from the cost function ('best' has the lowest value). From these ranked values (and their corresponding candidate solutions) the next population is created. New members of the population are created in essentially one of three ways. The first is usually referred to as 'elitism' and in practice usually means taking the highest-ranked candidate solutions and passing them straight through--unmodified--to the next generation. The other two ways that new members of the population are created are usually referred to as 'mutation' and 'crossover'. Mutation usually involves a change in one element of a candidate solution vector from the current population to create a solution vector in the new population, e.g., [4, 5, 1, 0, 2] => [4, 5, 2, 0, 2]. The result of the crossover operation is like what would happen if vectors could have sex--i.e., a new child vector whose elements are drawn from each of two parents.
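The rank / elitism / mutation / crossover cycle could be sketched roughly as follows. Everything here is illustrative rather than canonical: the function names, the choice to pick parents from the better half of the ranking, the single-point crossover, and the rates are all assumptions, and real GAs vary widely on each of them.

```python
import random

def genetic_algorithm(cost, random_member, mutate, crossover,
                      pop_size=30, n_elite=2, mutation_rate=0.3,
                      generations=100, seed=0):
    """GA sketch: rank by cost, keep elites, fill the rest by mutation or crossover."""
    rng = random.Random(seed)
    population = [random_member(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=cost)         # best (lowest cost) first
        next_population = ranked[:n_elite]            # elitism: pass through unmodified
        parents = ranked[:pop_size // 2]              # assumed rule: breed from the better half
        while len(next_population) < pop_size:
            if rng.random() < mutation_rate:
                child = mutate(rng.choice(parents), rng)
            else:
                child = crossover(rng.choice(parents), rng.choice(parents), rng)
            next_population.append(child)
        population = next_population
    return min(population, key=cost)

# Toy problem: find a 5-element vector of digits close to a hidden target.
TARGET = [4, 5, 2, 0, 2]

def toy_cost(member):
    return sum(abs(a - b) for a, b in zip(member, TARGET))

def random_member(rng):
    return [rng.randrange(10) for _ in range(len(TARGET))]

def mutate(member, rng):
    child = list(member)                              # copy so the parent is left untouched
    child[rng.randrange(len(child))] = rng.randrange(10)
    return child

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))                    # single-point crossover
    return a[:cut] + b[cut:]

best = genetic_algorithm(toy_cost, random_member, mutate, crossover)
```

Because the elites are copied over unchanged, the best cost in the population can never get worse from one generation to the next in this sketch.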

So those are the algorithmic differences between GA and SA. What about the differences in performance?

In practice (my observations are limited to combinatorial optimization problems), GA nearly always beats SA (it returns a lower 'best' value from the cost function--i.e., a value closer to the solution space's global minimum), but at a higher computational cost. As far as I am aware, the textbooks and technical publications reach the same conclusion about solution quality.

But here's the thing: GA is inherently parallelizable; what's more, it's trivial to do so because the individual "search agents" comprising each population do not need to exchange messages--i.e., they work independently of each other. Obviously that means GA computation can be distributed, which means that in practice you can get much better results (closer to the global minimum) and better performance (execution speed).
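One way this independence might be exploited, sketched with Python's standard `concurrent.futures` module: each member's cost is evaluated as a separate task. The helper name `evaluate_population` is made up. A thread pool is used here only to keep the example portable; for a CPU-bound cost function in CPython, a process pool (`ProcessPoolExecutor`, which offers the same `map` method) is the more likely choice, since threads share the interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_population(cost, population, max_workers=4):
    """Evaluate the cost of every member concurrently; results keep the population's order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cost, population))

# Toy usage: the cost of a vector is simply the sum of its elements.
population = [[4, 5, 1, 0, 2], [4, 5, 2, 0, 2], [0, 0, 0, 0, 1]]
costs = evaluate_population(sum, population)
ranked = [member for _, member in sorted(zip(costs, population))]
```

Whether this pays off depends on how expensive a single cost evaluation is relative to the overhead of dispatching it.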

In what circumstances might SA outperform GA? The general scenario, I think, would be those optimization problems having a small solution space, so that the results from SA and GA are practically the same, yet the execution context (e.g., hundreds of similar problems run in batch mode) favors the faster algorithm (which should always be SA).

Flaunch answered 4/11, 2010 at 20:42 Comment(4)

Great work. What about SA with beam search, are there any differences? – Mike 4/11, 2010 at 23:56

well, SA w/ local beam search versus GA has the same set of differences as ordinary SA versus GA except that SA w/ local beam is like GA in that it considers simultaneously a set of candidate solutions ('states') rather than just one. – Flaunch 5/11, 2010 at 2:40

Why do you say 'recursive function'? I don't see any need for recursion. – Rye 12/11, 2016 at 13:20

" The result of the crossover operation is like what would happen if vectors could have sex" – Hazlitt 1/4, 2017 at 6:44

It is really difficult to compare the two since they were inspired by different domains.

A Genetic Algorithm maintains a population of possible solutions, and at each step, selects pairs of possible solutions, combines them (crossover), and applies some random changes (mutation). The algorithm is based on the idea of "survival of the fittest", where the selection process is done according to a fitness criterion (usually in optimization problems it is simply the value of the objective function evaluated using the current solution). The crossover is done in the hope that two good solutions, when combined, might give an even better solution.

On the other hand, Simulated Annealing only tracks one solution in the space of possible solutions, and at each iteration considers whether to move to a neighboring solution or stay in the current one according to some probability (which decays over time). This is different from a heuristic search (say greedy search) in that it is far less prone to getting stuck in a local optimum, since it can escape from cases where all neighboring solutions are worse than the current one.

Morass answered 4/11, 2010 at 18:38 Comment(0)

I'm far from an expert on these algorithms, but I'll try and help out.

I think the biggest difference between the two is the idea of crossover in GA and so any example of a learning task that is better suited to GA than SA is going to hinge on what crossover means in that situation and how it is implemented.

The idea of crossover is that you can meaningfully combine two solutions to produce a better one. I think this only makes sense if the solutions to a problem are structured in some way. I could imagine, for example, in multi-class classification taking two (or many) classifiers that are good at classifying a particular class and combining them by voting to make a much better classifier. Another example might be Genetic Programming, where the solution can be expressed as a tree, but I find it hard to come up with a good example where you could combine two programs to create a better one.

I think it's difficult to come up with a compelling case for one over the other because they really are quite similar algorithms, perhaps having been developed from very different starting points.

Tiller answered 4/11, 2010 at 13:14 Comment(4)

Thanks Stomp, is it the case that in SA we will only have mutation, while in GA we will have both mutation and combination? – Mike 4/11, 2010 at 15:48

Yeah, SA is a lot like GA only without crossover. – Tiller 4/11, 2010 at 16:48

You might be interested in this stack-exchange proposal. It's almost ready to begin beta, just needs a few more. – Perrine 19/1, 2011 at 5:14

You can actually have a functional GA without the crossover element (only mutation). The only difference with SA then is that you have multiple solutions. Also, a GA usually improves its efficiency when the mutation chance is gradually lowered when reaching better solutions, just like in SA. Such a GA could be seen as a sort of 'multi'-SA. – Rye 12/11, 2016 at 13:23
