How to perform rank based selection in a genetic algorithm?
Asked Answered
D

4

11

I am implementing a small genetic algorithm framework - primarily for private use, unless I manage to make something reasonable at which time I will post it as open source. Right now I am focusing on selection techniques. So far I have implemented roulette wheel selection, stochastic universal sampling and tournament selection. Next on my list is rank based selection. I had a little more difficulty finding information about that than the other techniques that I've already implemented, but here is my understanding so far.

  1. When you have your population from which you want to get reasonable parents for the next round, you first go through it and divide the fitness of each individual by the total fitness in the population.

  2. Then you use some other selection technique (such as roulette wheel) to actually determine whom to select for breeding.

Is this correct? If so, am I right in thinking that the rank adjustment is a sort of preprocessing step which then has to be followed by an actual selection procedure that picks out the candidates? Please correct me if I have misunderstood any of this. I am grateful for any additional pointers.

Demonography answered 29/11, 2013 at 17:30 Comment(0)
O
11

What you've described there is roulette wheel selection, not rank selection. To do rank selection, rather than weighting each candidate by its fitness score, you weight it by its "rank" (that is, best, second best, third best, etc.).

For instance, you might give the first one a weighting of 1/2, the second a weighting of 1/3, the third a weighting of 1/4, etc. Or the worst gets weighting 1, the second worst gets weighting 2, etc.

The important point is that absolute or relative fitness scores are not considered, only the rankings. So the best is more likely to be chosen than the second best, but the two have the same probabilities of being chosen whether the best had ten times the score of the second best, or only had a slightly greater score.

Ontologism answered 29/11, 2013 at 17:39 Comment(11)
So if I understand you correctly, I would get the ranking by simply sorting the chromosomes based on fitness? And then I can update the fitness for each to represent its rank rather than absolute fitness? If this is true, how then do I go about selecting people from that list as parents? Do I simply use one of the other selection algorithms?Demonography
You use "weighted random selection". Roulette wheel selection is also "weighted random selection", just with weightings derived from fitness rather than rank.Ontologism
I have written the following, which seems to produce correct results (a linear sequence between 0 and 1). void linear_rank_fitness_adjustment(chromosome* population) { std::sort(population, population+population_size, chromosome_compare); for(int i=0;i<population_size;i++) population[i].fitness=(float)(i+1)/population_size; } Is it reasonable to apply this filter/preprocessing step and then let the user choose any of the other selection algorithms to pick out the actual parents?Demonography
I am sorry about the insane formatting. I'm not sure how to fix it.Demonography
I'm not sure why you're thinking in terms of using multiple selection algorithms. It doesn't really make sense. The output of a selection algorithm is "these genes, and not those other genes"... it might be a complicated and hybridized selection algorithm, but after it's done, all the relevant decisions have been made.Ontologism
The part that confuses me is this. If I do rank based selection, I will be ranking the entire population. From that, I want to pick out a subset to serve as parents just like I do with the other algorithms I have implemented thus far. But I'm not sure how to select that subset after the ranking is done. With rank selection I just remove the relative fitness weights, but I don't see how this selects a subset of the population like for example roulette wheel or stochastic universal sampling does. Is the little code snippet above correct, by the way?Demonography
It's correct, although note that the resultant weights will not be normalized. What it's missing is the actual weighted random selection, of course.Ontologism
Couldn't I just use roulette to do the weighted randomization? Wouldn't this work since I update the fitness to actually be the rank instead of the original absolute fitness? Also, can you please elaborate a little on the term normalized in this context? How do I normalize the list after sorting it and setting the ranks? Currently, after running the function I have a list of values between 0.0 and 1.0. Should I be doing something more?Demonography
"Roulette wheel selection" specifically refers to the use of a fitness metric as a probability distribution for random selection. If you're not using the fitness metric as the weighting, it's not roulette wheel selection. The rank is NOT the fitness metric. It's used to generate a probability distribution, but that does not make it a fitness metric. As for normalized, that refers to all weights summing to 1, which is necessary for certain weighted random selection implementations.Ontologism
So even if it is not actually roulette selection as it's not refering to the fitness, am I right in thinking that I could use the same principle as in roulette to get the final number of parents? I am going to accept your answer, and I want to say that I appreciate you taking the extra time to answer my questions. I am a complete beginner in this field, so I am going to continue investigating on my own based on the information you've given me. Thanks once again!Demonography
https://mcmap.net/q/1017780/-rank-selection-in-ga . Implementation can be seen hereSeedbed
S
21

What have you described is simply Roulette wheel selection.In Roulette wheel selection:

  • Parents are selected according to their fitness
  • The better the chromosomes are, the more chances to be selected they have.

Imagine a roulette wheel where all chromosomes in the population are placed, each chromosome has its place big accordingly to its fitness function, like on the following pictureenter image description here.
This selection will have problems when the finesses differs very much. Outstanding individuals will introduce a bias in the beginning of the search that may cause a premature convergence and a loss of diversity.
For Example:

if an initial population contains one or two very fit but not the best individuals and the rest of the population are not good, then these fit individuals will quickly dominate the whole population and prevent the population from exploring other potentially better individuals. Such a strong domination causes a very high loss of genetic diversity which is definitely not advantageous for the optimization process.

But in Rank Selection:

  1. Rank selection first ranks the population and then every chromosome receives fitness from this ranking.
  2. The worst will have fitness 1, second worst 2 etc. and the best will have fitness N (number of chromosomes in population). enter image description here

  3. After this all the chromosomes have a chance to be selected.

  4. Rank-based selection schemes can avoid premature convergence.
  5. But can be computationally expensive because it sorts the populations based on fitness value.
  6. But this method can lead to slower convergence, because the best chromosomes do not differ so much from other ones.

So the process will be:

  1. First sort the Fitness value of the Population.

  2. Then if the Population number is 10 then give the probability of selection to the Population like 0.1,0.2,0.3,...,1.0 .

  3. Then calculate cumulative Fitness and make roulette wheel.
  4. And the next steps is same as roulette wheel.

My Implementation of Rank Selection in Matlab:

NewFitness=sort(Fitness);
        NewPop=round(rand(PopLength,IndLength));

        for i=1:PopLength
            for j=1:PopLength
                if(NewFitness(i)==Fitness(j))
                    NewPop(i,1:IndLength)=CurrentPop(j,1:IndLength);
                    break;
                end
            end
        end
        CurrentPop=NewPop;

        ProbSelection=zeros(PopLength,1);
        CumProb=zeros(PopLength,1);

        for i=1:PopLength
            ProbSelection(i)=i/PopLength;
            if i==1
                CumProb(i)=ProbSelection(i);
            else
                CumProb(i)=CumProb(i-1)+ProbSelection(i);
            end
        end

        SelectInd=rand(PopLength,1);

        for i=1:PopLength
            flag=0;
            for j=1:PopLength
                if(CumProb(j)<SelectInd(i) && CumProb(j+1)>=SelectInd(i))
                    SelectedPop(i,1:IndLength)=CurrentPop(j+1,1:IndLength);
                    flag=1;
                    break;
                end
            end
            if(flag==0)
                SelectedPop(i,1:IndLength)=CurrentPop(1,1:IndLength);
            end
        end

Note: you can also see my question regarding rank selection here in this link and my article here.

Seedbed answered 26/1, 2016 at 13:26 Comment(0)
O
11

What you've described there is roulette wheel selection, not rank selection. To do rank selection, rather than weighting each candidate by its fitness score, you weight it by its "rank" (that is, best, second best, third best, etc.).

For instance, you might give the first one a weighting of 1/2, the second a weighting of 1/3, the third a weighting of 1/4, etc. Or the worst gets weighting 1, the second worst gets weighting 2, etc.

The important point is that absolute or relative fitness scores are not considered, only the rankings. So the best is more likely to be chosen than the second best, but the two have the same probabilities of being chosen whether the best had ten times the score of the second best, or only had a slightly greater score.

Ontologism answered 29/11, 2013 at 17:39 Comment(11)
So if I understand you correctly, I would get the ranking by simply sorting the chromosomes based on fitness? And then I can update the fitness for each to represent its rank rather than absolute fitness? If this is true, how then do I go about selecting people from that list as parents? Do I simply use one of the other selection algorithms?Demonography
You use "weighted random selection". Roulette wheel selection is also "weighted random selection", just with weightings derived from fitness rather than rank.Ontologism
I have written the following, which seems to produce correct results (a linear sequence between 0 and 1). void linear_rank_fitness_adjustment(chromosome* population) { std::sort(population, population+population_size, chromosome_compare); for(int i=0;i<population_size;i++) population[i].fitness=(float)(i+1)/population_size; } Is it reasonable to apply this filter/preprocessing step and then let the user choose any of the other selection algorithms to pick out the actual parents?Demonography
I am sorry about the insane formatting. I'm not sure how to fix it.Demonography
I'm not sure why you're thinking in terms of using multiple selection algorithms. It doesn't really make sense. The output of a selection algorithm is "these genes, and not those other genes"... it might be a complicated and hybridized selection algorithm, but after it's done, all the relevant decisions have been made.Ontologism
The part that confuses me is this. If I do rank based selection, I will be ranking the entire population. From that, I want to pick out a subset to serve as parents just like I do with the other algorithms I have implemented thus far. But I'm not sure how to select that subset after the ranking is done. With rank selection I just remove the relative fitness weights, but I don't see how this selects a subset of the population like for example roulette wheel or stochastic universal sampling does. Is the little code snippet above correct, by the way?Demonography
It's correct, although note that the resultant weights will not be normalized. What it's missing is the actual weighted random selection, of course.Ontologism
Couldn't I just use roulette to do the weighted randomization? Wouldn't this work since I update the fitness to actually be the rank instead of the original absolute fitness? Also, can you please elaborate a little on the term normalized in this context? How do I normalize the list after sorting it and setting the ranks? Currently, after running the function I have a list of values between 0.0 and 1.0. Should I be doing something more?Demonography
"Roulette wheel selection" specifically refers to the use of a fitness metric as a probability distribution for random selection. If you're not using the fitness metric as the weighting, it's not roulette wheel selection. The rank is NOT the fitness metric. It's used to generate a probability distribution, but that does not make it a fitness metric. As for normalized, that refers to all weights summing to 1, which is necessary for certain weighted random selection implementations.Ontologism
So even if it is not actually roulette selection as it's not refering to the fitness, am I right in thinking that I could use the same principle as in roulette to get the final number of parents? I am going to accept your answer, and I want to say that I appreciate you taking the extra time to answer my questions. I am a complete beginner in this field, so I am going to continue investigating on my own based on the information you've given me. Thanks once again!Demonography
https://mcmap.net/q/1017780/-rank-selection-in-ga . Implementation can be seen hereSeedbed
M
2

I was also a bit confused by various sources on how the probabilities are being computed when using the Linear Ranking Selection, sometimes also called "Rank Selection" as is referred to here. At least I hope these two are referring to the same thing.

The part which was elusive to me is the sum of the ranks which seems to have been omitted or at least not explicitly stated in most of the sources. Here I present a short, yet a verbose Python example of how the probability distribution is computed (those nice charts you often see).

Assuming these are some example individual fintesses: 10, 9, 3, 15, 85, 7.

Once sorted, assign the ranks in ascending order: 1st: 3, 2nd: 7, 3rd: 9, 4th: 10, 5th: 15, 6th: 85

The sum of all ranks is 1+2+3+4+5+6 or using the gauss formula (6+1)*6/2 = 21.

Hence, we compute the probabilities as: 1/21, 2/21, 3/21, 4/21, 5/21, 6/21, which you can then express as percentages:

Probability distribution of selecting an individual using Rank Selection

Take note that this is not what is used in actual implementations of genetic algorithms, only a helper script to give you better intuition.

You can fetch this script with:

curl -o ranksel.py https://gist.githubusercontent.com/kburnik/3fe766b65f7f7427d3423d233d02cd39/raw/5c2e569189eca48212c34b3ea8a8328cb8d07ea5/ranksel.py

#!/usr/bin/env python

"""
Assumed name of script: ranksel.py

Sample program to estimate individual's selection probability using the Linear
Ranking Selection algorithm - a selection method in the field of Genetic
Algorithms. This should work with Python 2.7 and 3.5+.

Usage:

./ranksel.py f1 f2 ... fN

Where fK is the scalar fitness of the Kth individual. Any ordering is accepted.

Example:

$ python -u ranksel.py 10 9 3 15 85 7
Rank Fitness Sel.prob.
   1    3.00     4.76%
   2    7.00     9.52%
   3    9.00    14.29%
   4   10.00    19.05%
   5   15.00    23.81%
   6   85.00    28.57%

"""

from __future__ import print_function
import sys

def compute_sel_prob(population_fitness):
  """Computes and generates tuples of (rank, individual_fitness,
     selection_probability) for each individual's fitness, using the Linear
     Ranking Selection algorithm."""
  # Get the number of individuals in the population.
  n = len(population_fitness)

  # Use the gauss formula to get the sum of all ranks (sum of integers 1 to N).
  rank_sum = n * (n + 1) / 2

  # Sort and go through all individual fitnesses; enumerate ranks from 1.
  for rank, ind_fitness in enumerate(sorted(population_fitness), 1):
    yield rank, ind_fitness, float(rank) / rank_sum


if __name__ == "__main__":
  # Read the fitnesses from the command line arguments.
  population_fitness = list(map(float, sys.argv[1:]))

  print ("Rank Fitness Sel.prob.")
  # Iterate through the computed tuples and print the table rows.
  for rank, ind_fitness, sel_prob in compute_sel_prob(population_fitness):
    print("%4d %7.2f %8.2f%%" % (rank, ind_fitness, sel_prob * 100))
Martella answered 30/1, 2018 at 0:54 Comment(0)
S
0

According to the book J. Palma the correct way is:

  1. order a list per ranking. The first position is for the chromosome with higher fitness.
  2. Choose an 1<=Amax<=1.2, normally we use Amax = 1.2
  3. Amin = 2-Amax, so if we choose Amax = 1.2 then Amin = 0.8
  4. Pi = (Amax - (Amax-Amin)·(rank-1)/(m-1))·1/m

m = Number total of the elements of the list rank = the position in the list

For example, using Amax=1.2, Amin=0.8 and m=3 since we have a population of 3 chromosomes.

Example

After that, you can apply the same system as the proportional selection.

Statolith answered 1/12, 2020 at 6:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.