Genetic algorithm weasel

Authors: Alex Tee Neng Heng, David G. Green
Text by: Greg Paperin

Here we explain the basic operation of a genetic algorithm using the example of the famous typing monkey problem described by Richard Dawkins in his book "The Blind Watchmaker". We describe the typing monkey problem in detail in the context of the simple Weasel demo. Here we will concentrate on GAs.

Overview

In a genetic algorithm (GA) we try to evolve a population of candidate solutions to a certain problem. In this case we are trying to evolve a population of strings to match the line "methinks it is like a weasel". We start with a population of random strings. The algorithm works by producing a series of non-overlapping generations. At each generation, new candidate strings are formed by combining the existing candidate solutions together (i.e. crossing them over) and then slightly altering (mutating) the result to introduce variation. The newly formed strings form the basis for the new generation. During this process, not all candidate strings are used as the basis for the next generation, but only those which match the target string "methinks it is like a weasel" better than others. The strings with the least fitness for the target are getting left behind. This way, each generation gets closer and closer towards matching the target, until we eventually find the required solution. The same general approach can be applied to a variety of problems. For instance, rather that using the degree of matching a certain string in order to evaluate the fitness of the candidate solutions, we could measure how well a candidate solution minimises a function value.

In the rest of this article we will describe the above steps in more detail with reference to the demo applet.

Basic operation

Each candidate solution is represented by a genotype. In this case the genotype is a sequence of characters, i.e. a string. Each candidate solution has a fitness, which is a measure of how well is solves the problem. In our demo, the fittest individual is marked red. In this case, the fitness is the proportion of the characters matching the characters in the target string:

Genotype A:    me thinks it is like a weasel    fitness: 1.0
Genotype B:    exedfdiofg ufgio ufiogxiofgfg    fitness: 0.0

At each generation we use selection to choose which candidate solutions will be used to produce the next generation of strings. For the selected parents we produce offspring using crossover and mutation. This is repeated until the required solution is found.

Selection

Before reproduction is possible we need to choose the parents for each new individual. Normally two parents produce a single offspring, although other methods are possible. The aim of selection is to ensure that there is a pressure to produce fitter candidate solutions in each generation. The following methods are supported in this demonstration:

  • Roulette wheel selection. The probability of each individual to be selected is proportional to its fitness. I.e. the fitter the individual, the more likely it is to reproduce.
  • Steady state selection. In this approach K best individuals are selected for reproduction regardless of their absolute fitness value.
  • Ranking. The population is sorted according to fitness value. Probability of reproduction is then proportional to the rank of the candidate solution, not its absolute fitness value.
  • Elitism. In this method, the best strings are copied into the next generation directly. Then, the rest of the population is filled through the normal selection/crossover/mutation process.

Crossover

Offspring are created by mixing parents’ genes. Three types of crossover are supported by the current demo.

  • Uniform Crossover (also called free recombination). In this method, each gene (in this case – character) of the offspring is chosen randomly either from one parent of from the other. For instance:
    Parent A:    a b c d e f
    Parent B:    q w e r t y
    Offspring:   q b e d t f
  • One-Point Crossover. In this method a crossover location is randomly chosen somewhere in the genotype. For the offspring, all characters prior to that location are taken from one parent and the rest of the characters from the other. This can also ne generalised to an N-point crossover. For instance, if the crossover location is 4, we get:
    Parent A:    a b c d e f
    Parent B:    q w e r t y
    Offspring:   a b c d t y
  • Local Fitness crossover. This is similar to free recombination. For each position in the string, the corresponding character is chosen from parent A if that character matches the target string or from the parent B if that match the target string. If none of the two characters matches the target, a parent is chosen at random. This method is not often used in GAs, as the contribution of each gene towards the fitness of the whole individual is not always clear without knowing the other genes.
    Target string:    a b c d e f
    Parent A:         a b w z e g
    Parent B:         x y c p w f
    Offspring:        a b c z e f

Mutation

After an offspring has been determined using the crossover operation, it undergoes mutation. The mutation is performed on a character-by-character basis. With a small probability (called – mutation rate), each character is substituted by a different random character.

Demo screenshot