Genetic Algorithms - an Intuitive Introduction

Your browser does not support java or you have turned off java support.   

This Java-applet demonstrates the principle of a genetic algorithm (GA).

First, a fractal landscape is built. You may launch this with the button Landschaft (=landscape; in German the sch is spelled like the English sh). It's the GA's task to find the highest point of the landscape by "evaluating" the least possible number of points, that is, to measure the height of the landscape at as few points as possible. This is done with a population of let's say 100 bugs (individuals) randomly distributed over the landscape (press the button Population). The bug's place (the coordinates) is its genetic code (the bugs are immobile), the height of the landscape at its place is its fitness..

Then the two steps of the GA, recombination and selection, are performed alternatingly: The recombination randomly selects two bugs as parents, preferring the ones with high fitness. Somewhere in the rectangle spanned by the parents, the offspring is born and added to the population. The selection takes a bug with low fitness out of the population, thus getting back to the original count of individuals..

You may run this evolution either in single-step-mode or run-mode: Pressing the button Rekombination/Selektion (it changes its label after every single step) triggers the next step, which is animated. The button Go starts the run. In this mode the new offsprings are highlighted shortly. You may cancel the run-mode by pressing the button Abbrechen (cancel).

When all bugs lay on the same height, the GA is terminated. Its result is the height reached by the population's bugs, compared to the absolute maximum height of the landscape. The count of performed measurings (the overall bugs ever born, be it in the initial population or during the evolution) has to be seen in relation to the count of points in the landscape, that is 120'000.

Tips:

Description of the applet

Landschaft (=landscape) Builds a new landscape. An eventually existing population is discarded. Works only if no landscape is being built at the moment.
Abbrechen (=cancel) Cancels the building of a landscape or the run of evolution, respectively.
Population Creates a new Population with the size indicated in the field left to the button. A reasonable size lays between about 20 and 200 individuals. With larger populations, the calculation gets too time-consuming (it depends also on your hardware, you can go higher and cancel the run to find the reasonable size for your equipment).
Rekombination / Selektion The button changes its label depending on the next step. Pressing the button performs and animates the step.
Go starts the run. With the scrollbar you control the speed of calculation.
Statusline (left part) Holds informations about the landscape: x,y are the coordinates of the mouse-pointer, h is the height of the landscape at this point, max is the absolute highest point of the landscape.
Statusline (middle part) Holds informations about the population: N is the count of individuals = size of the population, max the height of the highest individual, Gen (generations) the count of ever calculated individuals in the current population. To be clear, it summarizes the size of the initial population plus all offsprings calculated since the start.
Statusline (right part) Indicates the state of the applet and shows a hint what to do next.

Please apologize for my being too lazy to translate the applet. It saved me some work and the doubling of the .class-files on the web-server, as the applet was not designed for internationalization.

Discussion

References

I would be delighted about a few lines and am ready to answer to questions: mailto: glauser.breitenegg@spectraweb.ch

See my other puzzles


Genetic Algorithms:

Genetic algorithms (GA) are a mathematical technique, based on nature's evolution theory. It helps to find solutions for problems that cannot be calculated, but have to be searched by trial and error.

To multiplicate two numbers, for example, there is an efficient algorithm, so no body will ever test all numbers between 1 and 10000 to find the result of 13 times 85. On the other hand, to find a path through a (well built) labyrinth there is no other way than to test one branching after the other, and to flag dead ends. In other words, one has to test the candidates one after the other. Of course, there are good and bad techniques: One can start with each candidate at the beginning again, or one can go back only to the last branching and go on with a path not (yet) flagged as dead-end. GAs are a technique made available by the computer and being investigated intensively. They could be better suited for certain problems than existing algorithms.

Unfortunately, most GAs run in a space with more than the three dimensions we are familiar with; often even in a space with only the two coordinates 0 and 1, but a hundredfold of dimensions. Like this, nobody can imagine the function of a GA in such a search-space. However, GAs running in a two-dimensional search-space can be constructed. The applet runs such a GA and illustrates its function. The idea for a two-dimensional GA was found in an article by Jeffery Horn, 1995.

back to the applet

 


Fractal Landscape

The construction of the fractal landscape is based on the theory of Benoît Mandelbrot, see The Fractal Geometry of Nature, W.H. Freeman and Company, New York: We start with a plane, on which we put first big pyramids at randomly selected places, and continue with adding smaller and smaller pyramids in an increasing number. The heights of the overlapping pyramids are added to form the surface of the landscape: If the first pyramid with a height of 100 is put on the plane, a point P at the lower part of its incline is raised to let's say a height of 30. If, later on, a second, smaller pyramid with a height of 15 happens to be put near the point P, it gets an additional height of 7.5 (provided P sits exactly in the middle of the incline) from that second pyramid, raising it on a height of 37.5. The procedure terminates when the pyramids reach a given lower limit in their height. The height of all points define the shape of our landscape. Its distinguishing feature is that it looks the same way from no matter what distance. From far away, the details cannot be seen but only the coarse characteristics, which disappear if you go closer because they are all the same in a smaller window, but, from this closer point of view, more details can be distinguished. Because the details were constructed similarly to the big traits, the landscape is called "self-similar".

The applet visualizes the height of a point by its color: from black over blue and green to white, the higher the brighter.
To produce interesting landscapes, it starts by putting two 128 height pyramids on an area of 400 by 300 points. For every consecutive step, the height of the pyramids is reduced by one half, and its number is multiplicated by four. That makes for the second step 8 pyramids of a height of 64. This is continued until the pyramids have a height of 1 in the last step. After every step, the landscape is repaint and thus gets more and more details.

 Back to the applet

 


Problem

If the landscape looks more or less like a natural, fissured landscape, finding the absolute maximum is not as simple as this, what means, there is no efficient straightforward algorithm. For example, it is not sufficient to start at an arbitrary point and to go up always, because this can lead to an only locally maximum (you find yourself at a summit, but around you there are higher summits. To go there, you have to go down first). To find the absolute maximum for sure, there is no other way than measuring all points and keep the last found maximum (its coordinates and its height) in memory. With many points (with real-life GA's with many dimensions), even supercomputer may need years of calculation to enumerate all points.

Often, it is not necessery to find the absolute maximum, because it is sufficient to find a point higher than a given height. Or there can be (like in our case) many equal high maximums distributed over the landscape and it is sufficient to find one of them. Under these circumstances, a GA may be well suited. Wether this is the case or not depends on the shape of the landscape, however, the exact criterions are being investigated currently and not yet well known.

In our model, all the 120,000 points can be measured without any problem. The result is displayed and can be used to estimate the success of the GA.

 back to the applet

 


Genitor

The used type of GA was first described by Whitley (1988, 1989) and is called "genitor". It is well suited for fractal landscapes: The offsprings are born between parents with a high fitness (laying on high places of the landscape), which means that areas around fit bugs are investigated intensively. First, this is done over the whole landscape, after a while the population and therefore the search is concentrated near the hills the GA already found, and where the search continues on a smaller area with higher intensity. In other words, the interesting areas are first selected, then, only these areas are investigated further. The switch from the first to the second stage is seamless; no one has to define, when is has to occur.

It is exciting when there are two sub-populations at two hills of at about the same height. Which is going to win? (By the way, that's the reason why the construction of the landscape begins with two pyramids instead of only one.)

There are, of course, many other types of GA. Most of them replace all parents by offsprings in every generation. Compared to those algorithms, the genitor is simple to program. There is a whole bunch of possibilities to select the parents for recombination, different kinds of recombination (one-point crossover, two-point crossover, with or without spontaneous mutations) and many possibilities to select the individuals to suppress.

The analogy between our model and evolution is obvious: The coordinates of a bug corresponds to its genetic code. It consists of two genes, which range from 0 to 299 and 0 to 399, respectively. The genetic code of the offspring is a random combination of those of its parents. The whole evolution of our model is thus based on the recombination of a randomly generated initial population. As stated above, a mutation-mechanism that would create completely new distinctive marks could easily be constructed.

 back to the applet

 


Recombination

For recombination, all bugs are ordered in a list by ascending heights. For selection for recombination, the ranking in that list is considered rather than the original fitness. Therefore, the pressure of selection remains even if the population is pulled back on a hill and the height-differences between the individuals decrease.

The offspring is born evenly distributed somewhere in the rectangle spanned by its parents. The probability to be born in an arbitrary square of 1/100th of the whole rectangle is ever 1/100, no matter of the actual location of the square within the rectangle.

 back to the applet

 


Selection

The probability for a given bug to be taken out of the population is in inverse proportion to its fitness. In this step, the fitness itself is considered rather than a ranking.

back to the applet

 


Terminate-condition

The algorithm stops, when all bugs have reached the same height and are perhaps located at the same point. It's certainly not guaranteed that this is the absolute maximum of the landscape. This criterion for aborting satisfies rather a certain ease of implementation than analytical considerations. Actually, if the bugs are not located at the same point but happen to be at the same height, there might be a better solution within the bounds of the population, that could be found, if the algorithm continued. Therefore, the most commonly technique used is to stop the algorithm, when all individuals have the same genetic code (all bugs are at the same point). Without mutation, there's no need to continue with the search at his point, because nothing new can arise. Another possible criterion is to stop after a given number of generations, or when a given limit is reached.

back to the applet

 


Discussion of the analogy between this GA and those in multidimensional binary search-spaces (for GA-insiders)

To explain the function of a GA, in most cases the schema theorem published in 1975 by Holland is used. An analogy between the two-dimensional GA presented here and the schema theorem is not obvious.

However, the more basic observation, that with every GA the distance between the offspring and both of its parents is at most those between the parents, is true for the two-dimensional GA, too. It can be suspected that such a search-algorithm may be suitable for self-similar landscapes, which is experimentally supported by the applet. This confirms to the observation of Terry Jones, that fitness-distance-correlation could be a measure for the difficulty of a landscape for a GA.

The schema-theorem requires the preservation of schemas during the recombination. Therefore, a one- or two-point-crossover should be used. But sometimes, the sequence of the bits in the genotype is rather arbitrary than imposed by the problem, in these case, a uniform crossover would be a more natural choose. On my opinion, the placement of the offspring evenly distributed in the rectangle spanned by its parents corresponds quite good to a uniform crossover, as there is no correlation between the individual genes in neither case. Other formulas for recombination were to have the offspring on the diagonal between the parent's, or in the middle of the diagonal. It would be interesting to investigate the influence of such variations of the recombination on the performance of the GA.

back to the applet

 


References

Whitley, D., and Kauth, J. (1988) GENITOR: a Different Genetic Algorithm. Proceedings of the Rocky Mountain Conference on Artificial Intelligence, Denver, CO. pp 118-130 cited in Whitley D, A Genetic Algorithm Tutorial, Colorado State University

Geoffrey Horn, Genetic Algorithms, Problem Difficulty, and the Modality of Fitness Landscapes, Illigal Report No 95004, 1995

Jones, T., and Forrest, S., Fitness Distance Correlation as a Measure of Problem Difficulty for Genetic Algorithms; Sixth international conference on genetic Algorithms

Many universities hold literature-lists in the internet, a part of them offer the articles in fulltext for download. Here are some examples:

The private Santa-Fe-Institute runs a project named "Evolving Cellular Automata" ( www.santafe.edu/projects/evca ).

With following keywords, I successfully sought references in search-machines like AltaVista: genetic algorithms, evolutionary computation.

 back to the applet

© Pascal Glauser, 23.7.1999


our Homepage: Ursula und Pascal Glauser's Homepage