Aviad Ben Dov - 2004-06-30

This is a copy/paste from the wiki about Genetic Algorithms (AKA http://ga-fwork.sf.net/wiki/wakka.php?wakka=GeneticAlgorithms)

This copy has no external links, while the wiki does! I recommend reading it from the wiki page. Enjoy!

Okay, a few pre-introduction words: Relax. Close all other applications. Try to be more focused than you usually are. Genetic Algorithms are a simple concept, but some concentration is required to understand why they work when applied properly.

Genetic Algorithms could be described as a way to reach a desired solution without traversing the whole solution space or, in fact, by traversing only a very small portion of it. To understand why this is useful, consider the Travelling Salesman Problem. There, in a full mesh, the search space is of size N! (1 * 2 * 3 * ... * N), which, for a problem containing 32 points, is a disastrous number (my calculator produced 2.6313083693369353016721801216e+35). If my computer ran at 3.0GHz (3,000,000,000 operations per second) and could check one solution per cycle, it would take approximately 2,781,274,701,227,100,564 years to go through ALL possible solutions - you can already see that having a thousand computers like mine wouldn't help by much. It's true that even a game-tree based solution doesn't go through all the possible solutions, but it does go through many of them. Problems like the Travelling Salesman Problem are called NP-Hard problems - no polynomial-time algorithm is known for them.
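To check the arithmetic, here are a few lines of Python (the one-tour-per-clock-cycle assumption is of course wildly optimistic, but it makes the point):

```python
import math

# Size of the Travelling Salesman search space for a 32-city full mesh.
search_space = math.factorial(32)

# Assume a 3.0 GHz machine evaluating one complete tour per clock cycle.
ops_per_second = 3_000_000_000
seconds_per_year = 60 * 60 * 24 * 365
years = search_space / (ops_per_second * seconds_per_year)

print(f"{search_space:.4e} tours")   # roughly 2.63e+35
print(f"{years:.2e} years")          # roughly 2.78e+18
```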

Let us change the subject and start exploring Genetic Algorithms, and what makes them tick. Genetic Algorithms are, when narrowed down to it, a simulation of nature's genetic process, with emphasis on breeding and the survival of the fittest. If we take a step out of computers and software, and look at nature outside (and I don't mean "the concrete jungle"..), we see that nature is one of the world's biggest NP-Complete problems - so many parameters (changes of climate, many diseases, and various dangers everywhere). Yet all the animals, the vegetation, and even us humans seem to fit right in there. Evolution is what they call it: the natural adaptation of species to the changes around them, and the survival of those individuals that do it better. To take a step back into the world of computers and software, we could think about the solution space of a problem in the same way: each solution is an individual of the solution's species. The problem's conditions for a good solution are the "hard weather", and what is left to be seen is which individuals survive the hard winter.

The GA process (AKA cycle) could be simplified to the following:

   1. Initialize a random pool of Individuals.
   2. Evaluate each Individual (AKA Fitness Operator).
   3. Select from the pool a few individuals (AKA Select Operator).
   4. Choose couples from the selection and breed them together (AKA Crossover Operator).
   5. Randomly change a few of the resulting individuals (AKA Mutate Operator).
   6. Return the results to the pool.
   7. If the pool has converged, or a number of pre-determined cycles has been completed, finish the cycle. If not, return to step #2.
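The seven steps above can be sketched as one generic loop. This is an illustrative sketch, not the framework's actual API; all names and parameters here (`create`, `pool_size`, `mutation_rate`, and so on) are assumptions, and the operator arguments are problem-specific callables you would supply:

```python
import random

def run_ga(create, fitness, select, crossover, mutate,
           pool_size=100, max_cycles=200, mutation_rate=0.05):
    """Generic GA cycle following steps 1-7 above."""
    # Step 1: initialize a random pool of individuals.
    pool = [create() for _ in range(pool_size)]
    for cycle in range(max_cycles):
        # Step 2: evaluate each individual (fitness operator).
        scores = [fitness(ind) for ind in pool]
        # Step 3: select a few individuals (select operator).
        parents = select(pool, scores)
        # Steps 4-5: breed couples (crossover), randomly change a few (mutate).
        children = []
        while len(children) < pool_size:
            mom, dad = random.sample(parents, 2)
            child = crossover(mom, dad)
            if random.random() < mutation_rate:
                child = mutate(child)
            children.append(child)
        # Step 6: return the results to the pool.
        pool = children
        # Step 7: finish on convergence (or when max_cycles runs out).
        if all(ind == pool[0] for ind in pool):
            break
    # Best individual of the final pool (a real engine might track the best ever seen).
    return max(pool, key=fitness)
```

As a design note, the loop knows nothing about the problem; everything problem-specific lives in the five operator callables, which is exactly the separation the framework described at the end of this post aims for.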

The fitness operator is one of the two most important operators in the GA process. This function maps all the properties of an individual to a floating point number, essentially giving it a rank and a place amongst the other individuals in the pool. Creating the fitness operator is one of the hardest tasks in the creation of a GA solution, if not the hardest of them all. It requires taking the whole problem space into account, considering the individual as a finished solution to that problem, and then mapping the errors and shortcomings in that solution to a number that can reflect that the solution is bad, yet in some way better than other solutions. In other words, a simple "good or bad" response from a fitness operator is not enough. It needs to distinguish the bad individuals from the worse, as well as the good from the better. If it cannot, selection (explained soon) will not be possible.
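As an illustrative sketch (not taken from the framework), a fitness operator for the Travelling Salesman Problem could grade a tour by its total length, so every tour gets a graded rank rather than a good/bad verdict; the `cities` mapping and function names here are assumptions:

```python
import math

def tour_length(tour, cities):
    """Total length of a closed tour; cities maps name -> (x, y)."""
    total = 0.0
    # Pair each city with the next one, wrapping back to the start.
    for a, b in zip(tour, tour[1:] + tour[:1]):
        (x1, y1), (x2, y2) = cities[a], cities[b]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def fitness(tour, cities):
    """Graded score: shorter tours map to larger fitness values,
    so a bad tour can still outrank a worse one."""
    return 1.0 / tour_length(tour, cities)
```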

The select operator distinguishes between the better individuals and the worse individuals using their fitness, which was evaluated in step 2, thus maintaining the "survival of the fittest" idea - Those solutions that fit best to the problem at hand will survive to the next cycle of the population, and have a better chance of breeding and creating new individuals that will inherit their properties, hopefully making them better than their parents.
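One common selection scheme, shown here as a hypothetical sketch (the post does not prescribe a particular one), is tournament selection: each survivor is chosen by pitting a few random individuals against each other and keeping the fittest, so better individuals win more often without the worse ones being shut out entirely:

```python
import random

def tournament_select(pool, scores, n_winners, tournament_size=3):
    """Survival of the fittest: run n_winners small tournaments,
    each won by the fittest of a few random contenders."""
    winners = []
    for _ in range(n_winners):
        contenders = random.sample(range(len(pool)), tournament_size)
        best = max(contenders, key=lambda i: scores[i])
        winners.append(pool[best])
    return winners
```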

The crossover operator is the second most important operator; it is the operator which combines two individuals to create one or more new individuals which will, it is hoped, be better than their parents. This can work because the select operator chooses the better individuals, and since each individual is evaluated according to its inner data (its genetic material, its genes), breeding two good individuals will produce an individual whose properties, or genes, are in some way a combination of the properties of its two good parents.
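A minimal sketch of such an operator, assuming individuals are plain gene sequences, is one-point crossover, where the child inherits the first part of one parent's genes and the rest from the other:

```python
import random

def one_point_crossover(mom, dad):
    """Cut both parents at the same random point and splice:
    the child's genes combine both parents' genetic material."""
    cut = random.randrange(1, len(mom))
    return mom[:cut] + dad[cut:]
```

Note that for permutation-coded individuals, such as Travelling Salesman tours, this naive splice can duplicate or drop genes, so order-preserving crossover variants are used there instead.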

The finishing condition for the GA process is usually a convergence condition or a countdown condition, or a combination of both. Convergence is achieved when all the individuals in the pool represent the same solution. This kind of phenomenon can appear after a few cycles, when a certain individual is very "strong" and its genes spread throughout the pool. If the individual is extremely strong, and no better individual is found using crossover or mutation (which is explained soon!), its genes will eventually become a part of every individual in the pool, to the point that the pool contains only individuals with that gene-set. When convergence is achieved there is no more need to continue breeding - if you breed two identical individuals, you will get the same individual as a child.
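The convergence test itself can be expressed directly; this sketch assumes individuals compare equal exactly when they carry the same gene-set:

```python
def has_converged(pool):
    """True when every individual carries the same gene-set;
    breeding identical parents only clones them, so stop here."""
    return all(ind == pool[0] for ind in pool)
```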

Essentially, if we look at the solution space as a graph where the X axis is the solution properties and the Y axis is the evaluated fitness for those properties, it will have peaks - most likely more than one. During the GA process, at step #1, individuals are created all over the graph, and placed on it after the evaluation is carried out in step #2. During the process itself, the individuals "climb" the peaks - and convergence is when all individuals are on the same peak. The problem is, what happens if the individuals converged on a local maximum and not the global maximum, which is practically the difference between a good solution and the best solution?

One solution to the problem is mutation: a minor change to the genes of an individual, in the hope of finding an even better solution, or rather, of expanding the search space to a point where normal breeding might not reach. Mutation effectively slows down convergence, but might yield better, closer-to-best individuals. If an individual is "pushed" to a different peak area, a higher one, it might "pull" other individuals to the new peak through the crossover process, thus climbing a better, higher peak and achieving a better solution.
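For a permutation-coded individual such as a Travelling Salesman tour, a mutation operator could simply swap two genes; this is an illustrative sketch, not an operator defined by the framework:

```python
import random

def swap_mutate(tour):
    """Minor random change: swap two genes, nudging the individual
    off its current peak so the search can reach new areas."""
    i, j = random.sample(range(len(tour)), 2)
    mutated = list(tour)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```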

Another solution is the use of the islands concept. The concept is simple: create a few separated pools of individuals, and treat each of them separately. Once every few cycles, take the best individual from all the pools and add it to all the other pools. If a pool receives a better individual than its own, it might start migrating to a better peak. If the new individual is lacking, however, it might die with the rest of the less-than-best individuals, or it might be merged with the better individuals to create even better ones. Also note that using the islands concept allows for multi-threading and distributed computing(!). If you create a thread for each pool, or create each pool on a different computer, you get faster processing and a better result than by running a single pool.
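The migration step described above could be sketched like this (illustrative only; `islands` is assumed to be a list of pools and `fitness` the evaluation function):

```python
def migrate(islands, fitness):
    """Every few cycles: find the single best individual across all
    island pools, and plant a copy of it in every other pool."""
    champion = max((ind for pool in islands for ind in pool), key=fitness)
    for pool in islands:
        if champion not in pool:
            # Replace the weakest local individual with the champion;
            # the receiving pool may then start climbing a better peak.
            weakest = min(pool, key=fitness)
            pool[pool.index(weakest)] = champion
    return islands
```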

These are the basics of Genetic Algorithms. The idea itself is expanded heavily by describing different representations of individuals, different ways to create islands, different crossover and mutation operators, the introduction of new operators such as the distance and maturity operators, and solution proposals using GAs for common problems like the scheduling problem.

Every developer or development team who wants to use a GA to solve a problem will implement the basics discussed above and use some of the extensions that fit their solution. However, writing an implementation of a GA is both time-consuming and error-prone, requiring testing over and over again. This is what the Genetic Algorithms Framework is here to provide: a common API for creating a genetic algorithm solution to an NP-Hard problem, with as little code-writing as possible from the user. A user should only provide the fitness operator, and when necessary the crossover and any other operators needed for their solution, and not write the entire engine that runs the GA around it. You should check out the genCore module in the HomePage if you want to learn more about this API.