Genetic Algorithm Example: Diophantine Equation
By Samuel Hsiung and James Matthews
Make sure that you have read the genetic algorithms essay before reading this example. You also must have a working knowledge of C++ and object-oriented programming to utilize the classes and code examples provided.
Genetic Algorithm Example
Let us consider a diophantine (only integer solutions) equation: a+2b+3c+4d=30,
where a,b,c,d are positive integers.
Using a genetic algorithm, all that is needed is a little time to reach a solution (a,b,c,d). Of course
you could ask, why not just use a brute force method (plug in every possible value for a,b,c,d given
the constraints 1 =< a,b,c,d =< 30)? The architecture of GA systems allow for a solution to be reached quicker
since "better" solutions have a better chance of surviving and procreating, as opposed to randomly throwing
out solutions and seeing which ones work.
Let's start from the beginning. First we will choose 5 random initial solution sets, with constraints
1 =< a,b,c,d =< 30. (Note that we can choose smaller constraints for b,c,d, but for the sake of simplicity
we shall use 30)
| Chromosome |
(a,b,c,d) |
| 1 |
(1,28,15,3) |
| 2 |
(14,9,2,4) |
| 3 |
(13,5,7,3) |
| 4 |
(23,8,16,19) |
| 5 |
(9,13,5,2) |
Table 1: 1st Generation Chromosomes and Their Contents
To calculate the fitness values, plug each solution set into the expression a+2b+3c+4d.
Then, calculate the absolute value of the difference of each expression with 30, this will be our fitness value.
| Chromosome |
Fitness Value |
| 1 |
|114-30|=84 |
| 2 |
|54-30|=24 |
| 3 |
|56-30|=26 |
| 4 |
|163-30|=133 |
| 5 |
|58-30|=28 |
Table 2: Fitness Values of 1st Generation Chromosomes (Solution sets)
Since values that are lower are closer to the desired answer (30), these values are more desirable.
In this case, higher fitness values are not desirable, while lower ones are. In order to create a system
where chromosomes with more desirable fitness values are more likely to be chosen as parents, we must
first calculate the percentages that each chromosome has of being picked. One solution is to take the sum of the multiplicative
inverses of the fitness values (0.135266), and calculate the percentages from there. (Note: ALL simulations were
created using a random number generator)
| Chromosome |
Likelihood |
| 1 |
(1/84)/0.135266 = 8.80% |
| 2 |
(1/24)/0.135266 = 30.8% |
| 3 |
(1/26)/0.135266 = 28.4% |
| 4 |
(1/133)/0.135266 = 5.56% |
| 5 |
(1/28)/0.135266 = 26.4% |
Table 3: Parent Selection by Percentages
In order to pick our 5 pairs of parents (each of which will have 1 offspring, and thus 5 new solutions sets total),
imagine that we had a 10000 sided die, and on 880 of those sides, chromosome 1 was labeled,
and on 3080 of those sides, chromosome 2 was labeled,
and on 2640 of those sides, chromosome 3 was labeled,
and on 556 of those sides, chromosome 4 was labeled,
and on 2640 of those sides, chromosome 5 was labeled.
To choose our first pair we roll the die twice, and take those chromosomes to be our first two parents.
Continuing in this fashion we have the following parents:
| Father Chromosome |
Mother Chromosome |
| 3 |
1 |
| 5 |
2 |
| 3 |
5 |
| 2 |
5 |
| 5 |
3 |
Table 4: Simulated Selection of Parents
The offspring of each of these parents contains the genetic information of both father and mother.
How this can be determined is very arbitrary. However for this case, we could use something
called a "cross-over". Let us say a mother has the solution set a1,b1,c1,d1,
and a father has the solution set a2,b2,c2,d2, then there can
be six possible cross overs (| = dividing line):
| Father Chromosome |
Mother Chromosome |
Offspring Chromosome |
| a1 | b1,c1,d1 |
a2 | b2,c2,d2 |
a1,b2,c2,d2 or
a2,b1,c1,d1 |
| a1,b1 | c1,d1 |
a2,b2 | c2,d2 |
a1,b1,c2,d2 or
a2,b2,c1,d1 |
| a1,b1,c1 | d1 |
a2,b2,c2 | d2 |
a1,b1,c1,d2 or
a2,b2,c2,d1 |
Table 5: Generalization of Cross-overs Between Parents
There are many other ways in which parents can trade genetic information to create an offspring,
crossing over is just one way. Where the dividing line would be located is completely arbitrary,
and so is whether or not the father or mother will contribute to the left or right of the
dividing line. Now let's apply this to our offspring:
| Father Chromosome |
Mother Chromosome |
Offspring Chromosome |
| (13 | 5,7,3) |
(1 | 28,15,3) |
(13,28,15,3) |
| (9,13 | 5,2) |
(14,9 | 2,4) |
(9,13,2,4) |
| (13,5,7 | 3) |
(9,13,5 | 2) |
(13,5,7,2) |
| (14 | 9,2,4) |
(9 | 13,5,2) |
(14,13,5,2) |
| (13,5 | 7, 3) |
(9,13 | 5, 2) |
(13,5,5,2) |
Table 6: Simulated Crossovers from Parent Chromosomes
Now we can calculate the fitness values for the new generation of offsprings.
| Offspring Chromosome |
Fitness Value |
| (13,28,15,3) |
|126-30|=96 |
| (9,13,2,4) |
|57-30|=27 |
| (13,5,7,2) |
|57-30|=22 |
| (14,13,5,2) |
|63-30|=33 |
| (13,5,5,2) |
|46-30|=16 |
Table 7: Fitness Values of Offspring Chromosomes
The average fitness value for the offspring chromosomes were 38.8, while the average
fitness value for the parent chromosomes were 59.4. Of course, the next generation (the
offspring) are supposed to mutate, that is, for example we can change one of the values
in the ordered quadruple of each chromosome to some random integer between 1 and 30.
Progressing at this rate, one chromosome should eventually reach a fitness level of 0 eventually,
that is when a solution is found. If you tried and simulated this yourself, depending on the mutations
you may actually get a fitness average that is higher, but on the long-run, the fitness levels
will decrease. For systems where the population is larger (say 50, instead of 5), the fitness levels
should more steadily and stabily approach the desired level (0).
C++ Class Code
The C++ class is an all encompassing class that takes 5 values upon construction, the 4 coefficients and a result. So for our above example the class is initialized like this:
CDiophantine dp(1,2,3,4,30);
Then to solve the equation, call the Solve() function, this will then return the index of the allele with the solution set in it. Call GetGene() to get the gene with the correct values for a, b, c and d. Therefore a typical main.cpp using the class should look like:
#include <iostream.h>
#include "diophantine.h"
void main() {
CDiophantine dp(1,2,3,4,30);
int ans;
ans = dp.Solve();
if (ans == -1) {
cout << "No solution found." << endl;
} else {
gene gn = dp.GetGene(ans);
cout << "The solution set to a+2b+3c+4d=30 is:\n";
cout << "a = " << gn.alleles[0] << "." << endl;
cout << "b = " << gn.alleles[1] << "." << endl;
cout << "c = " << gn.alleles[2] << "." << endl;
cout << "d = " << gn.alleles[3] << "." << endl;
}
}
For a complete breakdown of the class, see here, or to just download the code (complete with Visual C++ 5.0 project file) here.
Article content copyright © Samuel Hsiung and James Matthews, 1999.
|
|