New Operators and Dominance Scheme for a

Diploid GA


Istanbul Technical University, Computer Engineering Dept.
TR-80626 Maslak Istanbul, TURKEY
etaner@cs.itu.edu.tr

1  Introduction

Classical genetic algorithms use a haploid representation for individuals. However, most complex organisms in nature have a diploid chromosome structure, i.e. the organism has two alleles for each characteristic, located on two homologue chromosomes. Even though this seems like redundant information, it's nature's way of keeping a genetic memory and introducing genetic diversity. This way, genetic information which may be useful in the future is not lost but is shielded against harmful selection via a domination mechanism which masks the recessive allele until a future time when it may become useful.

2  The Diploid Algorithm

In the implementation of this study, each individual is represented by two chromosome arrays which make up the genotype, one phenotype array, a fitness value and an age which shows for how many generations the individual has stayed in the population. The algorithm uses the basic concepts in ``Goldberg, (1989)'' along with some new operators and a new domination scheme. The pseudo-code of the diploid algorithm is given below .


The diploid genetic algorithm

begin
initialize;
for no. of generation times do
  select mating pool;
  meiosis;
  form offspring;
  mutation;
  for each dead parent, form new individual;
  select next generation;
  calculate new domination array values;
end.


3.1  Genotype to Phenotype Mapping

The phenotype of the individual is used in calculating its fitness. The mechanism to map the genotype onto the phenotype is called domination. It determines the allele to be expressed in the case where the two alleles for a characteristic are different.

In this study a domination array which is made up of real numbers in [0.0,1.0] and has the same length as the chromosomes is used in the domination mechanism. Each real number shows the dominance factor of allele 1 over allele 0 for the locus corresponding to its index in the array. The domination array is initialized to have the value 0.5 for all locations which means that either allele is equally likely to be expressed. After each generation, the domination array is recalculated using  Equation 1.

where phij is the phenotypic value of the jth individual at the ith chromosomal location, fj is the fitness value of the jth individual, l is the chromosome length and s is the number of individuals.

3.2  The Algorithm

The initialization step of the algorithm includes the initialization of the individuals' chromosomes and the domination array. The main loop starts with the selection of individuals using a roulette wheel selection method and the selected individuals are paired off randomly. Each parent goes through a meiotic cell division phase which involves replication of chromosomes and a possible two-point cross over between homologue pairs. As a result of this, four gametes, two of which are selected randomly to be copied into the genotypes of the two offspring, are formed. Each parent gives one chromosome to each offspring. At the end of reproduction, the size of the population becomes 2n given that it was n before this step. Mutation may occur on the genotype of each of the 2n individuals. Some of the parents may die due to old age and new individuals are initialized randomly to make up for the population size which is constant. The probability of an individual to die is calculated using k.age2 where k is a constant in [0.0,1.0] and age is the individual's age. The n individuals to survive into the next generation are selected from among these 2n individuals using a fitness proportional selection mechanism similar to a roulette wheel selection and the age counters of each selected individual is incremented. The new domination array values are calculated and the population is scanned for the individual with the best fitness.

4  Test Problems

The proposed diploid algorithm and the standard haploid algorithm are compared using several test functions. The results of only two tests will be given below.

In the first test problem the number of 1s in a 32 bit string is to be maximized. The fitness of each individual is calculated by counting the number of 1 s. This is considered to be an easy problem for the haploid algorithm.

In the second test problem the fitness function oscillates every thirty generations between maximizing the decimal value of the binary string and minimizing it. This problem is considered to be hard for the haploid algorithm.

5  Results

The online and offline performances ``Goldberg, (1989)'' for test 1 are given in Fig.  and Fig.  respectively. In each case the diploid algorithm performs as well as the haploid one. The same for test 2 are given in Fig.  and Fig.  respectively. In each case, the better plot line belongs to the diploid algorithm.


No. of Steps: 1000,  Pop. Size: 250,  Cross Over Prob.: 0.9,
Mutation Prob.: 0.009,  No. of runs: 100, k=0.01 (diploid GA)


Fig. 1: Test1: Online perf. avg'd over 100 runs


Fig. 2: Test1: Offline perf. avg'd over 100 runs


Fig. 3:  Test2: Online perf. avg'd over 100 runs


Fig. 4: Test2: Offline perf. avg'd over 100 runs


 


The mean value of the best fitnesses taken over 100 runs, the standard deviation (sigma) of these best fitnesses, the best and worst fitnesses found in 100 runs and the average step for finding the best fitness are given in the below table for Test 1 and Test 2 cases respectively. The percent value gives the precentage of the standard deviation to the mean fitness value.
 

Tst1 & Tst2. Haploid Diploid Haploid Diploid
Mean Fit. 32 32 4294950144 4294966272
Sigma 0.0 0.0 34971.7 2636.2
Percent 0.0 0.0 0.000814 0.000061
Best Fit. 32 32 4294967296 4294967296
Worst Fit. 32 32 4294702080 4294958080
Avg. Step 29 30 147.3 259.5

The above results show that in the first test case, the diploid algorithm performs as well as the haploid one. However in the second case, which is considered hard for a haploid algorithm, the diploid one performs better.

6  Future Work

The proposed algorithm will be used for dynamically load balancing a network of computers.

Acknowledgements

This dissertation research is being conducted in the Istanbul Technical Univ., Institute of Science and Technology, under the supervision of Prof. Dr. Emre HARMANCI as the thesis advisor.

Bibliography


File translated from TEX by TTH, version 2.22.
On 28 May 1999, 14:04.