At the forefront of Artificial Intelligence
  Home Articles Reviews Interviews JDK Glossary Features Discussion Search
Home » Articles » Genetic Algorithms » Applications/Code

Solving the Travelling Sales Man Problem using a Genetic Algorithm

By Andy Thomas

Abstract

This essay discusses some issues which arise in solving the Travelling Salesman Problem (TSP) using a genetic algorithm (GA). Two approaches are described. I've personally implemented both, failing with the first, realising my mistakes and succeeding with the second approach. A comparison of the two approaches is intended to introduce the reader to the importance of taking the dynamics of a given problem into account when designing a genetic algorithm to solve it.

It is assumed that the reader has a basic knowledge of genetic algorithm concepts. The following is suggested as an excellent introduction and reference for this subject:

Genetic Algorithms in Search, Optimization, and Machine Learning.
David E. Goldberg
ISBN 0-201-15767-5

Introduction

Background to Genetic Algorithms

Evolution in the natural world displays a remarkable problem solving ability, as demonstrated by the fact that a myriad species have evolved on Earth, demonstrating a diverse range of survival strategies. It would therefore not be unreasonable to deduce that a problem solving strategy, inspired by the mechanics of natural selection and genetics, may prove highly effective in solving certain classes of problems

Genetic algorithms emulate the mechanics of natural selection by a process of randomised data exchange. In this way they are able to solve of range of difficult problems which cannot be tackled by other approaches. The fact that they are able to search in a randomised, yet directed manner, allows them to reproduce some of the innovative capabilities of natural systems

Because genetic algorithms were inspired by the behaviour of natural systems, the terminology used to describe them is a mix from both biological and computer fields. A genetic algorithm manipulates strings of information, usually called chromosomes. These encode potential solutions to a given problem. Chromosomes are evaluated and assigned a score (fitness value) in terms of how well they solve the given problem according to criteria defined by the programmer. These fitness values are used as a probability of survival during a round of reproduction. New chromosomes are produced by combining two (or more) parent chromosomes. This process is designed to lead to a succession of fitter offspring, each encoding better solutions, until an acceptably good solution is found.

The Travelling Salesman Problem

In this hypothetical problem a salesman must visit a number cities. The distances between the cities are known. How does the salesman determine the order in which to visit each city just once in turn, as to minimise the total distance travelled?

To gain an appreciation of the problem, consider 10 cities. The total number of possible routes is 10 factorial or 3,628,800. In this case it is perfectly feasible to create a brute force algorithm to examine each route and yield the shortest one. [The number of possible solutions to a problem is often referred to as the problem 'search space'.] For a less trivial case of 100 cities, the search space rises to 100 factorial or around 9.3e157. If we assume that a computer can examine a million routes per second, it will take around 3e144 years to examine them all. Needless to say that to examine each route in turn is not feasible in any acceptable time scale.

In a real-world route planning problem, the determination of the shortest possible route may not be required. It may suffice to simply find an acceptably short route. A genetic algorithm is ideally suited such cases where the optimum solution is not required, provided a near-optimum solution can be delivered in a short time scale.

Genetic Algorithms Are Not a Panacea

Genetic algorithms cannot just be 'thrown' at a poorly understood problem in the hope that it will be solved in some magical way, not at least if you don't wish your efforts to be in vain. While GAs can often be successfully applied to analytically intractable tasks, it is usually necessary to understand some of the dynamics of the problem at hand in order to design the GA judiciously. Perhaps the two most important design questions of any GA project are those of the fitness evaluation function and reproduction scheme chosen, as they must be mutually compatible.

To illustrate this, in the following section we consider what happens in a naive approach to solving a 100 city Travelling Salesman Problem.

A Doomed Approach to TSP

We need to find a short route between 100 cities. The geographical location of the cities are known, hence we can easily calculate the distance of a route between them using trigonometry. We describe the route between the cities in a computer program with an array of 8 bit integers, where the array has 100 elements. The array index is our route order, the contents of each array element is a city number. Hence, if we wrote the following pseudo code:

a[] = {5, 64, 23, 8 ...

we would mean that we visit city 5 first, followed by 64, 23 and so on. This is going to be how we encode the 'chromosomes' or trial solutions in the GA.

Evaluation Function

We write an evaluation function for our problem which takes a trial solution and returns a fitness value. The shorter the route the higher the fitness value. However, we only want solutions which describe a route visiting all cities. We realise this and write the fitness function to 'punish', with low fitness values, those chromosomes which contain multiple visits to any city. If we didn't do this, the GA will find that the shortest route is the one which takes you nowhere (i.e. visits the same city 100 times). In addition, we punish chromosomes which express city numbers which don't exist--any allele value above 99 in our case.

Simple Crossover Reproduction

Simple chromosome 'crossover' is chosen for our reproduction scheme because it is the simplest. This is where two parent chromosomes are combined by choosing a random crossing site along the chromosome length. This can be best explained with the following 10 allele example:

A = 614829x0735
B = 729361x5480

where 'x' marks the crossing site. Two offspring chromosomes are generated by crossing over A and B at 'x' to yield:

A' = 6148295480
B' = 7293610735

Finally, we generate an initial population of 200 random chromosomes and run the program. We are astounded as the program produces nothing of use.

Post-Mortem

Although the GA may yield a viable solution, i.e. one in which all cities are visited just once, no effective attempt will be made to progress toward decreasingly shorter routes. The reason is quite simple. The crossover reproduction scheme we have used kills the viability of the offspring. Consider the 10 city chromosome examples A and B. Both these are viable because city visits described in each are unique. Now look at the offspring A' and B', and note how some cities are visited twice.

The net result of this approach is a random walk through a massive problem space (approx. 6.7e240) with little chance of finding many viable solutions, never mind good ones.

A Successful Approach

In the previous section we identified that the order in which data is encoded in the chromosome is important for TSP, and that a simple crossover reproduction mechanism is not suitable in these circumstances.

Partially Matched Crossover

With this in mind, we need a scheme analogous to simple crossover, but one which preserves the solution viability while allowing the exchange of ordering information. One such scheme is Partially Matched Crossover (PMX). In this scheme, a crossing region is chosen by selecting two crossing sites. We return to our initial 10 allele example, but now mark two crossing sites:

A = 614x829x0735
B = 729x361x5480

Crossover is performed in the centre region to yield the following transitionary offspring:

A~ = HH4x361x07H5
B~ = 7HHx829x54H0

The 'H' positions are holes in the transitionary offspring left by the deliberate removal of alleles which would otherwise be replicated in the crossing region. These holes are filled by cross-referencing with the parent of the alternate chromosome. This can best be explained with a step by step example:

  1. Take the first hole in A~ at index 0, the missing allele value at this position is 6.
  2. Search along B (the alternate parent of A~) and match the first allele encountered with a value of 6. This occurs in B at index 4.
  3. Fill H[0] in ~A with the allele found at A[4], a value of 2.
Applying this process to the remaining holes, we obtain the following PMX offspring:

A' = 2943610785
' = 7618295430

Inversion

Mutation is often used in GAs as a way of adding random variation to the evolving population, preventing a total loss of diversity. In our case it's not so useful, however, as a mutation has a high probability of resulting in a non-viable city order. We may still wish to apply a small mutation rate, provided of course we account for non-viable city orders in the evaluation function. However, an alternative to mutation is perhaps more appropriate.

Inversion refers to the randomised exchange of alleles in chromosomes. This can be made to occur with a given (usually small) probability, in much the same way as mutation. As an example, let's consider inversion on the B' chromosome:

B' = 7618295430

Allele index 2 is selected for inversion. The value is simply exchanged with the allele at an additional randomly selected index, let's say index 7, as follows:

Bi = 7648295130

Hence, allele positions 2 and 7 have been exchanged.

Results

I've previously tried this approach to the Travelling Salesman Problem and ran it successfully with city counts of between 100 and 1000.

In the way of illustration, two outputs taken from a 100 city problem are shown below (1000 cities would be too cluttered to show). Figure 1a shows an initial randomly ordered route between 100 randomly positioned points. Figure 1b, after around 10,000 generations.


Figure 1a : Initial Random Route
Figure 1b : After 10099 Generations


Andy Thomas 2001
andyt@alpha-india5.com

Submitted: 06/07/2001

Article content copyright © Andy Thomas, 2001.
 Article Toolbar
Print
BibTeX entry

Search

Latest News
- The Latest (03/04/2012)
- Generation5 10-year Anniversary (03/09/2008)
- New Generation5 Design! (09/04/2007)
- Happy New Year 2007 (02/01/2007)
- Where has Generation5 Gone?! (04/11/2005)

What's New?
- Back-propagation using the Generation5 JDK (07/04/2008)
- Hough Transforms (02/01/2008)
- Kohonen-based Image Analysis using the Generation5 JDK (11/12/2007)
- Modelling Bacterium using the JDK (19/03/2007)
- Modelling Bacterium using the JDK (19/03/2007)


All content copyright © 1998-2007, Generation5 unless otherwise noted.
- Privacy Policy - Legal - Terms of Use -