|
| Glossary |
•
|
Annealing is actually a term from metallurgy. If a metal is heated to a very high temperature, the atoms move about at high speeds. Yet, if they are cooled very slowly, they settle into patterns and structures, rendering the metal much stronger than before. This principle can be employed as an optimization technique in computer science. More specifically, in Artificial Intelligence, simulated annealing is used to help a neural network avoid local minima in its energy function.
The Theory
Simulated annealing takes a variety of forms, but the method presented here is one of the simplier ones. Simulated annealing basically involves perturbing the independent variables (in a neural network, the weights) by a random value, and keeping track of the value with the least error. The temperature takes on the form of the standard deviation used by the random number generator. At a high temperature, a large range is sampled, but as the temperature decreases so does the sampling range and as a result the error becomes smaller and smaller. Let us look at a few diagrams:

Let us suppose that 20 samplings are made per "temperature" - therefore, at the first temperature, the 20 sampling are rather dilute, and therefore finds a point only relatively low down. During the second temperature, the standard deviation is smaller, which condenses the samples more, allowing a lower point to be found. In the third temperature, with the standard deviation even smaller, a point with a very smaller error is found. For most purposes (definitely neural networks), a point which is this close to the real minimum is perfectly adequate.
While the example used is only 2-dimensional (one independent variable), simulated annealing works on N-dimensional problems just as well. To actually see this problem solved by simulated annealing, look at Generation5's Simulated Annealing Demonstrator.
The Algorithm
The algorithm has pretty much been explained in the theory section. In a list form, the algorithm could be written as follows:
- Set the independent variables to their expected values - this is used as the initial centering point.
- Set the temperature to a relatively high number.
- Perturb the independent variables.
- Calculate the new result (or error with an NN) with the new variables.
- If the new result is lower than the best, save the result.
- Repeat steps 3-5 n number of times.
- If an improvement has been made after the n number of iterations, set the centre point to be the best point.
- Reduce the temperature.
- Repeat steps 3-8 for t number of temperatures.
A few questions still remain. How should the temperature be reduced? The following formula is often employed for simulated annealing tasks using neural networks:

Stop and start are the stop and start temperatures, and t is the number of intermediate temperatures. Now, how do you set the parameters for simulated annealing? As a general rule, you should try and cut down on the number of temperatures - a number between 2 and 15 should always suffice. The starting and stopping temperatures greatly depend on your function, therefore experimentation and experience will yield good parameters.
When to use Simulated Annealing
Because of its highly iterative (thus computationally slow) nature, simulated annealing should be either completely avoided, combined with the gradient descent algorithm, or only used when speed is not of the essence. Yet, since simulated annealing can deal with highly dimensional minimization problems, or problems with many false minima, it should always be considered as an alternative when other methods fail.
Article content copyright © James Matthews, 2000.
|
|