Associative Neural Networks

Neural networks can be classified as either pattern classifiers or pattern associators. A pattern classifier takes an input vector and outputs a value that can be used to classify it - for example, taking a sonar signal and classifying it as either a rock or a mine. Perceptrons and Adaline networks are examples of such classifiers. A pattern-associator network takes a given input vector and outputs another vector.

There are two types of associative networks: hetero-associative (HA) and auto-associative (AA). A hetero-associative network takes an input vector and outputs a completely different vector. For example, an HA net could take a sound file and output the text it represents (or the closest learned text). An auto-associative network outputs the same vector it was given. How is this useful? An AA net can learn various vectors; then, when a corrupted vector comes in, the net corrects it. This is useful in image recognition of partial or corrupted ("noisy") images. It is auto-associative networks that this essay will focus on - more specifically, the Hopfield network.

The Hopfield Network

The Hopfield network is an auto-associative network that (along with its derivatives) is often used in monochrome image recognition. When introduced in 1982, it brought a then-novel concept: outputs feed back to the inputs. The network consists of one layer, whose unit outputs feed back to the inputs of all the other units but not to their own.

One of the huge advantages of this network is that all the weights can be calculated in a single step (mathematically speaking). In order to see how the Hopfield network works, and how the mathematics behind it came about, we must temporarily digress and talk about the learning matrix.

The Learning Matrix

The learning matrix was an electronic device that learned to produce a desired output for a given input. It did this by adjusting the weights within the matrix using a Hebbian learning rule whenever an input vector and its corresponding output vector were presented. The reason this is important is that it introduces the matrix mathematics behind the Hopfield network. The learning matrix operation can be summarized mathematically as:

[W] = h[Y][X]^T

Where [X] is the input matrix (the group of patterns to be learnt, one per column), [Y] is the output matrix, [W] is the weights and h is the learning rate. I'll do a simple example. To keep the matrices small, I'll assume that there are only two patterns with four bits.

To make the calculations a lot simpler, we'll make h = 1, and then calculate the weights that will give us our output given the input.

So, we now have our weights calculated already for all patterns! Note that when you actually run a pattern through the association matrix ([W]), the result will not be identical to the desired output matrix - it has to be passed through a hard limiter to yield the correct results.
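To make the arithmetic concrete, here is a small C++ sketch of the same procedure (not the HIR2 code). The patterns in it are illustrative stand-ins rather than the example values above, and the threshold of 2 is chosen to suit them; only the mechanics - the one-shot Hebbian weight calculation followed by a hard limiter - are the point.

    #include <cstdio>

    // Illustrative stand-in patterns (not the article's example values):
    // two 4-bit inputs and the outputs they should be associated with.
    const int P = 2, N = 4;                    // number of patterns, bits per pattern
    int X[P][N] = { {1,1,0,1}, {0,0,1,1} };    // input patterns, one per row
    int Y[P][N] = { {0,1,1,1}, {1,1,0,0} };    // desired output patterns

    int main() {
        // One-shot Hebbian weight calculation: [W] = h[Y][X]^T with h = 1,
        // i.e. sum over every pattern of (output bit) * (input bit).
        int W[N][N] = {};
        for (int p = 0; p < P; ++p)
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                    W[i][j] += Y[p][i] * X[p][j];

        // Recall: run each stored input through [W], then hard-limit the result.
        const int threshold = 2;               // chosen to suit these stand-in patterns
        for (int p = 0; p < P; ++p) {
            printf("input %d recalls:", p);
            for (int i = 0; i < N; ++i) {
                int net = 0;
                for (int j = 0; j < N; ++j)
                    net += W[i][j] * X[p][j];
                printf(" %d", net >= threshold ? 1 : 0);
            }
            printf("\n");
        }
        return 0;
    }

Running the sketch prints the two stored output patterns, even though the raw products of [W] and the inputs are not themselves 0/1 vectors.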

The learning matrix can also cope with slightly corrupted inputs. Note that if 1101 is corrupted to 1111, the correct result can still be obtained:

If 2 is the threshold value, the correct result is obtained: [0 1 1 1]. So, where does the Hopfield network come into all of this? The Hopfield network was based upon the learning matrix, with two changes: it uses bipolar values (-1, +1) instead of binary (0, 1), and it has the feedback property described earlier.

Now, back to the Network...

The response of a neuron is much the same as in any network: if the net (weighted sum) is above a threshold, the neuron fires (in the Hopfield network, this is a +1); if it is below, it outputs -1. Hopfield never stated what happens when the net equals the threshold, so we assume the output stays the same.
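In C++, the response rule might be sketched like this (the four-unit size and the names are just assumptions carried over from the example, not code from HIR2):

    const int N = 4;   // number of units, matching the four-bit example

    // Response of unit i: +1 if the weighted sum of the other units' outputs
    // exceeds its threshold, -1 if it falls below, and unchanged if the two
    // are exactly equal (the assumption made above).
    int updateUnit(int i, const int state[N], const int W[N][N], const int theta[N])
    {
        int net = 0;
        for (int j = 0; j < N; ++j)
            net += W[i][j] * state[j];         // W[i][i] is zero, so unit i ignores itself
        if (net > theta[i]) return +1;
        if (net < theta[i]) return -1;
        return state[i];                       // net == threshold: keep the old output
    }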

Weights are calculated in much the same way as in the learning matrix, with a small difference. Since the outputs are the same as the inputs, we don't need a [Y]. This is not all, though, since we also have to ensure that the output of a neuron cannot be associated with one of its own inputs. Because the patterns are bipolar, each diagonal entry of [X][X]^T is exactly the number of patterns (called P), so the self-connections can be removed simply by subtracting the unit matrix multiplied by P. Therefore, the final equation can be written:

[W] = [X][X]^T - P[I]

We also have to calculate the thresholds. We just assume there is an additional neuron, which is used as an offset and is permanently set to 1. We calculate the thresholds in the same way as the weights, except that since this extra output is always stuck at 1, the formula reduces to:

[T] = [X][1]

where [1] is a column of P ones, so that the threshold of each unit is just the sum of that unit's bipolar value over all the stored patterns.

The thresholds in our example are therefore just these sums over the two stored patterns (remember, the binary values have now been converted into bipolar form).
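Putting the last two formulas together, the whole weight and threshold calculation can be sketched in a few lines of C++. The two bipolar patterns below are again stand-ins (binary 1101 and an assumed second pattern, converted to -1/+1), not the example's actual values:

    #include <cstdio>

    const int P = 2, N = 4;   // number of patterns, number of units

    // Stand-in bipolar patterns: binary 1101 and 0011 converted to -1/+1.
    int X[P][N] = { { 1, 1,-1, 1 }, { -1,-1, 1, 1 } };

    int W[N][N];              // weight matrix
    int T[N];                 // thresholds (offsets)

    int main() {
        // Weights: [W] = [X][X]^T - P[I]. Because the patterns are bipolar,
        // the diagonal of [X][X]^T is exactly P, so subtracting P[I] zeroes
        // the self-connections.
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                W[i][j] = 0;
                for (int p = 0; p < P; ++p)
                    W[i][j] += X[p][i] * X[p][j];
                if (i == j)
                    W[i][j] -= P;
            }

        // Thresholds: the same Hebbian sum against an extra unit clamped to 1,
        // i.e. the sum of each unit's value over all stored patterns.
        for (int i = 0; i < N; ++i) {
            T[i] = 0;
            for (int p = 0; p < P; ++p)
                T[i] += X[p][i];
        }

        for (int i = 0; i < N; ++i)
            printf("unit %d: threshold %d\n", i, T[i]);
        return 0;
    }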

We now know how both the weights and thresholds are calculated, what the architecture is, and where it all came from. I have not yet laid down the modus operandi of the net; to understand this, we have to look at the concept of energy in neural networks.

Energy

The term energy, when applied to neural networks, really has nothing to do with energy in the physics sense; it is not quantified according to any measure of energy as the real world knows it. Energy in a neuron is a measure of how much stimulation is required to make the neuron fire. For example, if an input to a neuron has a huge weight (say, 100), the neuron will almost certainly fire; it does not take much energy at all to make it do so. If the weight is a tiny number (say, 0.0001), then it will take a lot of energy (or a lot of inputs) to make the neuron fire. Also, if a weight is negative it acts as an inhibitor, so more energy is required. The energy of the entire network is just the sum of all the neuronal energies. Now, as in a lot of engineering problems, we want to minimize the energy required to get a result. Exactly the same applies in this analogy.
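The usual formula for this network energy can be coded up directly. The sketch below assumes the same N-unit, W and T conventions as the earlier sketches; each asynchronous update can only leave the returned value the same or lower.

    const int N = 4;   // number of units

    // Standard Hopfield energy:
    //   E = -1/2 * sum over i,j of W[i][j]*s[i]*s[j]  +  sum over i of T[i]*s[i]
    // Lower energy corresponds to a more stable state of the network.
    double networkEnergy(const int s[N], const int W[N][N], const int T[N])
    {
        double E = 0.0;
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j)
                E -= 0.5 * W[i][j] * s[i] * s[j];   // W[i][i] is zero, so i == j adds nothing
            E += T[i] * s[i];
        }
        return E;
    }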

Problems that neural networks are applied to often have a large problem space. Imagine that problem space as a series of hills and valleys representing the energy of all solutions. Our goal is to get to the lowest point in the search space.

Now, to calculate the energy of a neuron, all the other neurons have to stay constant (since their values are used). This is also true when updating the neuron values; therefore, it is important that the neurons are updated asynchronously (not all at the same time). In my opinion, this slightly defeats the purpose of a neural network, since they are supposed to be parallel; nevertheless, the Hopfield network does its job well.

So, most Hopfield network algorithms simply select a neuron at random and update it accordingly. After many iterations, the network should stabilize to one of the patterns it has learnt: it minimizes its energy and converges to a pattern its weights are set for. The network does have a drawback, though. It will often converge to a local minimum, not the global minimum.
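A minimal sketch of that recall loop, assuming the same conventions as before (and a simple made-up stopping rule), might look like this:

    #include <cstdlib>

    const int N = 4;   // number of units

    // Asynchronous recall: repeatedly pick a unit at random and update it until
    // the state has stopped changing for a while. The "10 * N quiet steps"
    // stopping rule is just an assumption for this sketch.
    void recall(int s[N], const int W[N][N], const int T[N])
    {
        int quiet = 0;                         // consecutive updates with no change
        while (quiet < 10 * N) {
            int i = rand() % N;                // choose one unit at random
            int net = 0;
            for (int j = 0; j < N; ++j)
                net += W[i][j] * s[j];
            int next = (net > T[i]) ? +1 : (net < T[i]) ? -1 : s[i];
            if (next == s[i]) {
                ++quiet;                       // nothing changed
            } else {
                s[i] = next;                   // flip the unit and keep iterating
                quiet = 0;
            }
        }
    }

Seeded with a corrupted pattern, s should settle into a stable state - ideally the stored pattern closest to it, though, as discussed below, that may only be a local minimum.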

Imagine the series of hills and valleys described before. There are many hills and many valleys; each of the valleys probably represents a 'decent' solution, but the deepest valley is the best solution. The Hopfield network will often converge to its local minimum (the valley nearest to its starting point on a hill), not the best possible solution. Therefore, if you have multiple patterns that are alike, the network may not find the best possible pattern. For example, in the group of three images below, the left and middle patterns were fed into Generation5's HIR2 program; it was then given a corrupted version of the middle pattern, and it output the right-hand pattern, which is neither the left nor the middle. In fact, if you ran the network a bit more on the pattern to the right, the network would move about from one local minimum to another.

Conclusion

This essay has looked at just one type of associative net, but any network that maps input patterns to output patterns is an associative net. Auto-associative networks map an input onto itself, whereas hetero-associative networks output a different pattern from the one input. Hopfield networks are among the most noted associative networks because of their feedback properties.

If you are interested in Hopfield networks, look at the HIR2 program I wrote. There are both Win95 and DOS versions, and the source code is supplied with them. It is all in C++, with a CHopfield class that encompasses everything discussed above.

Last Updated: 11/12/1999

Article content copyright © James Matthews, 1999.