
Back-propagation for the Uninitiated

For newcomers to neural networks, the words 'back-propagation' often mark the point at which everything becomes a blur! I didn't understand the concept of BP until it was laid out plainly in front of me, with all the details. So, for those of you having a little trouble grasping the concept, hopefully this essay will clear it up.

I will assume that you have read the Perceptron essay, and that you have a small understanding of basic calculus and terminology (don't worry too much if you don't, but it'll help!). I made a BP case-study for you to look at after you have finished reading this essay. Finally, those of you who might not understand some of the formulas presented here might want to check the Mathematics for AI Beginners.


Back-propagation

In a single-layer network, each neuron adjusts its weights according to what output was expected of it, and the output it gave. This can be mathematically expressed by the Perceptron Delta Rule:
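
One standard way of writing the rule, assuming d for the desired output and y for the actual output of the neuron (the symbols used later in this essay), is:

```latex
\mathbf{w}_{\text{new}} = \mathbf{w}_{\text{old}} + (d - y)\,\mathbf{x}
```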

Remember that w is the array of weights and x is the array of inputs. The Perceptron Learning Rule is of no use, though, once you extend the network to multiple layers to handle non-linearly separable problems. Why is this the case? The rule needs a desired output for every neuron, and hidden neurons have none. When adjusting a weight in the network, you have to be able to tell what effect this will have on the overall error of the network. To do this, you have to look at the derivative of the error function with respect to that weight.
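
For reference, the single-layer rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the essay: the function names, the AND training set, and the trick of treating the bias as one more weight fed by a constant input of 1 are all assumptions made for the example.

```python
def hard_limiter(net):
    """Step activation: fire (1) when the weighted sum reaches zero."""
    return 1 if net >= 0 else 0


def predict(weights, inputs):
    x = list(inputs) + [1]  # trailing 1 is a constant input for the bias weight
    return hard_limiter(sum(w * xi for w, xi in zip(weights, x)))


def train(weights, samples, epochs=20):
    """Apply w_new = w_old + (d - y) * x once per sample, for a fixed number of epochs."""
    for _ in range(epochs):
        for inputs, desired in samples:
            error = desired - predict(weights, inputs)
            x = list(inputs) + [1]
            for j in range(len(weights)):
                weights[j] += error * x[j]
    return weights


# AND is linearly separable, so a single neuron can learn it.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = train([0, 0, 0], AND)
print([predict(weights, inputs) for inputs, _ in AND])
```

Note that the update needs a desired output d for the neuron being trained, which is exactly what a hidden neuron lacks.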

Our biggest problem at the moment is that the hard-limiter function often used for the perceptron is discontinuous, and thus not differentiable at its threshold (see this essay for details). One of the more popular alternative functions used with back-propagation nets is the Sigmoid (logistic) function. Its equation is f(x) = 1 / (1 + e^(-x)), and its graph is a smooth S-shaped curve.
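
A minimal Python sketch may help make the contrast concrete. It compares the hard limiter with the logistic function and checks the logistic slope against a finite-difference estimate; the helper names are illustrative only.

```python
import math


def hard_limiter(x):
    """Step activation: jumps from 0 to 1 at x = 0, so it offers no usable slope."""
    return 1.0 if x >= 0 else 0.0


def sigmoid(x):
    """Logistic function: a smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_slope(x):
    """Derivative of the logistic function, written in terms of its own output."""
    y = sigmoid(x)
    return y * (1.0 - y)


for x in (-6.0, -1.0, 0.0, 1.0, 6.0):
    # Columns: input, hard limiter, sigmoid, sigmoid slope
    print(x, hard_limiter(x), round(sigmoid(x), 4), round(sigmoid_slope(x), 4))

# Finite-difference estimate of the slope at x = 1, for comparison with sigmoid_slope(1.0)
h = 1e-5
print((sigmoid(1.0 + h) - sigmoid(1.0 - h)) / (2 * h), sigmoid_slope(1.0))
```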

You can see that the function plateaus out at 0 and 1 on the y-axis, and it also crosses the y-axis at 0.5, making the function 'neat'. Furthermore, it differentiates neatly to f'(x) = f(x)(1 - f(x)), so the slope can be computed from the neuron's output alone. With all of this in mind, let's revisit the perceptron learning rule with a little alteration:
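
One way to write the altered rule is sketched here under assumed notation: η is the learning rate (the n of the following paragraph), net_i is the weighted input sum of neuron i, and δ_i is that neuron's delta. Only w, x, y_i, d_i and the learning rate come from the essay; the other names are labels chosen for this sketch.

```latex
\begin{aligned}
\delta_i &= f'(\mathrm{net}_i)\,(d_i - y_i) \\
\mathbf{w}_i^{\text{new}} &= \mathbf{w}_i^{\text{old}} + \eta\,\delta_i\,\mathbf{x}
\end{aligned}
```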

w and x are as above, and n (often written η) is the learning rate. y_i and d_i are the actual and desired outputs, respectively. So, if we are using the Sigmoid activation function, we can rewrite the equation for calculating the deltas for the output layer as:
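
Because the sigmoid's slope at a neuron equals y_i(1 - y_i), where y_i is that neuron's output, the output-layer delta is commonly written as follows (δ_i is a label assumed here for the delta of output neuron i):

```latex
\delta_i = y_i\,(1 - y_i)\,(d_i - y_i)
```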

We now have the method to calculate the net sum of each neuron in the network and the method of calculating delta for the output layer. Next, we need to calculate delta for the hidden layers.

We have to know the effect on the output of the neuron if a weight is to change. Therefore, we need to know the derivative of the error with respect to that weight. For the sake of simplicity, I am skipping over the mathematical derivation of the delta rule. It can be shown that for neuron q in hidden layer p, delta is:
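
A standard statement of this result, assuming x_p(q) denotes the output of neuron q in layer p and w_{p+1}(q, i) the weight from that neuron to neuron i in layer p+1 (index names chosen for this sketch), is:

```latex
\delta_p(q) = x_p(q)\,\bigl[1 - x_p(q)\bigr] \sum_i w_{p+1}(q, i)\,\delta_{p+1}(i)
```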

Notice where the 'back' part of 'back-propagation' comes in? At the end of the equation is δ_{p+1}: each delta value for a hidden layer requires that the delta values for the layer after it be calculated first. In a 3-layer network, the output-layer delta is calculated first, using the output-layer formula. These values are then used to calculate the deltas for the remaining hidden layers using the hidden-layer formula. Hence the name 'back-propagation', as the error from the output layer is propagated backwards through the network, one layer at a time.
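
The whole procedure can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the CBPNet class or the case-study code: the network is a list of layers, each neuron is a list of weights with a trailing bias weight, and the names forward, backward and train_step, the seed, the learning rate and the epoch count are all chosen for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(net, inputs):
    """Return the outputs of every layer, starting with the inputs themselves."""
    activations = [list(inputs)]
    for layer in net:
        prev = activations[-1] + [1.0]  # constant 1.0 feeds each bias weight
        activations.append(
            [sigmoid(sum(w * x for w, x in zip(neuron, prev))) for neuron in layer]
        )
    return activations


def backward(net, activations, desired):
    """Return one list of deltas per layer, computed from the output layer backwards."""
    outputs = activations[-1]
    deltas = [[y * (1.0 - y) * (d - y) for y, d in zip(outputs, desired)]]
    for p in range(len(net) - 2, -1, -1):
        next_layer, next_deltas = net[p + 1], deltas[0]
        layer_deltas = []
        for q, x in enumerate(activations[p + 1]):
            # Sum of (weight from neuron q to neuron i) * (delta of neuron i)
            back = sum(neuron[q] * d for neuron, d in zip(next_layer, next_deltas))
            layer_deltas.append(x * (1.0 - x) * back)
        deltas.insert(0, layer_deltas)
    return deltas


def train_step(net, inputs, desired, rate):
    """One forward pass, one backward pass, then update every weight in place."""
    activations = forward(net, inputs)
    deltas = backward(net, activations, desired)
    for p, layer in enumerate(net):
        prev = activations[p] + [1.0]
        for i, neuron in enumerate(layer):
            for j in range(len(neuron)):
                neuron[j] += rate * deltas[p][i] * prev[j]
    return activations[-1]


if __name__ == "__main__":
    random.seed(1)
    # 2 inputs -> 2 hidden neurons -> 1 output neuron
    net = [
        [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)],
        [[random.uniform(-1, 1) for _ in range(3)]],
    ]
    xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
    for _ in range(5000):
        for inputs, desired in xor:
            train_step(net, inputs, desired, rate=0.5)
    for inputs, desired in xor:
        print(inputs, desired, round(forward(net, inputs)[-1][0], 3))
```

A network this small can stall in a poor local minimum for some starting weights. If the printed outputs are not close to the targets, try a different seed, more epochs, or more hidden neurons.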

Back-propagation is not (in my opinion) a concept easily grasped without some hands-on experience. Therefore, if you are a programmer, first look at the code I wrote, then experiment with a BP network of your own. Try different architectures, with multiple outputs to get yourself familiar with the formulas and algorithms necessary.

I really cannot stress enough how important it is for people to try back-propagation themselves. This essay, along with its equations, can be confusing and vague if it isn't paired with practical effort. Read the BP case study, which steps through the back-propagation algorithm very slowly on a simple XOR network. Then read this essay again, and try to program a BP-enabled network yourself, or modify Generation5's CBPNet (C++) class.

Last Updated: 17/08/2002

Article content copyright © James Matthews, 2002.