 
 


Firstly, please read the backpropagation essay. Also, if you want to make use of the programming classes, you must have a good working knowledge of C++. For the nonprogrammers out there, the programming part is just a small part at the end of the essay, so you won't miss anything important! For the programmers, all major programming discussion takes place in another essay.
To demonstrate backpropagation, we are going to look at a three-layer, 5-neuron (2 input, 2 hidden, 1 output) network (shown to the right). Before we start looking at the calculations, let us get some terminology straight. The inputs/outputs of the neurons are described as follows. Each layer has a number, starting at 1 for the input layer. The inputs for each layer are indicated by x_{l}(n), where l is the layer and n is the neuron. So, for example, the inputs fed into the network are x_{1}(1) and x_{1}(2) (marked red), and the outputs from the hidden layer to the output layer are x_{2}(1) and x_{2}(2) (marked green). Weights are defined by w_{l}(f,n), where l is the layer, f is the number of the neuron it came from in the previous layer, and n is the number of the neuron itself. Note that when f = 0, it refers to the bias for the neuron. For example, the weight for the output of the second neuron in the input layer to the input of the first neuron in the hidden layer is w_{2}(2,1) (marked blue).
Weights and Calculations

Firstly, the network would be initialized and given random weights. Let's assign these initial weights. The weights can be anything between -1 and 1.
Since backpropagation training requires thousands of steps, we are obviously not going to go through them all; I will merely look at the first iteration that occurs. So, let us look at what would happen during training on the input (0,0). Firstly, the weighted sum has to be calculated, then run through the sigmoid function to limit it.
x_{1}(0) = 1 (bias)
x_{1}(1) = 0
x_{1}(2) = 0

Neuron 1: (1 * 0.341232) + (0 * 0.129952) + (0 * 0.923123) = 0.341232
Neuron 2: (1 * -0.115223) + (0 * 0.570345) + (0 * 0.328932) = -0.115223

So, we now have the net (weighted sum) values of the two hidden neurons. Now, to run them through our sigmoid function:

x_{2}(1) = 1/(1+e^(-0.341232)) = 0.584490
x_{2}(2) = 1/(1+e^(0.115223)) = 0.471226

We now have the outputs for the hidden layer. So, let us now do the same for the output layer. Using x_{2}(1) and x_{2}(2) as the inputs for the output layer, we can make the following calculations:

x_{2}(0) = 1 (bias)
x_{2}(1) = 0.584490
x_{2}(2) = 0.471226

Net: (1 * -0.993423) + (0.584490 * 0.164732) + (0.471226 * 0.752621) = -0.542484

Therefore, x_{3}(1) = 1/(1+e^(0.542484)) = 0.367610

This is the value that the network would output. This is only half of the training process, though: we now have to adjust all the weights to bring the result closer to the one we want (0 in this case). So, let's calculate our deltas using the formulas discussed in the BP essay. We will first calculate the delta for the output layer:

d_{3}(1) = x_{3}(1)(1 - x_{3}(1))(d - x_{3}(1)) = 0.367610 * (1 - 0.367610) * (0 - 0.367610) = -0.085459

Now that we have that, we can use it to propagate the error backwards:

d_{2}(1) = x_{2}(1)(1 - x_{2}(1)) * w_{3}(1,1) * d_{3}(1) = 0.584490 * (1 - 0.584490) * (0.164732) * (-0.085459) = -0.0034190
d_{2}(2) = 0.471226 * (1 - 0.471226) * (0.752621) * (-0.085459) = -0.0160263

That's all the deltas calculated for the layers. Now to actually alter the weights. Remember that the learning coefficient h is defined by the user; I have picked 0.5 to work with. For some of the weights, the change will be 0, because you are multiplying by the inputs, which in our case are 0. Therefore, I am only going to show the calculations for the hidden-layer weights that change:

dw_{2}(0,1) = h * x_{1}(0) * d_{2}(1) = 0.5 * 1 * -0.0034190 = -0.0017095
dw_{2}(1,1) = 0
dw_{2}(2,1) = 0
dw_{2}(0,2) = h * x_{1}(0) * d_{2}(2) = 0.5 * 1 * -0.0160263 = -0.0080132
dw_{2}(1,2) = 0
dw_{2}(2,2) = 0

(The output-layer weights are updated the same way, using d_{3}(1) and the x_{2} inputs, which are all nonzero.) So, these are the weight changes.
You would add these to their respective weights, then run the entire process again on the next set of training data. Slowly, as the training data is fed in and the network is retrained a few thousand times, the network could balance out to values such as these:
With these weights, you would get the following results for XOR:

0 XOR 0 = 0.017622
0 XOR 1 = 0.981504
1 XOR 0 = 0.981491
1 XOR 1 = 0.022782

This, with a small amount of rounding, is the correct truth table. Now, for a brief look at the C++ class.
C++ Class Code

The C++ class for this is very simple. You only have two functions you really care about, Train() and Run(). Train() takes three floating-point values: the two inputs and an expected value. The function returns the output of the net. Run() only takes the two inputs, and returns the output. Therefore, to apply the network to the above example, your main() should look like:

// CBPNet and BPM_ITER are defined in the class code (see the CBPNet essay).
#include <iostream>
using namespace std;

int main() {
    CBPNet bp;

    for (int i = 0; i < BPM_ITER; i++) {
        bp.Train(0,0,0);
        bp.Train(0,1,1);
        bp.Train(1,0,1);
        bp.Train(1,1,0);
    }

    cout << "0,0 = " << bp.Run(0,0) << endl;
    cout << "0,1 = " << bp.Run(0,1) << endl;
    cout << "1,0 = " << bp.Run(1,0) << endl;
    cout << "1,1 = " << bp.Run(1,1) << endl;

    return 0;
}

BPM_ITER is defined as the number of iterations the network is to run for. Here is some sample output from the program:

C:\Program Files\DevStudio\MyProjects\BPNet\Release>bpnet.exe
0,0 = 0.0494681
0,1 = 0.955633
1,0 = 0.942529
1,1 = 0.0433488

To look into the class code, please see the CBPNet essay. You can download the code from here.
Submitted: 03/04/2001 Article content copyright © James Matthews, 2001.


All content copyright © 1998-2007, Generation5 unless otherwise noted.