 
 


This essay follows from Introduction to Neural Networks, and we will look at the concepts of single and multilayer perceptrons as well as touch on the mathematics behind them.

The Perceptron

A perceptron is often thought of as the simplest type of neural network. It is best thought of as a classifier: it can differentiate between sets of data and, more importantly, can classify a previously unseen data example into one of the learnt sets. The structure of a single perceptron is very simple. There are a number of inputs (x_{n}), weights (w_{n}), a bias (b), and an output. A simple schematic diagram for a perceptron is shown below.

For every input (including the bias) on the perceptron, there is a corresponding weight. To calculate the output of the perceptron, every input is multiplied by its corresponding weight. This weighted sum is then fed through a limiter function that determines the final output of the perceptron. The limiter function for a simple perceptron might be a stepped limiter.

So let's recap: the perceptron's output is calculated by summing all the inputs multiplied by their weights, adding the bias, then running the result through a limiter. In mathematical terms, these can be defined as:

net = Σ w_{n}x_{n} + b
y = 1 if net > θ, otherwise y = 0

The output of the limiter is the perceptron's output. For example, with the stepped limiter shown above, a weighted sum of 2 would make the perceptron return 0, while a sum of 3 would make it return 1. If a perceptron returns 1, we say the perceptron has fired. Now we need a way to actually teach the perceptron to classify datasets that are presented to it.

The Delta Rule

It is fairly self-evident that training the perceptron requires modifying its weights. The Delta rule is a simple learning rule that states the weights should be adjusted in proportion to the difference between the desired output and the actual output, or:

Δw_{n} = η(d − y)x_{n}

where η is the learning rate, d is the desired output and y is the actual output. The Perceptron Convergence Theorem states that if a solution can be implemented on a perceptron, the learning rule will find the solution in a finite number of steps.
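The perceptron and the Delta rule can be sketched in a few lines of Python. This is a minimal illustration only; the function names, the learning rate, the zero threshold and the use of AND as a practice dataset are choices made here, not part of the applet:

```python
def step(net):
    """Stepped limiter: fire (return 1) when the net value is positive."""
    return 1 if net > 0 else 0

def output(inputs, weights, bias):
    """Weighted sum of inputs plus bias, fed through the limiter."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(net)

def train(samples, weights, bias, rate=0.1, epochs=100):
    """Delta rule: nudge each weight by rate * (desired - actual) * input."""
    for _ in range(epochs):
        for inputs, desired in samples:
            error = desired - output(inputs, weights, bias)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Teach the perceptron the (linearly separable) AND function.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_samples, [0.0, 0.0], 0.0)
```

Because AND is linearly separable, the convergence theorem guarantees the loop above settles on a working set of weights well within the epoch limit.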
Proof of this theorem can be found in Minsky and Papert's book, Perceptrons (1969).

Perceptron Java Applet

At this stage, open the Generation5 Perceptron Applet and play! The Perceptron Applet will allow you to specify two sets of data (red and green), then teach the perceptron to differentiate between the two sets. The applet will also show the line it has learnt to separate the datasets. Try placing groups of data in different quadrants, and placing the data quite close together (or far apart). The perceptron should successfully manage to find a line that separates the two datasets. Now try mixing the data up and look at the error message you receive.

Linearly Separable Only, Please
Perceptrons can only classify data when the two classes can be divided by a straight line (or, more generally, a hyperplane if there are more than two inputs); this is called linear separation. To explain the concept of linear separation further, let us look at the function shown to the right. If we ran this data through a perceptron, the weights could converge at 0 for the bias and 2 and −2 for the inputs (there are a large number of potential solutions; this one just makes it easier to explain!). If we calculate the weighted sum (or net value) along the decision boundary we get:

2x_{0} − 2x_{1} + 0 = 0

Now, if x_{0} is plotted on the y-axis, and x_{1} on the x-axis, the equation can be reduced to x_{0} = x_{1}. Look at the data plotted on the graph with the line that the perceptron has learnt:
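The learnt line can be checked numerically. Here is a tiny sketch (the helper name and the sample points are mine; weights of 2 and −2 with a zero bias are assumed, as above):

```python
def net(x0, x1):
    # Weighted sum for input weights 2 and -2 with a zero bias.
    return 2 * x0 - 2 * x1 + 0

# Points with x0 > x1 sit on the positive side of the line x0 = x1,
# points with x0 < x1 on the negative side, and points with x0 = x1 on it.
print(net(2, 1))  # positive: above the line
print(net(1, 2))  # negative: below the line
print(net(3, 3))  # zero: on the line
```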
x_{1} and (not x_{2})
So the perceptron correctly draws a line that divides the two groups of points. If you attempt to train a perceptron on data that is not linearly separable, the perceptron's weights will not converge on a solution. Again, if you want to understand this a little further, have a play with the Perceptron Applet. If we want to look at non-linearly separable data, we need another solution.

Multilayer Perceptrons
A classic example of non-linearly separable data is the exclusive-OR (XOR) logic operator. XOR is the same as OR but is false when both inputs are true (thus "exclusive"). If you imagine plotting the data shown to the right, you can see how this could not be separated using a single straight line. The XOR problem can be solved by using three cleverly arranged perceptrons. The key is splitting the XOR problem into three different parts, which requires a little boolean mathematics. Remember that '^' is AND, 'v' is OR and '¬' is NOT. XOR can be defined as:

y = (x_{1} ^ ¬x_{2}) v (¬x_{1} ^ x_{2})  or  y = (x_{1} v x_{2}) ^ ¬(x_{1} ^ x_{2})

We can therefore split this into three sub-components:

y_{1} = x_{1} v x_{2}
y_{2} = ¬(x_{1} ^ x_{2})
y = y_{1} ^ y_{2}

The problem is now broken down into three linearly separable problems, whereby the results from the first two equations are used in the third. From the final three equations, you can see that the perceptrons would be connected like this:
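The decomposition can be verified over the full truth table in a few lines of plain Python (the function name is mine):

```python
def xor_decomposed(x1, x2):
    """XOR built from its three linearly separable sub-components."""
    y1 = x1 or x2          # y1 = x1 v x2
    y2 = not (x1 and x2)   # y2 = not(x1 ^ x2)
    return y1 and y2       # y  = y1 ^ y2

# The decomposition agrees with XOR on every input combination.
for x1 in (False, True):
    for x2 in (False, True):
        assert xor_decomposed(x1, x2) == (x1 != x2)
```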
Multiperceptron Network
To prove that it works, let's look at the weights (and thus the lines) that the perceptrons converge at. The first perceptron converges at {1, 1, 0} (in this case, the last element is the bias), the second at {−1, −1, 2} and the final one at {1, 1, −1}. The three equations are as follows:

net_{1} = x_{1} + x_{2}
net_{2} = −x_{1} − x_{2} + 2
net_{3} = y_{1} + y_{2} − 1

Remember that the final equation covers the third perceptron, which takes the output of the first two as its inputs, so it is plotted on another graph. The two graphs look like this:
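Wiring three such perceptrons together in code reproduces XOR. This is a sketch under two assumptions of mine: the limiter fires when the net value is positive, and the hand-picked weight sets {1, 1, 0}, {−1, −1, 2} and {1, 1, −1} (last element the bias) are used:

```python
def fire(net):
    """Limiter assumed here: fire when the net value is positive."""
    return 1 if net > 0 else 0

def xor_network(x1, x2):
    y1 = fire(1 * x1 + 1 * x2 + 0)     # first perceptron:  x1 v x2
    y2 = fire(-1 * x1 - 1 * x2 + 2)    # second perceptron: not(x1 ^ x2)
    return fire(1 * y1 + 1 * y2 - 1)   # third perceptron:  y1 ^ y2

print([xor_network(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 1, 1, 0]
```

Note how the single-layer limitation disappears: the first two perceptrons each draw one line, and the third combines the regions they define.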
Graphs of Output for XOR Network
You can see that the Layer 1 lines cut the graph into three parts. The centre region between the two lines is where the perceptron will generalize as a '1', with the areas on and above/beneath the two other lines as '0'. In Layer 2, you can see how the third perceptron creates the final result. Notice that the two lines in Layer 1 do not intersect at the origin, so the third perceptron never has to deal with it.

Setting the Weights

There are tens of different ways to teach a neural network, but the most common one used is called backpropagation. Once you understand perceptrons and the theory behind them, feel free to check out Generation5's introductory essay.

Conclusion

Hopefully by now you have an understanding of what a perceptron is, how it works, how to train it, its limitations and how to remove those limitations by using multiple layers. For the enthusiastic reader, play with the applet a little more to gain an understanding of the strengths and limitations of the perceptron's classification abilities. To look at something a little more complicated, take a look at perceptrons being applied to optical character recognition, or just play with the OCR applet.
Last Updated: 03/10/2004
Article content copyright © James Matthews, 2004.


All content copyright © 1998-2007, Generation5 unless otherwise noted.