
Perceptrons

This essay follows on from Introduction to Neural Networks. We will look at the concepts of single-layer and multilayer perceptrons, as well as touch on the mathematics behind them.

The Perceptron

A perceptron is often thought of as the simplest type of neural network. It is essentially a classifier: it can differentiate between sets of data and, more importantly, classify a previously unseen data example into one of the learnt sets. The structure of a single perceptron is very simple. There are a number of inputs (xn), weights (wn), a bias (b), and an output. A simple schematic diagram of a perceptron is shown below.

For every input (including the bias) on the perceptron, there is a corresponding weight. To calculate the output of the perceptron, every input is multiplied by its corresponding weight and the results are added together. This weighted sum is then fed through a limiter function that determines the final output of the perceptron. The limiter for a simple perceptron might be a stepped limiter that outputs either 0 or 1.

So let's recap: the perceptron's output is calculated by summing all of the inputs multiplied by their corresponding weights, adding the bias, and running the result through the limiter. In mathematical terms, this can be defined as:

  net = w1*x1 + w2*x2 + ... + wn*xn + b

  output = 1 if net > 0, otherwise 0

The output of the limiter is the perceptron's output. For example, if the weighted sum is -2, the perceptron returns 0; if the sum is 3, it returns 1. When a perceptron returns 1, we say the perceptron has fired.
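
To make the calculation concrete, here is a minimal Java sketch of a perceptron as described above. It is only an illustration of the definitions in this essay; the class and method names are made up for the example and are not part of the Generation5 JDK.

  // A minimal perceptron: the weighted sum of the inputs plus the bias,
  // fed through a stepped limiter that outputs 0 or 1.
  public class Perceptron {
      double[] weights;   // one weight per input
      double bias;

      public Perceptron(double[] weights, double bias) {
          this.weights = weights;
          this.bias = bias;
      }

      // Returns 1 if the weighted sum is greater than zero, otherwise 0.
      public int output(double[] inputs) {
          double net = bias;
          for (int i = 0; i < weights.length; i++) {
              net += weights[i] * inputs[i];
          }
          return net > 0 ? 1 : 0;   // the stepped limiter
      }
  }

Passing this perceptron inputs whose weighted sum comes to -2 returns 0, while a weighted sum of 3 returns 1, exactly as in the example above.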

Now we need a way to actually teach the perceptron to classify datasets that are presented to it.

The Delta Rule

It is fairly self-evident that training the perceptron requires modifying its weights. The delta rule is a simple learning rule which states that each weight should be adjusted in proportion to the difference between the desired output and the actual output, or:

  Δwi = η * (d - y) * xi

where d is the desired output, y is the actual output, xi is the input attached to weight wi, and η is a small learning rate. The bias is adjusted in the same way, as if it were a weight whose input is always 1.

The Perceptron Convergence Theorem states that if a solution can be implemented on a perceptron, the learning rule will find the solution in a finite number of steps. A proof of this theorem can be found in Minsky and Papert's book, Perceptrons (1969).
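
As a rough illustration of the delta rule in action, the hypothetical Perceptron class sketched earlier could be given a training method along the following lines. This is a sketch under the definitions above, not the applet's actual source, and the learning rate and epoch limit are arbitrary choices.

  // Delta rule: adjust each weight by (desired - actual) * input,
  // scaled by a small learning rate, until every example is classified correctly.
  public void train(double[][] examples, int[] desired,
                    double learningRate, int maxEpochs) {
      for (int epoch = 0; epoch < maxEpochs; epoch++) {
          int errors = 0;
          for (int e = 0; e < examples.length; e++) {
              int delta = desired[e] - output(examples[e]);
              if (delta != 0) {
                  errors++;
                  for (int i = 0; i < weights.length; i++) {
                      weights[i] += learningRate * delta * examples[e][i];
                  }
                  bias += learningRate * delta;   // the bias learns like a weight whose input is always 1
              }
          }
          if (errors == 0) {
              break;   // converged: no example was misclassified this epoch
          }
      }
  }

If the data is linearly separable, the loop stops once a whole epoch passes without an error, which is the finite convergence the theorem guarantees; if it is not separable, the epoch limit stops the loop from running forever.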

Perceptron Java Applet

At this stage, open the Generation5 Perceptron Applet and play! The Perceptron Applet allows you to specify two sets of data (red and green) and then teach the perceptron to differentiate between the two sets. The applet will also show the line it has learnt to separate the datasets. Try placing groups of data in different quadrants, and placing the data quite close together (or far apart). The perceptron should successfully find a line that separates the two datasets. Now try mixing the data up and look at the error message you receive.

Linearly Separable Only, Please

y = x1 AND NOT(x2)

  x1  x2  y
  0   0   0
  0   1   0
  1   0   1
  1   1   0

Perceptrons can only classify data when the two classes can be divided by a straight line (or, more generally, by a hyperplane if there are more than two inputs); this is called linear separation. To explain the concept of linear separation further, let us look at the function shown above. If we ran this data through a perceptron, the weights could converge at 0 for the bias and 2, -2 for the inputs (there are a large number of potential solutions; this one just makes the explanation easier!). If we calculate the weighted sum (or net value) we get:

  net = 2*x1 - 2*x2 + 0

The perceptron's decision boundary lies where net = 0. Now, if x2 is plotted on the y-axis and x1 on the x-axis, the boundary equation reduces to x2 = x1. Look at the data plotted on the graph with the line that the perceptron has learnt:

Graph of x1 AND NOT(x2)

So the perceptron correctly draws a line that divides the two groups of points. If you attempt to train a perceptron on data that is not linearly separable, the perceptron's weights will never converge on a solution. Again, if you want to understand this a little further, have a play with the Perceptron Applet.
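
As a quick check of those hand-picked weights, the short sketch below (reusing the hypothetical Perceptron class from earlier) runs the four rows of the truth table through a perceptron with weights 2 and -2 and a bias of 0:

  // Verify the weights {2, -2} with a bias of 0 against the x1 AND NOT(x2) truth table.
  Perceptron p = new Perceptron(new double[] {2, -2}, 0);
  int[][] table = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
  for (int[] row : table) {
      int y = p.output(new double[] { row[0], row[1] });
      System.out.println(row[0] + " AND NOT(" + row[1] + ") = " + y);   // prints 0, 0, 1, 0
  }

Only the point (1, 0), which lies below the line x2 = x1, makes the perceptron fire.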

If we want to look at non-linearly separable data, we need another solution.

Multilayer Perceptrons

y = x1 XOR x2

  x1  x2  y
  0   0   0
  0   1   1
  1   0   1
  1   1   0

A classic example of non-linearly separable data is the exclusive-OR (XOR) logic operator. XOR is the same as OR except that it is false when both inputs are true (hence exclusive). If you plot the data shown above, you can see that it cannot be separated by a single straight line.

The XOR problem can be solved by using three cleverly arranged perceptrons. The key is splitting the XOR problem into three parts, which requires a little Boolean algebra ('^' means AND, 'v' means OR, and '¬' means NOT):

XOR can be defined as:

  y = (x1 AND NOT(x2)) OR (NOT(x1) AND x2)

    - or -

  y = (x1 OR x2) AND NOT(x1 AND x2)

We can therefore split this into three subcomponents:

  y1 = x1 OR x2
  y2 = NOT(x1 AND x2)

  y = y1 AND y2

The problem is now broken down into three linearly separable problems, whereby the results of the first two equations are used as the inputs to the third. From the final three equations, you can see that the perceptrons would be connected like this:

Multiperceptron Network

To prove that it works, let's look at the weights (and thus the lines) that the perceptrons converge at. The first perceptron converges at {1, 1, 0} (in this case, the last element is the bias), the second at {-1, -1, 2} and the final one at {1, 1, -1}. Setting each perceptron's net value to zero gives the three boundary equations:

  Perceptron 1:  x1 + x2 = 0        (x2 = -x1)
  Perceptron 2:  -x1 - x2 + 2 = 0   (x2 = -x1 + 2)
  Perceptron 3:  y1 + y2 - 1 = 0    (y2 = -y1 + 1)

Remember that the final equation covers the third perceptron, which takes the outputs of the first two as its inputs, so it is plotted on a separate graph with y1 and y2 as its axes. The two graphs look like this:

Graphs of Output for XOR Network

You can see that the Layer 1 lines cut the graph into three parts. The centre region between the two lines is where the network will generalize to a '1', while the areas on and beyond either line are classified as '0'. In Layer 2, you can see how the third perceptron combines y1 and y2 to produce the final result. Notice that because the two Layer 1 lines are parallel and never intersect, no input can fall on the wrong side of both at once, so the third perceptron never has to deal with the origin (y1 = 0, y2 = 0) of its graph.
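
To tie this together, here is a sketch (again reusing the hypothetical Perceptron class from earlier) that wires the three perceptrons up with the converged weights listed above and runs all four input combinations through the network:

  // Three perceptrons arranged to compute XOR:
  // layer 1 computes OR and NAND, layer 2 ANDs their outputs together.
  Perceptron or   = new Perceptron(new double[] { 1,  1},  0);   // y1 = x1 OR x2
  Perceptron nand = new Perceptron(new double[] {-1, -1},  2);   // y2 = NOT(x1 AND x2)
  Perceptron and  = new Perceptron(new double[] { 1,  1}, -1);   // y  = y1 AND y2

  for (int x1 = 0; x1 <= 1; x1++) {
      for (int x2 = 0; x2 <= 1; x2++) {
          double y1 = or.output(new double[] {x1, x2});
          double y2 = nand.output(new double[] {x1, x2});
          int y = and.output(new double[] {y1, y2});
          System.out.println(x1 + " XOR " + x2 + " = " + y);   // prints 0, 1, 1, 0
      }
  }

Each individual perceptron still only draws a single straight line; the XOR behaviour comes from combining them.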

Setting the Weights

From our previous example, you can see how the multilayer architecture works, and how perceptrons can be extended to learn a variety of other functions, as long as you either know the weights in advance or can split the problem into smaller, linearly separable components to which the delta rule can be applied. What happens, though, if we cannot do this? How do we teach a neural network then?

There are many different ways to teach a neural network, but the most commonly used is called back-propagation. Once you understand perceptrons and the theory behind them, feel free to check out Generation5's introductory essay on back-propagation.

Conclusion

Hopefully by now you have an understanding of what a perceptron is, how it works, how to train it, what its limitations are, and how to remove those limitations by using multiple layers. For the enthusiastic reader, play with the applet a little more to get a feel for the strengths and limitations of the perceptron's classification abilities.

To look at something a little more complicated, take a look at Perceptrons being applied to optical character recognition, or just play with the OCR applet.

Last Updated: 03/10/2004

Article content copyright © James Matthews, 2004.