At the forefront of Artificial Intelligence
  Home Articles Reviews Interviews JDK Glossary Features Discussion Search
Home » Articles » Neural Networks » Theory

Neural Networks: Motivation, Theory and DANN

By Daniel Eaton

Abstract

Motivation and theory of specific artificial neural networks are explained. The practicality of using neural networks to solve a basic machine vision problem is tested and discussed. Techniques for improving the neural network approach to such a problem are suggested.

Neural Networks: Motivation, Theory and DANN

Neural networks. In most people’s minds, the term evokes unwarranted visions of top-secret Artificial Intelligence technology best reserved for science fiction. This is awfully ironic considering the mechanism allowing them to associate these two ideas, their brain, is a prime example of a neural network. Loosely, the expression ‘neural network’ denotes a collection of interconnected, interacting neurons, which can be biological or artificial. Remember Data, the resident android aboard the Starship Enterprise? His positronic brain was a sophisticated example of an artificial neural network [1]. Perhaps artificial neural networks were formerly useful only for bringing fictional robots to life, but today their value for real-world application has seen them implemented in several industries: the technology has been successfully used by video game creators, stock market analysts and United States postal workers alike. This report is broken down into two parts: (1), an explanation of the motivation, theory and technicalities behind neural network technology and (2), an application of a neural network to solve a machine vision problem.

Part I – Artificial Neural Network Motivation & Theory

Artificial neural networks are modeled on the animal brain, thus an understanding of basic neurobiology is essential in comprehending the motivation and technicalities of their design. The human brain consists of an estimated 1011 nerve cells, or neurons (Gurney, 1999). They interact via impulses of voltage change in their cell membranes. Biological neurons consist of incoming dendrites, a cell body and an outgoing axon. A simplified neuron is represented in Figure 1. The dendrites carry incoming voltage impulses from neighboring neurons to the cell body where they are integrated (summed up) and compared with a voltage threshold. If the summation exceeds this threshold, then the neuron will ‘fire’, causing the cell body to propagate an impulse along its axon to the dendrite of another neuron, otherwise, no impulse is generated. A neuron’s cell body will often have many simultaneous incoming signals that can have either an inhibitory or excitatory effect on the summation process. Interneuron connections are mediated by electrochemical links between axon and dendrite, called synapses, which can alter the impulses passing through them[2] (Campbell et al., 1999).


Figure 1 – A highly stylized biological neuron
[Source: Gurney, 1997]

An artificial neural network (ANN) is an attempt to emulate a biological neural network (BNN) by organizing biological-analogous artificial components into BNN-like[3] architectures. ANNs can be implemented in hardware or software, however I have chosen not to deal with hardware ANNs. ANN to BNN component analogs are as follows:

- Neurons inspire Units or Nodes; - Axons and Dendrites are replaced by Connections; - Synapses are remodeled by Connection Weights; - Voltage impulses loosely correspond to Signals.

Refer to Figure 2 for a diagram of a node and its components. Akin to the aforementioned structural components, inter-nodal communication also parallels its biological origin: incoming signals from preceding nodes are multiplied by a connection weight and are summed up. The sum feeds into an appropriate activation function and the resulting signal is passed along the node’s forward connections to subsequent nodes.


Figure 2 – A node. Arrows denote signals and crossed-circles represent connection weights. Arrow width and circle radius are indicative of signal and weight strength, respectively.
[Source: adapted from Gurney, 1997]

A system of connected nodes constitutes an artificial neural network. Arranging the nodes in different configurations yields distinct ANNs with characteristic properties and efficacies. The most versatile and commonly used network architecture is the connectionist, feedforward ANN (Skapura, 1997). In its most fundamental form, nodes in a feedforward ANN are arranged by layers: there is an input layer, zero or more hidden layers, and an output layer (see Figure 3). Each hidden layer consists of an appropriate number of nodes. Considerable research has developed mathematically based nodal selection rules that take into account the problem’s nature (Hecht-Nielson, 1990), the amount of available training data (Upadhyaya and Eryurek, 1992) and the desired network generalization ability (Widrow and Lehr, 1990). The feedforward ANN’s name is derived from its behavior: when the input layer’s nodes are stimulated with incoming data, they propagate weighted signals forward through the hidden layers to the output layer where the output node activations are analyzed to see how the network has reacted to the input data[4]. The output is usually intended to classify the input data; for example, Liou and Yang (1999) wrote an ANN to verify the validity of handwritten signatures. The network takes as input a scanned image of your supposed signature, which after being propagated through the layers produces outputs that indicate the legitimacy or fraudulency of the signature. In a fully connectionist feedforward ANN, each node in a layer passes its signal on to every node belonging to the following layer.


Figure 3 – A fully connectionist, feedforward ANN with one hidden layer. The input layer has n nodes, and the output layer has m.
[Source: Skapura, 1996]

Neural networks are useless unless they are well trained. Network learning is accomplished by experience. In BNNs, synaptic strengths change in response to certain stimuli so that the network (brain) associates correct physical or psychological reactions with the applied stimulus (Chester, 1997). The artificial analog to these synaptic adjustments is the modification of connection weights in such a manner that the network evolves towards accurate classification of input patterns. In other words, the network’s intelligence resides within its connection weights. ANN learning (training) paradigms differ in application and efficiency, however they generally fall under two categories: supervised and unsupervised learning. Unsupervised learning involves an ANN’s independent analysis of input data, looking for patterns and symmetries therein without being explicitly told what output should be produced. This branch of learning has undergone considerable study, nonetheless it lacks broad real-world application. Supervised learning concerns itself with comparing actual output for input data with specified target output. A common algorithm that operates under supervised conditions is backpropagation, conceived by Rumelhart et al. (1986). The algorithm entails running each input pattern in a training set through a feedforward ANN, calculating the error incurred by the difference in actual and target network output, backpropagating (propagating errors in the same manner as forward signal propagation but in the opposite direction) and adjusting weights according to the error; this process is repeated iteratively until the total error across the training set is below a specified maximum. Each such iteration is known as an epoch.

ANNs can be constructed and trained in a manner so that they are able to correctly classify a high percentage of the patterns they were trained with, however, simple memorization is not where the true power of the ANN resides. Use of an ANN is complete overkill if a programmer simply wants one to recognize patterns it has been explicitly shown, after all, a computer’s storage can do this, flawlessly. The function of a well-trained ANN is to correctly classify patterns that it has never before been presented with (i.e. patterns not used during training). During network construction the programmer must be very careful in choosing the number of nodes per layer and maximum training error because inappropriate selections will lead to over/underfitting of training set pattern data. Poor generalization is a symptom of an overfitted network, whereas poor classification of both training set and never-before-seen patterns occurs with an underfitted network.

Part II – A Neural Approach to Machine Vision: DANN

Having introduced the basics of ANNs, I can present my own work with them. I have always been intrigued by computer graphics and artificial intelligence, so I decided to meld them together to produce a tangible programming problem. Machine vision programs are what the name suggests: programs that ‘see’ – that interpret digitized visual data and make decisions based on objects ‘seen’ therein. Such a task may seem trivial considering our innate ability to process visual information; however, the difference in human and classical computer image processing architectures makes even the invariant recognition of a simple geometric shape superimposed on an arbitrary image an exceptionally difficult task for a computer program. The specific machine vision problem I attempted to solve was: given a 19x19 pixel bitmap (each pixel is on or off), write a program capable of detecting the presence or absence of one filled circle with radius from 2 to 9 pixels. Analysis of potential bitmap compositions yields astounding numbers. Since there are 19*19=361 pixels that can either be on (black) or off (white), the number of potential image permutations is 2361 » 4.697x10108, 816 of which contain nothing but one circle. Considering the nature of the problem, it would be ridiculous for a programmer to attempt to write a program that checks an image against all 2361 cases! Conversely, if the programmer were to collect several 19x19 bitmaps with (and without) circles of the different radii that are representative of every possible configuration of the 19x19 bitmap, and train an appropriately constructed neural network with them, then the network should be capable of generalizing, thus extending its recognition of circles to the combinations it had not seen during training.

Playing the role of the smart programmer, I set out by building a simple, fully connectionist, feedforward ANN to solve the problem. I will now refer to it as Dan’s Artificial Neural Network (DANN). DANN consisted of five layers: mandatory input and output layers, in addition to three hidden layers. I chose three hidden layers because Hecht-Neilson (1990) showed that any function (in this case, a function which determines the presence of a circle) can be approximated with three hidden layers and for lack of experience, I was unsure of how few I could get away with. The input layer had 361 nodes corresponding to each pixel in the input bitmap. The output layer had 1 node, which after running a bitmap through the network was intended to indicate, with its net stimulation, whether or not there was a circle in the bitmap. The three hidden layers had 40, 20 and 10 nodes respectively. These selections were based on the fact that DANN was intended to extract features from its training set in order to generalize, or reduce the dimensionality of the problem. By utilizing Kolmogorov’s theorem (that any function of n variables can be represented by the superposition of functions), Hecht-Neilson (1990) proved that an ANN should never possess a number of hidden units more than twice the number of its input layer units; DANN’s 70 hidden units and 361 inputs satisfies this condition. Upadhyaya and Eryurek (1992) theorized the maximum efficient number of connection weights (and consequently the number of connections) in an ANN to be , with n elements per input dataset and p possible patterns to be classified[5]. Since a fully connectionist ANN’s nodes are connected to each node in the proceeding layer, DANN has 15450 of a maximum connection weights. Note that is the number of nodes able to encode every pattern possible; to achieve gene ralization, the actual number of nodes should be significantly less. See Figure 4 for a diagram of DANN.


Figure 4 – DANN and an example input pattern.
[Source: adapted from Skapura, 1996]

After designing DANN’s architecture, I began planning and constructing the training set. Baum and Haussler (1989) showed that optimal generalization-memorization could be attained with a number of training datasets approximately equal to the number of weights in the network multiplied by the inverse of the error limit. DANN’s training error maximum was 0.1, therefore I should have theoretically had 154500*1/0.1, or 154500 datasets in the training set. Since I had to construct each training set bitmap one by one, this was out of the question; instead, I created 75 datasets and hoped for the best. Circles of varying radius and location placed on 19x19 bitmaps with blank backgrounds constituted the first 50 datasets, which were followed by one totally blank bitmap and 24 bitmaps with random ‘noise’ (randomly placed on/black pixels) or other geometric shapes. Refer to Figure 4 for sample training datasets.


Figure 5 – Datasets (input patterns) used in training – a portion of the training set.

DANN’s programming was written in C++ (DJGPP, an open source GNU compiler) and made use of the Standard Template Libraries. The forward/backpropagating and learning algorithms were based on mathematical descriptions, found in Skapura (1996). The code is highly object orientated and could be applied to create and train fully connectionist, feedforward ANNs of any proportion. Jasc Software’s Paint Shop Pro 6.01 was used to create the training set.

During training, DANN converged to a total error of less than 0.1 in 23 epochs, on average. Training completed, I began testing DANN with the training datasets and new test datasets. The results were dismal: DANN could correctly classify 67% of previously seen (training set) datasets and only 39% of unseen datasets. DANN seemed to have particular trouble in classifying bitmaps that contained circles of the smallest allowable radii. This behavior can be rationalized in that the smaller radii circles have more possible positions within the bitmap. DANN also consistently misclassified bitmaps containing ovals or circle-like objects.

Such inaccurate results are unacceptable; consequently, I generated a list of improvements that could be made. The most obvious would be to limit the radius size of the problem to greatly reduce the number of patterns with valid circles; however, this merely simplifies the problem, it does not solve DANN’s flaws, which are obviously at fault for the inadequate results. The training set should encompass more bitmap combinations. Ideally, Baum and Haussler’s (1989) suggested 154500 should be created, however, this is impractical. Instead, training set size could be incrementally increased to determine the minimum appropriate number of datasets. Another enhancement could be to modify DANN’s architecture to one that has been utilized in shift-invariant handwritten character recognition, named the Neocognitron (Fukuschima, 1980). This network uses hidden layers that are composed of multiple planes (a sample Neocognitron is illustrated in Figure 5). Nodes belonging to each plane all share the same outgoing connection weights and receive signals from equally sized sections of the prior layer’s planes. Thus, z nodes belonging to one of the first hidden layer’s planes of the modified DANN (NeoDANN) would acquire input from z, A x A sections of the 19x19 input image. After the input of an A x A section of the image to a z node, the A x A area of acquisition of input is shifted one pixel in one direction (up, down, left or right). The intent of each weight-sharing node in a plane is to detect the same feature as its neighbors in different locations of the input plane (or image). Each plane in a layer would then detect a different feature. If NeoDANN’s first hidden layer’s A x A acquiring region was square and of dimensions equal to the largest possible circle diameter then this layer would theoretically detect the largest possible circles. Subsequent hidden layers’ acquiring regions would decrease in dimensions by some discrete step in order to detect smaller circles.


Figure 6 – An example Neocognitron. This network was used to recognize handwritten numeric characters. The input and output layers consisted of 64 and 10 nodes respectively. It had two hidden layers: the first had 2, 4x4 planes (16 nodes per plane) and the second had 4, 2x2 planes (4 nodes per plane).
[Source: Kollias et al., 1991]

The final improvement that would evolve superior results from the problem would be further modification to NeoDANN. Synapses, particularly those that deal with image processing, do not conform to the simplistic model earlier described (Gurney, 1997). In such cases, two or more axons will join near a dendrite, resulting in presynaptic inhibition – one axon inhibits the other axon’s signal. Rumelhart et al. (1986) introduced the Sigma-pi node as an artificial model of this biological phenomenon (the name ‘Sigma-pi’ is derived from the nodes’ activation function which resembles ). Kollias et al. (1991) reported having achieved successful invariant (translational, scale and rotational) character recognition by using a combination of Sigma-pi units and the Neocognitron.

Machine vision problems are not as trivial to solve as initially perceived, even by attempting to utilize the generalization power of an ANN. DANN’s results demonstrate that more robust network architectures and techniques that are tailored for vision must be used in the place of a straightforward feedforward ANN. The suggested improvements to DANN would likely increase bitmap classification accuracy, nevertheless, there are many unmentioned changes that would be necessary to attain near perfect accuracy. DANN was definitely a success in that its failure showed neural technology still has a long ways to go.

Appendix A – Glossary of terms

Activation function - A mathematical function applied to a node’s activation that computes the signal strength it outputs to subsequent nodes.
Activation - The combined sum of a node’s incoming, weighted signals.
Backpropagation - A supervised learning algorithm which uses data with associated target output to train an ANN.
Connections - The paths signals follow between ANN nodes.
Connection weights - Signals passing through a connection are multiplied by that connection’s weight.
Dataset - See Input pattern
Epoch - One iteration through the backpropagation algorithm (presentation of the entire training set once to the network).
Feedforward ANN - An ANN architecture where signal flow is in one direction only.
Generalization - A well trained ANN’s ability to correctly classify unseen input patterns by finding their similarities with training set patterns
Input vector - See Input pattern
Input pattern - An ANN’s input – there may be no pattern, per se. The pattern has as many elements as the network has input layer nodes.
Maximum training error - Criteria for deciding when to stop training an ANN.
Network architecture - The design of an ANN: the number of units and their pattern of connection.
Net stimulation - The nodes activation, run through an activation function.
Output - The signal a node passes on to subsequent layers (before being multiplied by a connection weight). In the output layer, the set of each node’s output is the ANN’s output for a particular input vector.
Overfitting - Behavior displayed by ANNs when they are unable to generalize.
Pattern - See Input Pattern.
Training - Process allowing an ANN to learn.
Training set - The set of input patterns used during network training. Each input pattern has an associated target output.
Underfitting - Behavior displayed by ANNs when they are unable to generalize or classify already seen patterns
Weights - See Connection weights.

Appendix B – A Mathematical Description of Feedforward, Fully Connectionist ANN Behavior [Gurney, 1997]

Let wxy be the connection weight between node x, in one layer and node y, in the following layer.

Let ax be the net activation in node x.

Let F(ax) be an activation function that accepts node x’s net activation as input – the function is applied to the net activation before node x propagates its signals onwards to the proceeding layer. Activation functions are varied; I used one of form .

Let be an input vector (‘network stimuli data’) that exists in space where n equals the number of input layer nodes.

The input layer takes its activation values from the raw input vector element values without having the activation function applied to them so the nth input node’s activation will be nth input vector’s value. The activation of any non-input layer node, y, in the network then is:

where s is the number of nodes in the previous layer. The signal y passes on along forward-bound connections is simply, . Net activations in the output layer are run through the activation function, however these nodes do not generate further forward signals.

References

Baum, E., and D. Haussler. “What net size gives valid generalization?” Neural Computation 1.1. (1989): 151-160.

Campbell, Neil A. Jane B. Reece, and Lawrence G. Mitchell. Biology. California: Benjamin/Cummings, 1999.

Chester, Michael. Neural Network/A Tutorial. New Jersey: PTR Prentice Hall, 1993.

Fukushima, K. “Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position.” Biological Cybernetics 36 (1980): 193-202.

Gurney, Kevin. 1997. An Introduction to Neural Networks. London: UCL Press, 1997.

Hecht-Neilson. Neurocomputing. New York: Addison Wesely, 1990.

Kollias, Stefanos, Andreas Stafylopatis and Andreas Tirakis. “Performance of Higher Order Neural Networks in Invariant Recognition”. Neural Networks: Advances and Applications. Ed. E. Gelenbe. Holland: Elsevier Science Publishers, 1991.

Liou, C. Y., and H. C. Yang. “Selective feature-to-feature adhesion for recognition of cursive hand printed characters.” IEEE Transactions on Pattern Analysis and Machine Intelligence 21.2. (1999): 184-191.

Rumelhart, D. E., J. L. McClelland and The PDP Research Group. Parallel distributed processing. Chapter 2. MIT Press, 1986a.

Rumelhart, D., Hinton, G.E. and Williams, R.J. “Learning internal representations by error propagation.” Parallell Distributed Processing: Explorations in the Microstructure of Cognition Chapter 8, 318-362. MIT Press. 1986b.

Skapura, David M. Building Neural Networks. New York: ACM Press, 1996.

Upadhyaya, B. R., and E. Eryurek. “Application of neural networks for sensor validation and plant monitoring.” Neural Technology 97. (1992): 179-176.

Widrow, B. and M. Lehr. “30 years of adaptive neural networks: Perceptron, madaline and backpropagation.” Proceedings of the IEEE 78. (1990): 1415-1451.


[1] Italicized words and expressions are defined in Appendix I.
[2] All presented neurobiology is extremely simplified.
[3] BNN-like insofar that no ANN has come close to the complexity of a BNN
[4] The feedforward ANN behavior is best described mathematically – refer to Appendix B.
[5]This stems from the fact that it takes parameters to code P binary patterns.

Download code in Word Format (70Kb)

Submitted: 06/01/2000

Article content copyright © Daniel Eaton, 2000.
 Article Toolbar
Print
BibTeX entry

Search

Latest News
- Generation5 10-year Anniversary (03/09/2008)
- New Generation5 Design! (09/04/2007)
- Happy New Year 2007 (02/01/2007)
- Where has Generation5 Gone?! (04/11/2005)
- NeuroEvolving Robotic Operatives (NERO) (25/06/2005)

What's New?
- Back-propagation using the Generation5 JDK (07/04/2008)
- Hough Transforms (02/01/2008)
- Kohonen-based Image Analysis using the Generation5 JDK (11/12/2007)
- Modelling Bacterium using the JDK (19/03/2007)
- Modelling Bacterium using the JDK (19/03/2007)


All content copyright © 1998-2007, Generation5 unless otherwise noted.
- Privacy Policy - Legal - Terms of Use -