| ||||||||||||||||||||||
| ||||||||||||||||||||||
|
||||||||||||||||||||||
Neural Networks: Motivation, Theory and DANNBy Daniel Eaton
AbstractMotivation and theory of specific artificial neural networks are explained. The practicality of using neural networks to solve a basic machine vision problem is tested and discussed. Techniques for improving the neural network approach to such a problem are suggested.
Neural Networks: Motivation, Theory and DANNNeural networks. In most people’s minds, the term evokes unwarranted visions of top-secret Artificial Intelligence technology best reserved for science fiction. This is awfully ironic considering the mechanism allowing them to associate these two ideas, their brain, is a prime example of a neural network. Loosely, the expression ‘neural network’ denotes a collection of interconnected, interacting neurons, which can be biological or artificial. Remember Data, the resident android aboard the Starship Enterprise? His positronic brain was a sophisticated example of an artificial neural network [1]. Perhaps artificial neural networks were formerly useful only for bringing fictional robots to life, but today their value for real-world application has seen them implemented in several industries: the technology has been successfully used by video game creators, stock market analysts and United States postal workers alike. This report is broken down into two parts: (1), an explanation of the motivation, theory and technicalities behind neural network technology and (2), an application of a neural network to solve a machine vision problem.
Part I – Artificial Neural Network Motivation & TheoryArtificial neural networks are modeled on the animal brain, thus an understanding of basic neurobiology is essential in comprehending the motivation and technicalities of their design. The human brain consists of an estimated 1011 nerve cells, or neurons (Gurney, 1999). They interact via impulses of voltage change in their cell membranes. Biological neurons consist of incoming dendrites, a cell body and an outgoing axon. A simplified neuron is represented in Figure 1. The dendrites carry incoming voltage impulses from neighboring neurons to the cell body where they are integrated (summed up) and compared with a voltage threshold. If the summation exceeds this threshold, then the neuron will ‘fire’, causing the cell body to propagate an impulse along its axon to the dendrite of another neuron, otherwise, no impulse is generated. A neuron’s cell body will often have many simultaneous incoming signals that can have either an inhibitory or excitatory effect on the summation process. Interneuron connections are mediated by electrochemical links between axon and dendrite, called synapses, which can alter the impulses passing through them[2] (Campbell et al., 1999).
- Neurons inspire Units or Nodes; - Axons and Dendrites are replaced by Connections; - Synapses are remodeled by Connection Weights; - Voltage impulses loosely correspond to Signals. Refer to Figure 2 for a diagram of a node and its components. Akin to the aforementioned structural components, inter-nodal communication also parallels its biological origin: incoming signals from preceding nodes are multiplied by a connection weight and are summed up. The sum feeds into an appropriate activation function and the resulting signal is passed along the node’s forward connections to subsequent nodes.
ANNs can be constructed and trained in a manner so that they are able to correctly classify a high percentage of the patterns they were trained with, however, simple memorization is not where the true power of the ANN resides. Use of an ANN is complete overkill if a programmer simply wants one to recognize patterns it has been explicitly shown, after all, a computer’s storage can do this, flawlessly. The function of a well-trained ANN is to correctly classify patterns that it has never before been presented with (i.e. patterns not used during training). During network construction the programmer must be very careful in choosing the number of nodes per layer and maximum training error because inappropriate selections will lead to over/underfitting of training set pattern data. Poor generalization is a symptom of an overfitted network, whereas poor classification of both training set and never-before-seen patterns occurs with an underfitted network.
Part II – A Neural Approach to Machine Vision: DANNHaving introduced the basics of ANNs, I can present my own work with them. I have always been intrigued by computer graphics and artificial intelligence, so I decided to meld them together to produce a tangible programming problem. Machine vision programs are what the name suggests: programs that ‘see’ – that interpret digitized visual data and make decisions based on objects ‘seen’ therein. Such a task may seem trivial considering our innate ability to process visual information; however, the difference in human and classical computer image processing architectures makes even the invariant recognition of a simple geometric shape superimposed on an arbitrary image an exceptionally difficult task for a computer program. The specific machine vision problem I attempted to solve was: given a 19x19 pixel bitmap (each pixel is on or off), write a program capable of detecting the presence or absence of one filled circle with radius from 2 to 9 pixels. Analysis of potential bitmap compositions yields astounding numbers. Since there are 19*19=361 pixels that can either be on (black) or off (white), the number of potential image permutations is 2361 » 4.697x10108, 816 of which contain nothing but one circle. Considering the nature of the problem, it would be ridiculous for a programmer to attempt to write a program that checks an image against all 2361 cases! Conversely, if the programmer were to collect several 19x19 bitmaps with (and without) circles of the different radii that are representative of every possible configuration of the 19x19 bitmap, and train an appropriately constructed neural network with them, then the network should be capable of generalizing, thus extending its recognition of circles to the combinations it had not seen during training.
Playing the role of the smart programmer, I set out by building a simple, fully connectionist, feedforward ANN to solve the problem. I will now refer to it as Dan’s Artificial Neural Network (DANN). DANN consisted of five layers: mandatory input and output layers, in addition to three hidden layers. I chose three hidden layers because Hecht-Neilson (1990) showed that any function (in this case, a function which determines the presence of a circle) can be approximated with three hidden layers and for lack of experience, I was unsure of how few I could get away with. The input layer had 361 nodes corresponding to each pixel in the input bitmap. The output layer had 1 node, which after running a bitmap through the network was intended to indicate, with its net stimulation, whether or not there was a circle in the bitmap. The three hidden layers had 40, 20 and 10 nodes respectively. These selections were based on the fact that DANN was intended to extract features from its training set in order to generalize, or reduce the dimensionality of the problem. By utilizing Kolmogorov’s theorem (that any function of n variables can be represented by the superposition of
Figure 5 – Datasets (input patterns) used in training – a portion of the training set. DANN’s programming was written in C++ (DJGPP, an open source GNU compiler) and made use of the Standard Template Libraries. The forward/backpropagating and learning algorithms were based on mathematical descriptions, found in Skapura (1996). The code is highly object orientated and could be applied to create and train fully connectionist, feedforward ANNs of any proportion. Jasc Software’s Paint Shop Pro 6.01 was used to create the training set. During training, DANN converged to a total error of less than 0.1 in 23 epochs, on average. Training completed, I began testing DANN with the training datasets and new test datasets. The results were dismal: DANN could correctly classify 67% of previously seen (training set) datasets and only 39% of unseen datasets. DANN seemed to have particular trouble in classifying bitmaps that contained circles of the smallest allowable radii. This behavior can be rationalized in that the smaller radii circles have more possible positions within the bitmap. DANN also consistently misclassified bitmaps containing ovals or circle-like objects. Such inaccurate results are unacceptable; consequently, I generated a list of improvements that could be made. The most obvious would be to limit the radius size of the problem to greatly reduce the number of patterns with valid circles; however, this merely simplifies the problem, it does not solve DANN’s flaws, which are obviously at fault for the inadequate results. The training set should encompass more bitmap combinations. Ideally, Baum and Haussler’s (1989) suggested 154500 should be created, however, this is impractical. Instead, training set size could be incrementally increased to determine the minimum appropriate number of datasets. Another enhancement could be to modify DANN’s architecture to one that has been utilized in shift-invariant handwritten character recognition, named the Neocognitron (Fukuschima, 1980). This network uses hidden layers that are composed of multiple planes (a sample Neocognitron is illustrated in Figure 5). Nodes belonging to each plane all share the same outgoing connection weights and receive signals from equally sized sections of the prior layer’s planes. Thus, z nodes belonging to one of the first hidden layer’s planes of the modified DANN (NeoDANN) would acquire input from z, A x A sections of the 19x19 input image. After the input of an A x A section of the image to a z node, the A x A area of acquisition of input is shifted one pixel in one direction (up, down, left or right). The intent of each weight-sharing node in a plane is to detect the same feature as its neighbors in different locations of the input plane (or image). Each plane in a layer would then detect a different feature. If NeoDANN’s first hidden layer’s A x A acquiring region was square and of dimensions equal to the largest possible circle diameter then this layer would theoretically detect the largest possible circles. Subsequent hidden layers’ acquiring regions would decrease in dimensions by some discrete step in order to detect smaller circles.
). Kollias et al. (1991) reported having achieved successful invariant (translational, scale and rotational) character recognition by using a combination of Sigma-pi units and the Neocognitron.Machine vision problems are not as trivial to solve as initially perceived, even by attempting to utilize the generalization power of an ANN. DANN’s results demonstrate that more robust network architectures and techniques that are tailored for vision must be used in the place of a straightforward feedforward ANN. The suggested improvements to DANN would likely increase bitmap classification accuracy, nevertheless, there are many unmentioned changes that would be necessary to attain near perfect accuracy. DANN was definitely a success in that its failure showed neural technology still has a long ways to go.
Appendix A – Glossary of termsActivation function - A mathematical function applied to a node’s activation that computes the signal strength it outputs to subsequent nodes.Activation - The combined sum of a node’s incoming, weighted signals. Backpropagation - A supervised learning algorithm which uses data with associated target output to train an ANN. Connections - The paths signals follow between ANN nodes. Connection weights - Signals passing through a connection are multiplied by that connection’s weight. Dataset - See Input pattern Epoch - One iteration through the backpropagation algorithm (presentation of the entire training set once to the network). Feedforward ANN - An ANN architecture where signal flow is in one direction only. Generalization - A well trained ANN’s ability to correctly classify unseen input patterns by finding their similarities with training set patterns Input vector - See Input pattern Input pattern - An ANN’s input – there may be no pattern, per se. The pattern has as many elements as the network has input layer nodes. Maximum training error - Criteria for deciding when to stop training an ANN. Network architecture - The design of an ANN: the number of units and their pattern of connection. Net stimulation - The nodes activation, run through an activation function. Output - The signal a node passes on to subsequent layers (before being multiplied by a connection weight). In the output layer, the set of each node’s output is the ANN’s output for a particular input vector. Overfitting - Behavior displayed by ANNs when they are unable to generalize. Pattern - See Input Pattern. Training - Process allowing an ANN to learn. Training set - The set of input patterns used during network training. Each input pattern has an associated target output. Underfitting - Behavior displayed by ANNs when they are unable to generalize or classify already seen patterns Weights - See Connection weights.
Appendix B – A Mathematical Description of Feedforward, Fully Connectionist ANN Behavior [Gurney, 1997]Let wxy be the connection weight between node x, in one layer and node y, in the following layer.Let ax be the net activation in node x.
Let F(ax) be an activation function that accepts node x’s net activation as input – the function is applied to the net activation before node x propagates its signals onwards to the proceeding layer. Activation functions are varied; I used one of form
Let The input layer takes its activation values from the raw input vector element values without having the activation function applied to them so the nth input node’s activation will be nth input vector’s value. The activation of any non-input layer node, y, in the network then is:
. Net activations in the output layer are run through the activation function, however these nodes do not generate further forward signals.
ReferencesBaum, E., and D. Haussler. “What net size gives valid generalization?” Neural Computation 1.1. (1989): 151-160.Campbell, Neil A. Jane B. Reece, and Lawrence G. Mitchell. Biology. California: Benjamin/Cummings, 1999. Chester, Michael. Neural Network/A Tutorial. New Jersey: PTR Prentice Hall, 1993. Fukushima, K. “Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position.” Biological Cybernetics 36 (1980): 193-202. Gurney, Kevin. 1997. An Introduction to Neural Networks. London: UCL Press, 1997. Hecht-Neilson. Neurocomputing. New York: Addison Wesely, 1990. Kollias, Stefanos, Andreas Stafylopatis and Andreas Tirakis. “Performance of Higher Order Neural Networks in Invariant Recognition”. Neural Networks: Advances and Applications. Ed. E. Gelenbe. Holland: Elsevier Science Publishers, 1991. Liou, C. Y., and H. C. Yang. “Selective feature-to-feature adhesion for recognition of cursive hand printed characters.” IEEE Transactions on Pattern Analysis and Machine Intelligence 21.2. (1999): 184-191. Rumelhart, D. E., J. L. McClelland and The PDP Research Group. Parallel distributed processing. Chapter 2. MIT Press, 1986a. Rumelhart, D., Hinton, G.E. and Williams, R.J. “Learning internal representations by error propagation.” Parallell Distributed Processing: Explorations in the Microstructure of Cognition Chapter 8, 318-362. MIT Press. 1986b. Skapura, David M. Building Neural Networks. New York: ACM Press, 1996. Upadhyaya, B. R., and E. Eryurek. “Application of neural networks for sensor validation and plant monitoring.” Neural Technology 97. (1992): 179-176. Widrow, B. and M. Lehr. “30 years of adaptive neural networks: Perceptron, madaline and backpropagation.” Proceedings of the IEEE 78. (1990): 1415-1451.
[1] Italicized words and expressions are defined in Appendix I. [2] All presented neurobiology is extremely simplified. [3] BNN-like insofar that no ANN has come close to the complexity of a BNN [4] The feedforward ANN behavior is best described mathematically – refer to Appendix B. [5]This stems from the fact that it takes parameters to code P binary patterns.
Submitted: 06/01/2000 Article content copyright © Daniel Eaton, 2000.
|
|
|||||||||||||||||||||
All content copyright © 1998-2007, Generation5 unless otherwise noted.
- Privacy Policy - Legal - Terms of Use -