
Notes on Neural Network Learning and Training

1.0 Introduction 

A Neural Network (NN) can be defined as an interconnection of simple processing elements whose functionality is based on the biological neuron. The biological neuron (Figure 1) is a unique piece of equipment that carries information, or a bit of knowledge, and transfers it to other neurons in a chain of networks. The artificial neuron imitates these functions and their unique process of learning. Basically, a biological neuron has three types of components: dendrites, soma, and axon. Dendrites are the sensitive part of the neuron that receive signals from other neurons. The soma sums the incoming signals, and the result is transmitted to other cells through the axon.



Figure 1: Biological Neuron

The simple neuron (Figure 2), introduced by McCulloch and Pitts in the 1940s, consists of an input layer, an activation function, and an output layer. The input layer receives input signals from the external environment (or from other neurons). The activation function is the neuron's internal state, which sums the input signals and computes the response. The signals are then transmitted to the output layer. The input layer, activation function, and output layer of the artificial neuron play roles similar to those of the dendrites, soma, and axon of the biological neuron.

 


Figure 2: McCulloch-Pitts Neuron Model


2.0 Learning in Neural Network

Assume we have n input units, X1,…,Xn, with input signals x1,…,xn. When the network receives the signals (xi) from the input units (Xi), the net input to output unit Yj (y_inj) is calculated by summing the weighted input signals (xi wij). The matrix multiplication method for calculating the net input is shown in the equation below.

$$y\_in_j = \sum_{i=1}^{n} x_i w_{ij}$$

where wij is the connection weight from input unit Xi to output unit Yj.
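
The net input, the weighted sum of the input signals for each output unit, can be sketched in a few lines of Python. This is only an illustration: the function name `net_input`, the nested-list layout `W[i][j]` for the weight from input unit i to output unit j, and the sample values are assumptions, not part of the original article.

```python
def net_input(x, W):
    """Return [y_in_1, ..., y_in_m], where y_in_j = sum over i of x[i] * W[i][j].

    x is a list of n input signals; W is an n-by-m nested list of weights
    (assumed layout: W[i][j] connects input unit i to output unit j).
    """
    n_inputs = len(x)
    n_outputs = len(W[0])
    return [
        sum(x[i] * W[i][j] for i in range(n_inputs))
        for j in range(n_outputs)
    ]


# Illustrative values: three input units, two output units.
x = [1.0, 0.5, -1.0]
W = [
    [0.2, -0.4],
    [0.6, 0.1],
    [-0.3, 0.5],
]

y_in = net_input(x, W)
print(y_in)  # approximately [0.8, -0.85]
```

Each column of `W` holds the weights feeding one output unit, so the list comprehension is the row-vector-times-matrix product written out by hand.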

The network output (yj) is calculated using the activation function f(x); that is, yj = f(x), where x is y_inj. The weights computed during training are stored and become the information, or knowledge, used in future applications.
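
As a rough sketch of how an activation function turns the net input into the unit's output, the example below applies two common choices, a binary step and a logistic sigmoid. The article does not say which activation function is intended, so both functions, the helper name `unit_output`, and the sample weights are assumptions made for illustration.

```python
import math


def binary_step(x, threshold=0.0):
    """Fire (1) when the net input exceeds the threshold, otherwise 0."""
    return 1 if x > threshold else 0


def sigmoid(x):
    """Logistic sigmoid: squashes the net input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(x, w, f):
    """Compute y = f(y_in) for one output unit, where y_in = sum of x[i] * w[i]."""
    y_in = sum(xi * wi for xi, wi in zip(x, w))
    return f(y_in)


# Illustrative input signals and weights for a single output unit.
x = [1.0, 0.5, -1.0]
w = [0.2, 0.6, -0.3]

y_step = unit_output(x, w, binary_step)  # net input is about 0.8, so the unit fires
y_sig = unit_output(x, w, sigmoid)       # a value between 0.5 and 1

print(y_step, y_sig)
```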

NNs can be divided into three architectures, namely single-layer, multilayer, and competitive-layer networks. The number of layers in a net is defined by the number of layers of connection weights between the neurons. A single-layer network consists of only one layer of connection weights, whereas a multilayer network consists of more than one layer of connection weights and therefore contains additional layers of units called hidden layers. Multilayer networks can be used to solve more complicated problems than single-layer networks. Both types are also called feedforward networks, in which the signal flows from the input units to the output units in a forward direction. The competitive-layer network, for example a recurrent network, is a feedback network in which there are closed-loop signals from a unit back to itself.
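
To make the idea of a hidden layer concrete, here is a small feedforward sketch with two layers of connection weights. The weights are picked by hand (not learned) so that the net computes XOR, a function that one layer of weights cannot represent. The names `layer` and `xor_net`, the use of bias terms, and the step activation are assumptions chosen for the example.

```python
def step(x):
    """Binary step activation: 1 if the net input is positive, else 0."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """Forward pass through one layer of connection weights.

    weights[i][j] connects input i to unit j; biases[j] is added to unit j.
    """
    return [
        step(sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j])
        for j in range(len(biases))
    ]


# Hand-picked weights (an assumption for illustration, not trained values).
# Hidden unit 0 behaves like OR, hidden unit 1 behaves like AND.
W_hidden = [[1, 1], [1, 1]]
b_hidden = [-0.5, -1.5]
# The output unit fires when OR is on and AND is off, which is XOR.
W_out = [[1], [-1]]
b_out = [-0.5]


def xor_net(x1, x2):
    """Signals flow forward: input units -> hidden layer -> output unit."""
    hidden = layer([x1, x2], W_hidden, b_hidden)
    return layer(hidden, W_out, b_out)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```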

Learning Mechanisms

NN learning algorithms can be divided into two main groups: supervised (or associative) learning and unsupervised (self-organising) learning. Many supervised and unsupervised NNs have been invented. Some are listed in the NN FAQ (frequently asked questions) and on discussion group web pages, but many others are not.

Supervised Learning
Supervised learning is based on target values, that is, the desired outputs. During training the network tries to match its outputs to the desired target values. This method has two subvarieties called auto-associative and hetero-associative learning. In auto-associative learning the target values are the same as the inputs, whereas in hetero-associative learning the targets are generally different from the inputs.

One of the most commonly used supervised NN models is the backpropagation network, which uses the backpropagation learning algorithm. The backpropagation (or backprop) algorithm is one of the best-known algorithms in neural networks. It was popularized by Rumelhart, Hinton, and Williams in the 1980s as another name for the generalized delta rule. Backpropagation of errors, or the generalized delta rule, is a gradient descent method that minimizes the total squared error of the output computed by the net (Fausett, 1994). The introduction of the backprop algorithm overcame a drawback of the NN algorithms of the 1970s, in which a single-layer perceptron fails to solve the simple XOR problem.
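
The XOR limitation can be seen in a short experiment. The sketch below is not backpropagation; it uses the classic perceptron learning rule on a single layer of weights, with a learning rate of 1, zero initial weights, and a cap of 50 epochs, all of which are assumptions chosen for illustration. The same code learns AND, which is linearly separable, but can never classify all four XOR patterns correctly.

```python
def predict(w, b, x):
    """Single-layer perceptron output with a step activation."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(samples, epochs=50, lr=1):
    """Perceptron rule: after each mistake, move the weights by lr * (t - y) * x."""
    w, b = [0, 0], 0
    for _ in range(epochs):
        mistakes = 0
        for x, t in samples:
            err = t - predict(w, b, x)
            if err != 0:
                w[0] += lr * err * x[0]
                w[1] += lr * err * x[1]
                b += lr * err
                mistakes += 1
        if mistakes == 0:  # every pattern classified correctly, so stop early
            break
    return w, b


def count_errors(w, b, samples):
    """Number of patterns the trained unit still misclassifies."""
    return sum(1 for x, t in samples if predict(w, b, x) != t)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, b_and = train_perceptron(AND)
w_xor, b_xor = train_perceptron(XOR)
and_errors = count_errors(w_and, b_and, AND)
xor_errors = count_errors(w_xor, b_xor, XOR)

print("AND errors:", and_errors)  # 0: a single layer of weights is enough
print("XOR errors:", xor_errors)  # at least 1, however long training runs
```

No straight line separates the XOR classes, so at least one pattern is always wrong. A network with a hidden layer, trained with backpropagation, does not have this restriction.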

Unsupervised Learning
An unsupervised learning method is not given any target values; the desired output of the network is unknown. During training the network performs some kind of data compression, such as dimensionality reduction or clustering. The network learns the distribution of the patterns and classifies them so that similar patterns are assigned to the same output cluster. The Kohonen network is the best-known example of an unsupervised learning network. According to Sarle (1997), the term Kohonen network refers to three types of networks: Vector Quantization, Self-Organizing Map, and Learning Vector Quantization.
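
A minimal sketch of clustering without target values is shown below. It uses plain winner-take-all competitive learning, a simple form of vector quantization, rather than Kohonen's full Self-Organizing Map, since there is no neighbourhood function. The data points, the initial prototypes, the learning rate, and the function names are assumptions made for the example.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    dists = [sum((p - v) ** 2 for p, v in zip(proto, x)) for proto in prototypes]
    return dists.index(min(dists))


def train_competitive(data, prototypes, lr=0.5, epochs=10):
    """Winner-take-all learning: only the winning prototype moves toward the input."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x in data:
            k = nearest(protos, x)
            protos[k] = [p + lr * (v - p) for p, v in zip(protos[k], x)]
    return protos


# Two visibly separate groups of patterns; no target values are supplied.
data = [
    (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
    (0.9, 1.0), (1.0, 0.9), (1.0, 1.0),
]

prototypes = train_competitive(data, [(0.2, 0.2), (0.8, 0.8)])
clusters = [nearest(prototypes, x) for x in data]

print(prototypes)
print(clusters)  # [0, 0, 0, 1, 1, 1]
```

Similar patterns end up assigned to the same output unit, which is the clustering behaviour described above.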


3.0 Training the Network

Training the network is time consuming. The network usually learns over several epochs, and the number needed depends on how large the network is; thus, a large network requires more training time than a smaller one. Basically, the network is trained for a number of epochs and stopped after reaching the maximum epoch. Alternatively, a minimum error tolerance can be used, so that training stops once the difference between the network output and the known outcome is less than the specified value (see, for example, Pofahl et al., 1998). We could also stop the training once the network meets some other stopping criterion.
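
The two stopping rules described above, a maximum number of epochs and an error tolerance, can be combined in one training loop. The sketch below is a skeleton only: `run_epoch` is a hypothetical callback standing in for one pass over the training data, and `fake_epoch`, whose error simply halves each epoch, is a stand-in for a real network.

```python
def train(run_epoch, max_epochs=100, tolerance=0.01):
    """Run epochs until the error drops below tolerance or max_epochs is reached.

    run_epoch(epoch) is a hypothetical callback that trains for one epoch
    and returns the current error. Returns (epochs_run, last_error, reason).
    """
    error = float("inf")
    for epoch in range(1, max_epochs + 1):
        error = run_epoch(epoch)
        if error < tolerance:
            return epoch, error, "tolerance reached"
    return max_epochs, error, "maximum epoch reached"


def fake_epoch(epoch):
    """Stand-in for real training: the error halves with every epoch."""
    return 1.0 / (2 ** epoch)


result = train(fake_epoch, max_epochs=100, tolerance=0.01)
print(result)  # (7, 0.0078125, 'tolerance reached')

capped = train(fake_epoch, max_epochs=3, tolerance=0.01)
print(capped)  # (3, 0.125, 'maximum epoch reached')
```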

During training the network might learn too much. This problem is referred to as overfitting. Overfitting is a critical problem in almost all standard NN architectures; indeed, NNs and other AI machine learning models are prone to it (Lawrence et al., 1997). One solution is early stopping (Sarle, 1995), but this approach needs careful attention because the problem is harder than expected (Lawrence et al., 1997). The choice of stopping criterion is another issue to consider in preventing overfitting (Prechelt, 1998). Hence, during training, a validation set, rather than the training data set, is used to decide when to stop. After every few epochs the network is tested with the validation data, and training is stopped as soon as the error on the validation set is higher than it was the last time it was checked (Prechelt, 1998). Figure 3 shows that training should stop at time t, when the validation error starts to increase.
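
The early-stopping rule can be sketched as a function over the validation errors recorded at each check. With `patience=1` it stops at the first check where the validation error fails to improve, which matches the rule described above. The `patience` parameter, the function name, and the sample error values are assumptions added for illustration and are not taken from the cited papers.

```python
def early_stop(val_errors, patience=1):
    """Decide when to stop from a sequence of validation errors.

    Returns (best_check, stop_check), both 1-based: the check with the lowest
    validation error seen so far, and the check at which training stops.
    Training stops after `patience` consecutive checks without improvement.
    """
    best_error = float("inf")
    best_check = None
    checks_without_improvement = 0
    for check, err in enumerate(val_errors, start=1):
        if err < best_error:
            best_error, best_check = err, check
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                return best_check, check
    # The validation error never stopped improving within the recorded checks.
    return best_check, len(val_errors)


# Illustrative curve: the validation error falls, then starts to rise.
val_errors = [0.9, 0.6, 0.4, 0.35, 0.37, 0.42, 0.5]

print(early_stop(val_errors))              # (4, 5)
print(early_stop(val_errors, patience=2))  # (4, 6)
```

In practice the weights from the best check are usually the ones kept, since the checks after it were made with a network that had already begun to overfit.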




Figure 3: Training and validation curve

4.0 Discussion and Conclusion


Constructing a program for a Neural Network is not a difficult task. Basically, it takes only a few algorithmic steps that are easily followed even by novice practitioners. However, preparing the network for training is a difficult task, since the network deals with a large amount of data. Another problem is deciding when to stop the training. Overtraining could cause memorization, where the network simply memorizes the training patterns and fails to recognize other sets of patterns. Thus, early stopping is recommended to ensure that the network learns properly.


References


Fausett, L. (1994). Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Englewood Cliffs: Prentice Hall.

Sarle, W. S. (1997). Neural Network FAQ, part 1 of 7: Introduction. Periodic posting to the Usenet newsgroup comp.ai.neural-nets, URL: ftp://ftp.sas.com/pub/neural/FAQ.html Downloaded on 30 Nov. 1999.

Pofahl, W. E., Walczak, S. M., Rhone, E., and Izenberg, S. D. (1998). Use of an Artificial Neural Network to Predict Length of Stay in Acute Pancreatitis. American Surgeon, Vol. 64, Issue 9, (pp. 868-872).

Lawrence, S., Giles, C. L., and Tsoi, A. C. (1997). Lessons in Neural Network Training: Training May be Harder than Expected. Proceedings of the Fourteenth National Conference on Artificial Intelligence, AAAI-97, (pp. 540-545), Menlo Park, California: AAAI Press.

Sarle, W. (1995). Stopped Training and Other Remedies for Overfitting. Proceedings of the 27th Symposium on the Interface of Computing Science and Statistics, (pp. 352-360). Retrieved March 18, 2002 from World Wide Web: ftp://ftp.sas.com/pub/neural/ 

Prechelt, L. (1998). Early Stopping - but when? Neural Networks: Tricks of the Trade, (pp. 55-69). Retrieved March 28, 2002 from World Wide Web: http://wwwipd.ira.uka.de/~prechelt/Biblio/

 

Submitted: 14/03/2004

Article content copyright © Wan Hussain Wan Ishak, 2004.