The most basic method of training a neural network is trial and error. If
the network isn't behaving the way it should, change the weighting of a
random link by a random amount. If the accuracy of the network declines,
undo the change and make a different one. It takes time, but the trial
and error method does produce results. The neural network to the left is a
simple one that could be constructed using such a trial and error method.
The task is to mirror the status of the input row onto the output row. To
do this, the network would have to invent a binary to decimal encoding and
decoding scheme with which it could pass the information through the
bottleneck of the two neurons in the centre.
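The trial-and-error procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4-2-4 layer sizes, the sigmoid neurons, the squared-error measure, and every name here (`random_net`, `train_by_trial_and_error`, and so on) are hypothetical choices, not details of the network in the figure.

```python
import math
import random

def sigmoid(x):
    # tanh form of the logistic function; it cannot overflow for large |x|.
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def forward(net, inputs):
    """Feed the inputs through each layer; the last weight of a neuron is its bias."""
    activations = list(inputs)
    for layer in net:
        with_bias = activations + [1.0]
        activations = [sigmoid(sum(w * a for w, a in zip(neuron, with_bias)))
                       for neuron in layer]
    return activations

def total_error(net, samples):
    """Sum of squared differences between actual and expected outputs."""
    return sum((out - want) ** 2
               for inputs, expected in samples
               for out, want in zip(forward(net, inputs), expected))

def random_net(rng, sizes):
    """Build layers of neurons with random weights (assumed range -1 to 1)."""
    return [[[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def train_by_trial_and_error(net, samples, attempts=5000, max_change=1.0, seed=1):
    rng = random.Random(seed)
    best = total_error(net, samples)
    for _ in range(attempts):
        # Change the weighting of a random link by a random amount.
        neuron = rng.choice(rng.choice(net))
        link = rng.randrange(len(neuron))
        old = neuron[link]
        neuron[link] = old + rng.uniform(-max_change, max_change)
        error = total_error(net, samples)
        if error > best:
            neuron[link] = old   # accuracy declined: undo the change
        else:
            best = error         # keep the change
    return best

# Mirror task: each input pattern should reappear on the output row.
SAMPLES = [([1, 0, 0, 0], [1, 0, 0, 0]),
           ([0, 1, 0, 0], [0, 1, 0, 0]),
           ([0, 0, 1, 0], [0, 0, 1, 0]),
           ([0, 0, 0, 1], [0, 0, 0, 1])]

net = random_net(random.Random(0), [4, 2, 4])   # two neurons in the centre
error_before = total_error(net, SAMPLES)
error_after = train_by_trial_and_error(net, SAMPLES)
print(error_before, error_after)
```

Because a change is kept only when the error does not rise, the error can never get worse, but nothing guarantees that it reaches zero in any fixed number of attempts.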
Unfortunately, the number of possible weightings rises exponentially as one adds new neurons, making large general-purpose neural nets impractical to construct using trial and error methods. In the early 1980s two researchers, David Rumelhart and David Parker, independently rediscovered an old calculus-based learning algorithm. The back-propagation algorithm compares the result that was obtained with the result that was expected. It then uses this difference to systematically modify the weights throughout the neural network. This training takes only a fraction of the time that trial and error methods take. A network can also be trained on only a portion of the data, because it learns to generalize from the examples it has seen. The resulting networks are often correctly configured to answer problems that they have never been specifically trained on.
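A minimal sketch of back-propagation on a small mirror network may make the idea concrete. The layer sizes (4-2-4), the sigmoid neurons, the squared-error measure, the learning rate, and all function names are assumptions made for illustration; the text does not specify any of them.

```python
import math
import random

def sigmoid(x):
    # tanh form of the logistic function; it cannot overflow for large |x|.
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def make_layer(rng, n_in, n_out):
    # One weight list per neuron; the final entry is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def run_layer(layer, inputs):
    with_bias = inputs + [1.0]
    return [sigmoid(sum(w * x for w, x in zip(neuron, with_bias))) for neuron in layer]

def mean_squared_error(hidden, output, samples):
    total = 0.0
    for inputs, expected in samples:
        out = run_layer(output, run_layer(hidden, inputs))
        total += sum((o - t) ** 2 for o, t in zip(out, expected))
    return total / len(samples)

def backprop_step(hidden, output, inputs, expected, rate=0.5):
    # Forward pass: compute the result that was obtained.
    h = run_layer(hidden, inputs)
    o = run_layer(output, h)
    # Compare it with the result that was expected (the constant factor of 2
    # from the squared error is folded into the learning rate).
    delta_o = [(ok - tk) * ok * (1.0 - ok) for ok, tk in zip(o, expected)]
    # Propagate the error back through the output weights, before updating them.
    delta_h = [hj * (1.0 - hj) * sum(delta_o[k] * output[k][j] for k in range(len(output)))
               for j, hj in enumerate(h)]
    # Adjust every weight a little in the direction that reduces the error.
    for k, neuron in enumerate(output):
        for j, x in enumerate(h + [1.0]):
            neuron[j] -= rate * delta_o[k] * x
    for j, neuron in enumerate(hidden):
        for i, x in enumerate(inputs + [1.0]):
            neuron[i] -= rate * delta_h[j] * x

SAMPLES = [([1, 0, 0, 0], [1, 0, 0, 0]),
           ([0, 1, 0, 0], [0, 1, 0, 0]),
           ([0, 0, 1, 0], [0, 0, 1, 0]),
           ([0, 0, 0, 1], [0, 0, 0, 1])]

rng = random.Random(0)
hidden = make_layer(rng, 4, 2)
output = make_layer(rng, 2, 4)
error_before = mean_squared_error(hidden, output, SAMPLES)
for _ in range(2000):
    for inputs, expected in SAMPLES:
        backprop_step(hidden, output, inputs, expected)
error_after = mean_squared_error(hidden, output, SAMPLES)
print(error_before, error_after)
```

Unlike the random search, every weight is moved on every step, and each move is in a direction computed from the error rather than guessed.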
As useful as back-propagation is, there are often easier ways to train a network. For specific-purpose networks, such as those used for pattern recognition, the operation of the network and the training method are relatively easy to observe even though the networks might have hundreds of neurons.
In its simplest form, pattern recognition uses one analog neuron for each
pattern to be recognized. All the neurons share the entire input space.
Assume that the two-neuron network to the right has been trained to
recognize light and dark. The 2x2 grid of pixels forms the input layer.
If answer #1 is the 'light' output, then all of its dendrites would be
excitatory (if a pixel is on, then increase the neuron's score, otherwise
do nothing). By the same token, all of answer #2's dendrites would be
inhibitory (if a pixel is off, then increase the neuron's score, otherwise
do nothing). This simply boils down to a count of how many pixels are on
and how many are off. To determine if it is light or dark, pick the
answer neuron with the highest score.
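The light and dark example reduces to two counts, as this small sketch shows. The function name and the tie-breaking rule (a tie is reported as dark) are assumptions; the text does not say what happens when the scores are equal.

```python
def classify_brightness(pixels):
    """Return 'light' or 'dark' for a grid of pixels given as 0 (off) or 1 (on)."""
    # Answer #1: excitatory links, one point for each pixel that is on.
    light_score = sum(1 for p in pixels if p)
    # Answer #2: inhibitory links, one point for each pixel that is off.
    dark_score = sum(1 for p in pixels if not p)
    # Pick the answer neuron with the highest score (ties go to 'dark' here).
    return "light" if light_score > dark_score else "dark"

print(classify_brightness([1, 1, 1, 0]))
print(classify_brightness([0, 0, 0, 1]))
```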
This example is a complete waste of a neural net, but it does demonstrate
the principle. The output neurons of this type of network do not require
thresholds because all that matters is which output is highest. A more useful example
would be a 5x7 pixel grid that could recognize letters. One could have 26
neurons that all share the same 35 pixel input space. Each neuron would
compute the probability of the inputs matching its character. The grid
on the left is configured to output the probability that the input is the
letter A. Each tile in the grid represents a link to the A
neuron.
Training these pattern recognition networks is simple. Draw the desired
pattern and select the neuron that should learn that pattern. For each
active pixel, add one to the weight of the link between the pixel and the
neuron in training. Subtract one from the weight of each link between an
inactive pixel and the neuron in training. To avoid ingraining a pattern
beyond all hope of modification, it is wise to set a limit on the absolute
weights. The Recog character recognition program uses this technique to
learn handwriting. It is a Windows application that can quickly learn to
distinguish a dozen or so symbols. The network to the right was created by
Recog to recognize the digits 0 to 9.
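The training rule above is short enough to sketch directly. This is not Recog's code: the 3x3 grid, the two toy patterns, the limit of 10, and the scoring rule (sum the weights of the active pixels) are all assumptions chosen to keep the example small.

```python
def train(weights, pixels, limit=10):
    """Reinforce one neuron's links for the drawn pattern, capping each weight."""
    for i, on in enumerate(pixels):
        change = 1 if on else -1          # +1 for active pixels, -1 for inactive
        weights[i] = max(-limit, min(limit, weights[i] + change))

def score(weights, pixels):
    """Assumed scoring rule: add up the weights of the links from active pixels."""
    return sum(w for w, on in zip(weights, pixels) if on)

def recognize(network, pixels):
    """Pick the neuron with the highest score; no thresholds are needed."""
    return max(network, key=lambda name: score(network[name], pixels))

# Two toy patterns on a 3x3 grid, listed row by row.
T_SHAPE = [1, 1, 1,
           0, 1, 0,
           0, 1, 0]
L_SHAPE = [1, 0, 0,
           1, 0, 0,
           1, 1, 1]

network = {"T": [0] * 9, "L": [0] * 9}
for _ in range(3):
    train(network["T"], T_SHAPE)
    train(network["L"], L_SHAPE)

print(recognize(network, T_SHAPE))
print(recognize(network, L_SHAPE))
```

The limit keeps a heavily trained neuron open to correction: once every weight sits at the cap, further repetitions of the same pattern change nothing, and a modest number of contrary examples can pull a weight back.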
A more sophisticated method of pattern recognition would involve several neural nets working in parallel, each looking for a particular feature such as "horizontal line at the top", or "enclosed space near the bottom". The results of these feature detectors would then be fed into another net that would select the best-matching pattern. This is closer to the way humans recognize patterns. The drawback is the complexity of the learning scheme. The average child takes a year to learn the alphanumeric system to competence.
Last modified: September 21, 1998
By: Neil Fraser
(neil@vv.carleton.ca)