Training Neural Networks.

Neural networks can be explicitly programmed to perform a task by manually creating the topology and then setting the weights of each link and threshold. However, this bypasses one of the unique strengths of neural nets: the ability to program themselves.

[Decimal to binary to decimal converter] The most basic method of training a neural network is trial and error. If the network isn't behaving the way it should, change the weighting of a random link by a random amount. If the accuracy of the network declines, undo the change and make a different one. It takes time, but the trial and error method does produce results. The neural network to the left is a simple one that could be constructed using such a trial and error method. The task is to mirror the status of the input row onto the output row. To do this, the network would have to invent a binary encoding and decoding scheme with which it could pass the information through the bottleneck of the two neurons in the centre.
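The trial and error procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it trains a single threshold neuron on the logical AND function rather than the mirror network in the figure, and the names (train_by_trial_and_error, count_errors), the range of the random change, and the number of attempts are all hypothetical choices, not taken from any particular program.

```python
import random

def neuron_output(weights, inputs):
    # The last entry of `weights` is the neuron's threshold (an assumed layout).
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total - weights[-1] > 0 else 0

def count_errors(weights, examples):
    # Number of examples the neuron currently gets wrong.
    return sum(neuron_output(weights, inputs) != target
               for inputs, target in examples)

def train_by_trial_and_error(weights, examples, attempts, rng):
    weights = list(weights)
    best = count_errors(weights, examples)
    for _ in range(attempts):
        if best == 0:
            break
        # Change the weighting of a random link by a random amount.
        index = rng.randrange(len(weights))
        old_value = weights[index]
        weights[index] += rng.uniform(-1.0, 1.0)
        errors = count_errors(weights, examples)
        if errors > best:
            weights[index] = old_value  # accuracy declined: undo the change
        else:
            best = errors               # no worse: keep the change
    return weights, best

# Illustrative task (an assumption): learn logical AND.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
start = [0.0, 0.0, 0.0]
initial_errors = count_errors(start, AND_EXAMPLES)
trained, final_errors = train_by_trial_and_error(
    start, AND_EXAMPLES, attempts=1000, rng=random.Random(0))
print(initial_errors, final_errors)
```

Because a change is only kept when it does not make things worse, the error count can never rise. Nothing guarantees that it reaches zero in a given number of attempts, which is why the method is slow.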

Unfortunately, the number of possible weightings rises exponentially as one adds new neurons, making large general-purpose neural nets impossible to construct using trial and error methods. In the early 1980s two researchers, David Rumelhart and David Parker, independently rediscovered an old calculus-based learning algorithm. The back-propagation algorithm compares the result that was obtained with the result that was expected. It then uses this information to systematically modify the weights throughout the neural network. This training takes only a fraction of the time that trial and error methods take. Because a network trained this way generalizes from its examples, it can also be trained on only a portion of the data. The resulting networks are often correctly configured to answer problems that they have never been specifically trained on.
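As a rough sketch of how back-propagation uses the difference between the obtained and expected results, the following Python trains a small network with one hidden layer. Several things here are assumptions made for illustration: the sigmoid activation, the squared-error measure, the learning rate, the 4-2-4 layout (chosen to resemble the mirror task with a two-neuron bottleneck), and all function names.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_network(n_in, n_hidden, n_out, rng):
    # Each neuron is a list of weights; the last entry is its bias.
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return hidden, output

def layer_forward(layer, inputs):
    return [sigmoid(sum(w * x for w, x in zip(neuron, inputs)) + neuron[-1])
            for neuron in layer]

def forward(network, inputs):
    hidden, output = network
    h = layer_forward(hidden, inputs)
    return h, layer_forward(output, h)

def loss(network, examples):
    # Half the squared difference between obtained and expected outputs.
    total = 0.0
    for inputs, targets in examples:
        _, o = forward(network, inputs)
        total += 0.5 * sum((oi - ti) ** 2 for oi, ti in zip(o, targets))
    return total

def gradients(network, inputs, targets):
    hidden, output = network
    h, o = forward(network, inputs)
    # Output error signal: (obtained - expected) times the sigmoid slope.
    delta_out = [(oi - ti) * oi * (1 - oi) for oi, ti in zip(o, targets)]
    # Hidden error signal: output errors passed backwards through the links.
    delta_hid = [hj * (1 - hj) * sum(delta_out[k] * output[k][j]
                                     for k in range(len(output)))
                 for j, hj in enumerate(h)]
    grad_out = [[d * x for x in h] + [d] for d in delta_out]
    grad_hid = [[d * x for x in inputs] + [d] for d in delta_hid]
    return grad_hid, grad_out

def train(network, examples, epochs, rate):
    hidden, output = network
    for _ in range(epochs):
        for inputs, targets in examples:
            grad_hid, grad_out = gradients(network, inputs, targets)
            for layer, grad in ((hidden, grad_hid), (output, grad_out)):
                for neuron, g in zip(layer, grad):
                    for i in range(len(neuron)):
                        neuron[i] -= rate * g[i]  # step against the gradient

# Assumed mirror task: one of four inputs is on, and the output must copy it.
EXAMPLES = [([1, 0, 0, 0], [1, 0, 0, 0]), ([0, 1, 0, 0], [0, 1, 0, 0]),
            ([0, 0, 1, 0], [0, 0, 1, 0]), ([0, 0, 0, 1], [0, 0, 0, 1])]
net = make_network(4, 2, 4, random.Random(1))
before = loss(net, EXAMPLES)
train(net, EXAMPLES, epochs=2000, rate=0.5)
after = loss(net, EXAMPLES)
print(before, after)
```

Unlike trial and error, every weight is adjusted on every example, and each adjustment is in the direction that reduces the error for that example.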

As useful as back-propagation is, there are often easier ways to train a network. For specific-purpose networks, such as those used for pattern recognition, the operation of the network and the training method are relatively easy to observe, even though the networks might have hundreds of neurons.

[Competitive learning scheme] In its simplest form, pattern recognition uses one analog neuron for each pattern to be recognized. All the neurons share the entire input space. Assume that the two-neuron network to the right has been trained to recognize light and dark. The 2x2 grid of pixels forms the input layer. If answer #1 is the 'light' output, then all of its dendrites would be excitatory (if a pixel is on, then increase the neuron's score, otherwise do nothing). By the same token, all of answer #2's dendrites would be inhibitory (if a pixel is off, then increase the neuron's score, otherwise do nothing). This simply boils down to a count of how many pixels are on and how many are off. To determine whether the input is light or dark, pick the answer neuron with the highest score.
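The light and dark example amounts to the following short Python sketch. The pixel ordering, the function names, and the handling of a tie (which the text does not specify) are assumptions.

```python
def light_score(pixels):
    # Excitatory dendrites: each pixel that is on adds one to the score.
    return sum(1 for p in pixels if p)

def dark_score(pixels):
    # Inhibitory dendrites: each pixel that is off adds one to the score.
    return sum(1 for p in pixels if not p)

def classify(pixels):
    scores = {"light": light_score(pixels), "dark": dark_score(pixels)}
    # Pick the answer neuron with the highest score.
    # On a tie this returns "light", an arbitrary choice made for this sketch.
    return max(scores, key=scores.get)

# The 2x2 grid flattened into four pixels, 1 = on, 0 = off.
print(classify([1, 1, 1, 0]))
print(classify([0, 0, 0, 1]))
```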

[Neural weightings for letter 'A'] This example is a complete waste of a neural net, but it does demonstrate the principle. The output neurons of this type of network do not require thresholds, because what matters is which neuron has the highest output. A more useful example would be a 5x7 pixel grid that could recognize letters. One could have 26 neurons that all share the same 35-pixel input space. Each neuron would compute the probability of the inputs matching its character. The grid on the left is configured to output the probability that the input is the letter A. Each tile in the grid represents a link to the A neuron.

[Network that recognizes digits from 0 to 9] Training these pattern recognition networks is simple. Draw the desired pattern and select the neuron that should learn that pattern. For each active pixel, add one to the weight of the link between the pixel and the neuron in training. Subtract one from the weight of each link between an inactive pixel and the neuron in training. To avoid ingraining a pattern beyond all hope of modification, it is wise to set a limit on the absolute value of the weights. The Recog character recognition program uses this technique to learn handwriting. It is a Windows application that can quickly learn to distinguish a dozen or so symbols. The network to the right was created by Recog to recognize the digits from 0 to 9.
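The training rule above can be sketched in Python as follows. This is not Recog's actual code: the weight limit of 10, the 3x3 grid, the scoring rule (the sum of the weights on active pixels), and all names are assumptions chosen to keep the example small.

```python
WEIGHT_LIMIT = 10  # assumed cap on the absolute value of any weight

def train_pattern(weights, pixels, limit=WEIGHT_LIMIT):
    # Add one to the link of each active pixel, subtract one from the link
    # of each inactive pixel, and keep every weight within [-limit, limit].
    for i, on in enumerate(pixels):
        change = 1 if on else -1
        weights[i] = max(-limit, min(limit, weights[i] + change))

def score(weights, pixels):
    # Assumed scoring rule: sum the weights of the links from active pixels.
    return sum(w for w, on in zip(weights, pixels) if on)

def recognize(network, pixels):
    # The neuron with the highest score wins; no thresholds are needed.
    return max(network, key=lambda name: score(network[name], pixels))

# Two hypothetical 3x3 patterns, flattened row by row.
T = [1, 1, 1,
     0, 1, 0,
     0, 1, 0]
L = [1, 0, 0,
     1, 0, 0,
     1, 1, 1]

network = {"T": [0] * 9, "L": [0] * 9}
train_pattern(network["T"], T)
train_pattern(network["L"], L)
print(recognize(network, T), recognize(network, L))
```

Repeating train_pattern with the same drawing strengthens the pattern until the limit is reached, after which further repetitions have no effect and a different drawing can still shift the weights back.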

A more sophisticated method of pattern recognition would involve several neural nets working in parallel, each looking for a particular feature such as "horizontal line at the top", or "enclosed space near the bottom". The results of these feature detectors would then be fed into another net that would match the best pattern. This is closer to the way humans recognize patterns. The drawback is the complexity of the learning scheme. The average child takes a year to learn the alpha-numeric system to competence.

--------------------------------

Last modified: September 21, 1998
By: Neil Fraser (neil@vv.carleton.ca)