Rosenblatt [Rose61] created many variations of the perceptron. One of the simplest was a single-layer network whose weights and biases could be trained to produce a correct target vector when presented with the corresponding input vector. The training technique used is called the perceptron learning rule. The perceptron generated great interest due to its ability to generalize from its training vectors and learn from initially randomly distributed connections. Perceptrons are especially suited for simple problems in pattern classification. They are fast and reliable networks for the problems they can solve. In addition, an understanding of the operations of the perceptron provides a good basis for understanding more complex networks.

The discussion of perceptrons in this section is necessarily brief. For a more thorough discussion, see Chapter 4, “Perceptron Learning Rule,” of [HDB1996], which discusses the use of multiple layers of perceptrons to solve more difficult problems beyond the capability of one layer.

A perceptron neuron, which uses the hard-limit transfer function `hardlim`, is shown below.

Each external input is weighted with an appropriate weight
*w*_{1j}, and the sum of the weighted
inputs is sent to the hard-limit transfer function, which also has an input of 1
transmitted to it through the bias. The hard-limit transfer function, which returns
a 0 or a 1, is shown below.

The perceptron neuron produces a 1 if the net input into the transfer function is equal to or greater than 0; otherwise it produces a 0.
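As a minimal illustration (a from-scratch Python sketch, not toolbox code), `hardlim` is just a threshold at zero:

```python
def hardlim(n):
    """Hard-limit transfer function: returns 1 if the net input n
    is equal to or greater than 0, otherwise returns 0."""
    return 1 if n >= 0 else 0

print(hardlim(-0.5), hardlim(0), hardlim(2.3))  # 0 1 1
```

Note that the boundary case *n* = 0 produces 1, matching the rule above.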

The hard-limit transfer function gives a perceptron the ability to classify input vectors by dividing the input space into two regions.
Specifically, outputs will be 0 if the net input *n* is less than
0, or 1 if the net input *n* is 0 or greater. The following figure
shows the input space of a two-input hard-limit neuron with the weights
*w*_{1,1} = −1,
*w*_{1,2} = 1 and a bias
*b* = 1.

Two classification regions are formed by the *decision boundary* line L at **Wp** + *b* = 0. This line is perpendicular to the weight matrix **W** and shifted according to the bias *b*. Input
vectors above and to the left of the line L will result in a net input greater than
0 and, therefore, cause the hard-limit neuron to output a 1. Input vectors below and
to the right of the line L cause the neuron to output 0. You can pick weight and
bias values to orient and move the dividing line so as to classify the input space
as desired.

Hard-limit neurons without a bias will always have a classification line going through the origin. Adding a bias allows the neuron to solve problems where the two sets of input vectors are not located on different sides of the origin. The bias allows the decision boundary to be shifted away from the origin, as shown in the plot above.
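To make this concrete, here is a small Python sketch (not toolbox code; the test points are chosen arbitrarily for illustration) that evaluates the neuron from the figure, with *w*_{1,1} = −1, *w*_{1,2} = 1, and *b* = 1, on points on either side of the boundary −*p*₁ + *p*₂ + 1 = 0:

```python
def hardlim(n):
    return 1 if n >= 0 else 0

def neuron(p, w, b):
    """Single hard-limit neuron: a = hardlim(Wp + b)."""
    net_input = sum(wi * pi for wi, pi in zip(w, p)) + b
    return hardlim(net_input)

w, b = [-1, 1], 1
print(neuron([0, 2], w, b))  # net input =  3 -> output 1 (above/left of L)
print(neuron([3, 0], w, b))  # net input = -2 -> output 0 (below/right of L)
```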

You might want to run the example program `nnd4db`. With it you
can move a decision boundary around, pick new inputs to classify, and see how the
repeated application of the learning rule yields a network that does classify the
input vectors properly.

The perceptron network consists of a single layer of `S` perceptron neurons connected to *R* inputs through a set of weights *w*_{i,j}, as shown below in two forms.
As before, the network indices *i* and *j* indicate that *w*_{i,j} is the strength of the connection from the *j*th input to the *i*th neuron.

The perceptron learning rule described shortly is capable of training only a
single layer. Thus only one-layer networks are considered here. This restriction
places limitations on the computation a perceptron can perform. The types of
problems that perceptrons are capable of solving are discussed in Limitations and Cautions.

You can create a perceptron with the following:

net = perceptron;
net = configure(net,P,T);

where input arguments are as follows:

`P` is an R-by-Q matrix of Q input vectors of R elements each.

`T` is an S-by-Q matrix of Q target vectors of S elements each.

Commonly, the `hardlim` function is used in perceptrons, so it is the default.

The following commands create a perceptron network with a single one-element input vector with the values 0 and 2, and one neuron with outputs that can be either 0 or 1:

P = [0 2];
T = [0 1];
net = perceptron;
net = configure(net,P,T);

You can see what network has been created by executing the following command:

inputweights = net.inputweights{1,1}

which yields

inputweights = 
        delays: 0
       initFcn: 'initzero'
         learn: true
      learnFcn: 'learnp'
    learnParam: (none)
          size: [1 1]
     weightFcn: 'dotprod'
   weightParam: (none)
      userdata: (your custom info)

The default learning function is `learnp`, which is discussed in Perceptron Learning Rule (learnp). The net input to the `hardlim` transfer function is computed by `dotprod`, which generates the product of the input vector and weight matrix and adds the bias to compute the net input.

The default initialization function `initzero` is used to set the initial values of the weights to zero.

Similarly,

biases = net.biases{1}

gives

biases = 
       initFcn: 'initzero'
         learn: 1
      learnFcn: 'learnp'
    learnParam: []
          size: 1
      userdata: [1x1 struct]

You can see that the default initialization for the bias is also 0.

Perceptrons are trained on examples of desired behavior. The desired behavior can be summarized by a set of input, output pairs

$$\{{p}_{1},{t}_{1}\},\{{p}_{2},{t}_{2}\},\dots ,\{{p}_{Q},{t}_{Q}\}$$

where **p** is an input to the network and **t** is the corresponding correct (target) output. The
objective is to reduce the error **e**, which is the
difference **t** − **a**
between the neuron response **a** and the target vector
**t**. The perceptron learning rule `learnp` calculates desired changes to the perceptron's weights and biases, given an input vector **p** and the associated error **e**. The target vector **t** must contain values of either 0 or 1, because perceptrons (with `hardlim` transfer functions) can only output these values.

Each time `learnp` is executed, the perceptron
has a better chance of producing the correct outputs. The perceptron rule is proven
to converge on a solution in a finite number of iterations if a solution
exists.

If a bias is not used, `learnp` works to find a solution by
altering only the weight vector **w** to point toward
input vectors to be classified as 1 and away from vectors to be classified as 0.
This results in a decision boundary that is perpendicular to **w** and that properly classifies the input vectors.

There are three conditions that can occur for a single neuron once an input vector
**p** is presented and the network's response
**a** is calculated:

**CASE 1.** If an input vector is presented and the
output of the neuron is correct (**a** = **t** and **e** = **t** – **a** = 0), then the
weight vector **w** is not altered.

**CASE 2.** If the neuron output is 0 and should have
been 1 (**a** = 0 and **t** = 1, and **e** = **t** – **a** = 1), the input
vector **p** is added to the weight vector **w**. This makes the weight vector point closer to the input
vector, increasing the chance that the input vector will be classified as a 1 in the
future.

**CASE 3.** If the neuron output is 1 and should have
been 0 (**a** = 1 and **t** = 0, and **e** = **t** – **a** = –1), the input
vector **p** is subtracted from the weight vector
**w**. This makes the weight vector point farther
away from the input vector, increasing the chance that the input vector will be
classified as a 0 in the future.

The perceptron learning rule can be written more succinctly in terms of the error
**e** = **t** –
**a** and the change to be made to the weight
vector Δ**w**:

**CASE 1.** If **e** =
0, then make a change Δ**w** equal to 0.

**CASE 2.** If **e** =
1, then make a change Δ**w** equal to **p**^{T}.

**CASE 3.** If **e** =
–1, then make a change Δ**w** equal to –**p**^{T}.

All three cases can then be written with a single expression:

$$\Delta w=(t-a){p}^{T}=e{p}^{T}$$

You can get the expression for changes in a neuron's bias by noting that the bias is simply a weight that always has an input of 1:

$$\Delta b=(t-a)(1)=e$$

For the case of a layer of neurons you have

$$\Delta W=(t-a){p}^{T}=e{p}^{T}$$

and

$$\Delta b=(t-a)=e$$

The perceptron learning rule can be summarized as follows:

$${W}^{new}={W}^{old}+e{p}^{T}$$

and

$${b}^{new}={b}^{old}+e$$

where **e** = **t** –
**a**.
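The summarized rule can be written as a few lines of Python (an illustrative from-scratch sketch, not the toolbox's `learnp`):

```python
def hardlim(n):
    return 1 if n >= 0 else 0

def perceptron_update(w, b, p, t):
    """One application of the perceptron rule:
    a = hardlim(Wp + b), e = t - a,
    W_new = W_old + e*p^T, b_new = b_old + e."""
    a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)
    e = t - a
    w = [wi + e * pi for wi, pi in zip(w, p)]
    return w, b + e, e

# CASE 3: the neuron outputs 1 but the target is 0, so p is
# subtracted from the weight vector and 1 from the bias.
w, b, e = perceptron_update([0, 0], 0, [2, 2], 0)
print(w, b, e)  # [-2, -2] -1 -1
```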

Now try a simple example. Start with a single neuron having an input vector with just two elements.

net = perceptron;
net = configure(net,[0;0],0);

To simplify matters, set the bias equal to 0 and the weights to 1 and -0.8:

net.b{1} = [0];
w = [1 -0.8];
net.IW{1,1} = w;

The input target pair is given by

p = [1; 2];
t = [1];

You can compute the output and error with

a = net(p)
a =
     0
e = t-a
e =
     1

and use the function `learnp` to find the change in the weights.

dw = learnp(w,p,[],[],[],[],e,[],[],[],[],[])
dw =
     1     2

The new weights, then, are obtained as

w = w + dw
w =
    2.0000    1.2000

The process of finding new weights (and biases) can be repeated until there are no errors. Recall that the perceptron learning rule is guaranteed to converge in a finite number of steps for all problems that can be solved by a perceptron. These include all classification problems that are linearly separable. The objects to be classified in such cases can be separated by a single line.

You might want to try the example `nnd4pr`. It allows you to pick new input vectors and apply the learning rule to classify them.

If `sim` and `learnp` are used repeatedly to present inputs to a perceptron, and to change the perceptron weights and biases according to the error, the perceptron will eventually find weight and bias values that solve the problem, given that the perceptron *can* solve it. Each traversal through all the training input and target vectors is called a *pass*.

The function `train` carries out such a loop of calculation. In each pass the function `train` proceeds through the specified sequence of inputs, calculating the output, error, and network adjustment for each input vector in the sequence as the inputs are presented.

Note that `train` does not guarantee that the resulting network does its job. You must check the new values of **W** and **b** by computing the network output for each input vector to see if all targets are reached. If a network does not perform successfully you can train it further by calling `train` again with the new weights and biases for more training passes, or you can analyze the problem to see if it is a suitable problem for the perceptron. Problems that cannot be solved by the perceptron network are discussed in Limitations and Cautions.

To illustrate the training procedure, work through a simple problem. Consider a one-neuron perceptron with a single vector input having two elements:

This network, and the problem you are about to consider, are simple enough that you can follow through what is done with hand calculations if you want. The problem discussed below follows that found in [HDB1996].

Suppose you have the following classification problem and would like to solve it with a single vector input, two-element perceptron network.

$$\left\{{p}_{1}=\left[\begin{array}{c}2\\ 2\end{array}\right],{t}_{1}=0\right\}\left\{{p}_{2}=\left[\begin{array}{c}1\\ -2\end{array}\right],{t}_{2}=1\right\}\left\{{p}_{3}=\left[\begin{array}{c}-2\\ 2\end{array}\right],{t}_{3}=0\right\}\left\{{p}_{4}=\left[\begin{array}{c}-1\\ 1\end{array}\right],{t}_{4}=1\right\}$$

Use the initial weights and bias. Denote the variables at each step of this
calculation by using a number in parentheses after the variable. Thus, above, the
initial values are **W**(0) and
*b*(0).

$$\begin{array}{cc}W(0)=\left[\begin{array}{cc}0& 0\end{array}\right]& b(0)=0\end{array}$$

Start by calculating the perceptron’s output *a* for the first
input vector **p**_{1}, using the
initial weights and bias.

$$\begin{array}{c}a=hardlim(W(0){p}_{1}+b(0))\\ =hardlim\left(\left[\begin{array}{cc}0& 0\end{array}\right]\left[\begin{array}{c}2\\ 2\end{array}\right]+0\right)=hardlim(0)=1\end{array}$$

The output *a* does not equal the target value
*t*_{1}, so use the perceptron rule to find
the incremental changes to the weights and biases based on the error.

$$\begin{array}{l}e={t}_{1}-a=0-1=-1\\ \Delta W=e{p}_{1}^{T}=(-1)\left[\begin{array}{cc}2& 2\end{array}\right]=\left[\begin{array}{cc}-2& -2\end{array}\right]\\ \Delta b=e=(-1)=-1\end{array}$$

You can calculate the new weights and bias using the perceptron update rules.

$$\begin{array}{l}{W}^{new}={W}^{old}+e{p}^{T}=\left[\begin{array}{cc}0& 0\end{array}\right]+\left[\begin{array}{cc}-2& -2\end{array}\right]=\left[\begin{array}{cc}-2& -2\end{array}\right]=W(1)\\ {b}^{new}={b}^{old}+e=0+(-1)=-1=b(1)\end{array}$$

Now present the next input vector, **p**_{2}. The output is calculated below.

$$\begin{array}{c}a=hardlim(W(1){p}_{2}+b(1))\\ =hardlim\left(\left[\begin{array}{cc}-2& -2\end{array}\right]\left[\begin{array}{r}1\\ -2\end{array}\right]-1\right)=hardlim(1)=1\end{array}$$

On this occasion, the target is 1, so the error is zero. Thus there are no changes
in weights or bias, so **W**(2) = **W**(1) = [−2 −2] and *b*(2) = *b*(1)
= −1.

You can continue in this fashion, presenting **p**_{3} next, calculating an output and the error,
and making changes in the weights and bias, etc. After making one pass through all
of the four inputs, you get the values **W**(4) = [−3
−1] and *b*(4) = 0. To determine whether a satisfactory solution is
obtained, make one pass through all input vectors to see if they all produce the
desired target values. This is not true for the fourth input, but the algorithm does
converge on the sixth presentation of an input. The final values are **W**(6) = [−2 −3] and *b*(6) = 1.
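The entire hand calculation can be replayed with a short Python loop (a from-scratch sketch of the rule itself, not the toolbox functions), which converges to the same weights and bias:

```python
def hardlim(n):
    return 1 if n >= 0 else 0

# The four input/target pairs from the problem statement.
P = [[2, 2], [1, -2], [-2, 2], [-1, 1]]
T = [0, 1, 0, 1]

w, b = [0, 0], 0
while True:
    errors = 0
    for p, t in zip(P, T):
        a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)
        e = t - a
        if e != 0:                                   # CASE 2 or CASE 3
            w = [wi + e * pi for wi, pi in zip(w, p)]
            b += e
            errors += 1
    if errors == 0:   # a full error-free pass means convergence
        break

print(w, b)  # [-2, -3] 1
```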

This concludes the hand calculation. Now, how can you do this using the `train` function?

The following code defines a perceptron.

net = perceptron;

Consider the application of a single input

p = [2; 2];

having the target

t = [0];

Set `epochs` to 1, so that `train` goes through the input vectors (only one here) just one time.

net.trainParam.epochs = 1;
net = train(net,p,t);

The new weights and bias are

w = net.iw{1,1}, b = net.b{1}
w =
    -2    -2
b =
    -1

Thus, the initial weights and bias are 0, and after training on only the first vector, they have the values [−2 −2] and −1, just as you hand calculated.

Now apply the second input vector **p**_{2}. The output is 1, as it will be until the
weights and bias are changed, but now the target is 1, the error will be 0, and the
change will be zero. You could proceed in this way, starting from the previous
result and applying a new input vector time after time. But you can do this job
automatically with `train`.

Apply `train` for one epoch, a single pass through the sequence of all four input vectors. Start with the network definition.

net = perceptron;
net.trainParam.epochs = 1;

The input vectors and targets are

p = [[2;2] [1;-2] [-2;2] [-1;1]]
t = [0 1 0 1]

Now train the network with

net = train(net,p,t);

The new weights and bias are

w = net.iw{1,1}, b = net.b{1}
w =
    -3    -1
b =
     0

This is the same result as you got previously by hand.

Finally, simulate the trained network for each of the inputs.

a = net(p)
a =
     0     0     1     1

The outputs do not yet equal the targets, so you need to train the network for more than one pass. Try more epochs. This run gives a mean absolute error performance of 0 after two epochs:

net.trainParam.epochs = 1000;
net = train(net,p,t);

Thus, the network was trained by the time the inputs were presented on the third epoch. (As you know from hand calculation, the network converges on the presentation of the sixth input vector. This occurs in the middle of the second epoch, but it takes the third epoch to detect the network convergence.) The final weights and bias are

w = net.iw{1,1}, b = net.b{1}
w =
    -2    -3
b =
     1

The simulated output and errors for the various inputs are

a = net(p)
a =
     0     1     0     1
error = a-t
error =
     0     0     0     0

You confirm that the training procedure is successful. The network converges and produces the correct target outputs for the four input vectors.

The default training function for networks created with `perceptron` is `trainc`. (You can find this by executing `net.trainFcn`.) This training function applies the perceptron learning rule in its pure form: individual input vectors are applied one at a time, in sequence, and corrections to the weights and bias are made after each presentation of an input vector. Thus, perceptron training with `train` will converge in a finite number of steps unless the problem presented cannot be solved with a simple perceptron.

The function `train`

can be used in various ways by
other networks as well. Type `help train`

to read more about this
basic function.

You might want to try various example programs. For instance, Classification with a Two-Input Perceptron illustrates classification and training of a simple perceptron.

Perceptron networks should be trained with `adapt`, which presents the input vectors to the network one at a time and makes corrections to the network based on the results of each presentation. Use of `adapt` in this way guarantees that any linearly separable problem is solved in a finite number of training presentations.

As noted in the previous pages, perceptrons can also be trained with the function `train`. Commonly when `train` is used for perceptrons, it presents the inputs to the network in batches, and makes corrections to the network based on the sum of all the individual corrections. Unfortunately, there is no proof that such a training algorithm converges for perceptrons. On that account the use of `train` for perceptrons is not recommended.

Perceptron networks have several limitations. First, the output values of a perceptron can take on only one of two values (0 or 1) because of the hard-limit transfer function. Second, perceptrons can only classify linearly separable sets of vectors. If a straight line or a plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly. However, it has been proven that if the vectors are linearly separable, perceptrons trained adaptively will always find a solution in finite time. You might want to try Linearly Non-separable Vectors. It shows the difficulty of trying to classify input vectors that are not linearly separable.

It is only fair, however, to point out that networks with more than one perceptron can be used to solve more difficult problems. For instance, suppose that you have a set of four vectors that you would like to classify into distinct groups, and that two lines can be drawn to separate them. A two-neuron network can be found such that its two decision boundaries classify the inputs into four categories. For additional discussion about perceptrons and to examine more complex perceptron problems, see [HDB1996].

Long training times can be caused by the presence of an *outlier* input vector whose length is much
larger or smaller than the other input vectors. Applying the perceptron learning
rule involves adding and subtracting input vectors from the current weights and
biases in response to error. Thus, an input vector with large elements can lead
to changes in the weights and biases that take a long time for a much smaller
input vector to overcome. You might want to try Outlier Input Vectors to see how an
outlier affects the training.

By changing the perceptron learning rule slightly, you can make training times insensitive to extremely large or small outlier input vectors.

Here is the original rule for updating weights:

$$\Delta w=(t-a){p}^{T}=e{p}^{T}$$

As shown above, the larger an input vector **p**,
the larger its effect on the weight vector **w**.
Thus, if an input vector is much larger than other input vectors, the smaller
input vectors must be presented many times to have an effect.

The solution is to normalize the rule so that the effect of each input vector on the weights is of the same magnitude:

$$\Delta w=(t-a)\frac{{p}^{T}}{\Vert p\Vert}=e\frac{{p}^{T}}{\Vert p\Vert}$$
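A from-scratch Python sketch of this normalized update (illustrating the idea behind `learnpn`, not its actual implementation):

```python
import math

def normalized_dw(p, e):
    """Normalized perceptron weight change: dw = e * p^T / ||p||.
    The size of the step no longer depends on the length of p."""
    norm = math.sqrt(sum(pi * pi for pi in p))
    return [e * pi / norm for pi in p]

# An outlier 100x longer than a typical input now causes a weight
# change of the same magnitude as its unit-length counterpart.
dw_outlier = normalized_dw([60, 80], 1)    # ||p|| = 100
dw_typical = normalized_dw([0.6, 0.8], 1)  # ||p|| = 1
print(dw_outlier)  # [0.6, 0.8]
print(dw_typical)  # approximately [0.6, 0.8]
```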

The normalized perceptron rule is implemented with the function `learnpn`, which is called exactly like `learnp`. The normalized perceptron rule function `learnpn` takes slightly more time to execute, but reduces the number of epochs considerably if there are outlier input vectors. You might try Normalized Perceptron Rule to see how this normalized training rule works.