Gradient descent with momentum backpropagation
net.trainFcn = 'traingdm'
[net,tr] = train(net,...)
traingdm is a network training function that
updates weight and bias values according to gradient descent with
momentum.
net.trainFcn = 'traingdm' sets the network
trainFcn property.
[net,tr] = train(net,...) trains the network
with traingdm.
Training occurs according to traingdm training
parameters, shown here with their default values:

net.trainParam.epochs            1000    Maximum number of epochs to train
net.trainParam.goal                 0    Performance goal
net.trainParam.lr                0.01    Learning rate
net.trainParam.max_fail             6    Maximum validation failures
net.trainParam.mc                 0.9    Momentum constant
net.trainParam.min_grad          1e-5    Minimum performance gradient
net.trainParam.show                25    Epochs between showing progress
net.trainParam.showCommandLine  false    Generate command-line output
net.trainParam.showWindow        true    Show training GUI
net.trainParam.time               inf    Maximum time to train in seconds
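You can display or override these defaults after selecting the training
function. A minimal sketch (the hidden layer size and parameter values here
are arbitrary examples, not recommendations):

    net = feedforwardnet(10,'traingdm');
    net.trainParam                   % display the current training parameters
    net.trainParam.epochs = 500;     % override a default before training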
You can create a standard network that uses
traingdm with feedforwardnet or cascadeforwardnet.
To prepare a custom network to be trained with
traingdm:

1. Set net.trainFcn to 'traingdm'. This sets net.trainParam
   to traingdm's default parameters.
2. Set net.trainParam properties
   to desired values, as sketched below.

In either case, calling
train with the resulting
network trains the network with
traingdm. See
help feedforwardnet and
help cascadeforwardnet for examples.
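For example, the two steps above might look like this for an existing
network object net (a brief sketch; the parameter values are illustrative
only):

    net.trainFcn = 'traingdm';    % step 1: also resets net.trainParam to traingdm defaults
    net.trainParam.lr = 0.02;     % step 2: adjust parameters as desired
    net.trainParam.mc = 0.9;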
In addition to
traingd, there are three other
variations of gradient descent.
Gradient descent with momentum, implemented by traingdm,
allows a network to respond not only to the local gradient, but also
to recent trends in the error surface. Acting like a lowpass filter,
momentum allows the network to ignore small features in the error
surface. Without momentum a network can get stuck in a shallow local
minimum. With momentum a network can slide through such a minimum.
See page 12-9 of [HDB96] for a discussion of momentum.
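The filtering effect is easy to see numerically. The following sketch (not
toolbox code; the gradient sequence is synthetic) compares raw gradient
steps with momentum-averaged steps using the same update form that traingdm
applies:

    mc = 0.9; lr = 0.05;
    g = 1 + 0.5*randn(1,50);            % noisy gradient samples around a trend of 1
    dX = 0; steps = zeros(1,50);
    for k = 1:50
        dX = mc*dX + lr*(1-mc)*g(k);    % momentum-filtered step
        steps(k) = dX;
    end
    plot(1:50, lr*g, 1:50, steps)       % raw steps fluctuate; filtered steps settle near lr*mean(g)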
Gradient descent with momentum depends on two training parameters.
The parameter lr indicates the learning rate, similar
to the simple gradient descent. The parameter
mc is the momentum constant that defines the amount of momentum.
mc is set between 0 (no momentum) and values close to 1 (lots of momentum).
(A momentum constant of 1 results in a network that is completely insensitive
to the local gradient and, therefore, does not learn properly.)
For example, create and train a network with traingdm as follows:

    p = [-1 -1 2 2; 0 5 0 5];            % inputs
    t = [-1 -1 1 1];                     % targets
    net = feedforwardnet(3,'traingdm');  % three hidden neurons, momentum training
    net.trainParam.lr = 0.05;            % learning rate
    net.trainParam.mc = 0.9;             % momentum constant
    net = train(net,p,t);
    y = net(p)
Try the Neural Network Design demonstration
nnd12mo [HDB96] for
an illustration of the performance of the batch momentum algorithm.
traingdm can train any network as long as
its weight, net input, and transfer functions have derivative functions.
Backpropagation is used to calculate derivatives of performance
perf with respect to the weight and bias variables
X. Each variable is adjusted according to gradient descent with momentum,
dX = mc*dXprev + lr*(1-mc)*dperf/dX
where dXprev is the previous change to the
weight or bias.
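As a concrete illustration, here is the same update applied to a single
scalar weight with a simple quadratic performance function (a sketch only,
not toolbox code; perf = x^2 is an assumed toy surface, and the sign
convention below steps downhill):

    lr = 0.05; mc = 0.9;
    x = 2; dXprev = 0;                      % initial weight and previous step
    for epoch = 1:300
        grad = 2*x;                         % dperf/dx for perf = x^2
        dX = mc*dXprev + lr*(1-mc)*grad;    % gradient descent with momentum
        x = x - dX;                         % step downhill
        dXprev = dX;
    end
    x                                       % close to the minimum at x = 0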
Training stops when any of these conditions occurs:

* The maximum number of epochs (repetitions) is reached.
* The maximum amount of time is exceeded.
* Performance is minimized to the goal.
* The performance gradient falls below min_grad.
* Validation performance has increased more than max_fail times
  since the last time it decreased (when using validation).
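After training, you can check which condition fired by inspecting the
training record returned by train. A brief sketch (field names follow the
toolbox training record; treat them as an assumption if your version
differs):

    [net,tr] = train(net,p,t);
    tr.stop          % text describing why training stopped, e.g. 'Maximum epoch reached.'
    tr.num_epochs    % number of epochs actually run
    tr.best_perf     % best training performance reached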