Gradient descent with momentum backpropagation

`net.trainFcn = 'traingdm'`

`[net,tr] = train(net,...)`

`traingdm` is a network training function that updates weight and bias values according to gradient descent with momentum.

`net.trainFcn = 'traingdm'` sets the network `trainFcn` property.

`[net,tr] = train(net,...)` trains the network with `traingdm`.

Training occurs according to `traingdm` training parameters, shown here with their default values:

| Parameter | Default | Description |
| --- | --- | --- |
| `net.trainParam.epochs` | `1000` | Maximum number of epochs to train |
| `net.trainParam.goal` | `0` | Performance goal |
| `net.trainParam.lr` | `0.01` | Learning rate |
| `net.trainParam.max_fail` | `6` | Maximum validation failures |
| `net.trainParam.mc` | `0.9` | Momentum constant |
| `net.trainParam.min_grad` | `1e-5` | Minimum performance gradient |
| `net.trainParam.show` | `25` | Epochs between showing progress |
| `net.trainParam.showCommandLine` | `false` | Generate command-line output |
| `net.trainParam.showWindow` | `true` | Show training GUI |
| `net.trainParam.time` | `inf` | Maximum time to train in seconds |

You can create a standard network that uses `traingdm` with `feedforwardnet` or `cascadeforwardnet`. To prepare a custom network to be trained with `traingdm`:

1. Set `net.trainFcn` to `'traingdm'`. This sets `net.trainParam` to `traingdm`'s default parameters.
2. Set `net.trainParam` properties to desired values.

In either case, calling `train` with the resulting network trains the network with `traingdm`.

See `help feedforwardnet` and `help cascadeforwardnet` for examples.
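The two preparation steps above can be sketched as follows. This is an illustrative fragment, not output from the toolbox; the parameter values chosen here are arbitrary, and the network size (10 hidden neurons) is only an example.

```matlab
% Sketch: preparing a custom network for training with traingdm.
net = feedforwardnet(10);      % any network object with a trainFcn property
net.trainFcn = 'traingdm';     % step 1: select traingdm (resets net.trainParam
                               %         to traingdm's defaults)
net.trainParam.lr = 0.02;      % step 2: override whichever defaults you need
net.trainParam.mc = 0.8;
net.trainParam.epochs = 500;
```

Calling `train(net,...)` on this network then uses `traingdm` with the overridden parameters.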

In addition to `traingd`, there are three other variations of gradient descent.

Gradient descent with momentum, implemented by `traingdm`, allows a network to respond not only to the local gradient, but also to recent trends in the error surface. Acting like a lowpass filter, momentum allows the network to ignore small features in the error surface. Without momentum a network can get stuck in a shallow local minimum. With momentum a network can slide through such a minimum. See page 12–9 of [HDB96] for a discussion of momentum.
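The lowpass-filter behavior can be seen in the weight update itself: each step is a blend of the previous step (weighted by the momentum constant `mc`) and the current gradient step. The sketch below illustrates this on a toy one-dimensional problem; it is a hand-written illustration of the update rule, not the toolbox's internal implementation, and the quadratic objective `perf(w) = (w - 3)^2` is invented for this example.

```matlab
% Illustrative momentum update on perf(w) = (w - 3)^2, minimized at w = 3.
mc = 0.9;       % momentum constant
lr = 0.05;      % learning rate
w  = 0;         % initial weight
dwPrev = 0;     % previous weight change
for epoch = 1:200
    grad = 2*(w - 3);                   % dperf/dw at the current weight
    dw = mc*dwPrev - (1 - mc)*lr*grad;  % new step: mostly the old step,
                                        % nudged by the current gradient
    w = w + dw;
    dwPrev = dw;
end
w   % approaches the minimum at w = 3
```

With `mc = 0` this reduces to plain gradient descent; as `mc` approaches 1, the current gradient contributes less and less to each step.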

Gradient descent with momentum depends on two training parameters. The parameter `lr` indicates the learning rate, similar to simple gradient descent. The parameter `mc` is the momentum constant that defines the amount of momentum. `mc` is set between 0 (no momentum) and values close to 1 (lots of momentum). A momentum constant of 1 results in a network that is completely insensitive to the local gradient and, therefore, does not learn properly.

For example, here a feedforward network is trained with `traingdm`:

```
p = [-1 -1 2 2; 0 5 0 5];
t = [-1 -1 1 1];
net = feedforwardnet(3,'traingdm');
net.trainParam.lr = 0.05;
net.trainParam.mc = 0.9;
net = train(net,p,t);
y = net(p)
```

Try the *Neural Network Design* demonstration `nnd12mo` [HDB96] for an illustration of the performance of the batch momentum algorithm.
