Neural Network Toolbox - Backpropagation stopping criteria

Question

Haider Ali on 21 Mar 2015

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/184317-neural-network-toolbox-backpropagation-stopping-criteria

Commented: Greg Heath on 25 Apr 2015

CI.pdf

I am using Neural Network Toolbox to classify data from 12 alarms into 9 classes, using one hidden layer containing 8 neurons. I wanted to know:

1. What equations does the training algorithm traingdm use to update the weights and biases? Are they the same as those given below (eta is the learning rate, i.e. 0.7, and alpha is the momentum coefficient, i.e. 0.9):

where delta_j for output layer is:

while for hidden layer it is:

These equations are taken directly from the paper attached.

2. What does the stopping criterion net.trainParam.goal mean? Which field should I update if I want training to stop when the mean squared error reaches 0.0001? Do I need to set net.trainParam.min_grad to 0.0001 for this?

3. How are the weights updated in traingdm? Is it a batch update (once after every epoch) or an update after every input pattern of every epoch?

4. I have 41 training input patterns. How many of those are used for the training process and how many for the recall process? What if I want all 41 of them to be used only for training?

5. I have tried the following code but the outputs are not being classified accurately.

    clear all; close all; clc;
p = [
0 0 0 0 0 0 0 0 0 0 0; ...   %c1
0 1 0 0 0 0 0 0 0 0 0; ...
0 1 1 0 0 0 0 0 0 0 0; ...
0 1 0 1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0 1 0 0; ...
0 1 1 1 0 0 0 0 0 0 0; ...
0 1 0 1 1 0 0 0 1 0 0; ...
0 1 0 1 0 0 0 0 1 0 0; ...
0 1 1 0 0 0 0 0 1 0 0; ...
0 1 0 1 1 1 0 0 0 0 0; ...
0 1 0 1 1 0 1 0 0 0 0; ...
0 1 1 1 0 0 0 0 1 0 0; ...
1 0 0 0 0 0 0 0 0 0 0; ...    %c2
0 0 0 0 0 0 0 0 0 0 0; ...
0 0 1 0 0 0 0 0 0 0 0; ...
0 0 0 1 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0 1 0 0; ...
0 0 1 1 0 0 0 0 0 0 0; ...
0 0 0 1 1 0 0 0 1 0 0; ...
0 0 0 1 0 0 0 0 1 0 0; ...
0 0 1 0 0 0 0 0 1 0 0; ...
0 0 0 1 1 1 0 0 0 0 0; ...
0 0 0 1 1 0 1 0 0 0 0; ...
0 0 1 1 0 0 0 0 1 0 0; ...
0 0 1 0 0 0 0 0 0 0 0; ...    %c3
0 0 0 1 0 0 0 0 0 0 0; ...    %c4 or c5
0 0 0 1 1 0 0 0 0 0 0; ...
0 0 0 1 1 1 0 0 0 0 0; ...
0 0 0 1 1 0 1 0 0 0 0; ...
0 0 0 0 1 0 0 0 0 0 0; ...    %c6
0 0 0 0 1 1 0 0 0 0 0; ...
0 0 0 0 1 0 1 0 0 0 0; ...
0 0 0 0 0 0 1 0 0 0 0; ...    %c7
0 0 0 0 0 0 0 1 0 0 0; ...    %c8
0 0 0 0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 0 1 1 0 0; ...
0 0 0 0 0 0 0 0 0 1 1; ...
0 0 0 0 0 0 0 1 0 1 0; ...
0 0 0 0 0 0 0 0 0 0 1; ...    %c9
0 1 0 0 0 0 0 0 0 0 0; ...    %c1 or c2
0 0 0 0 0 0 0 0 1 0 0; ...    %c1 or c2 or c3
    ]';
t = [
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0;...
0 0 0 0 0 0 0 0; ...
0 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...  %c2
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
1 0 0 0 0 0 0 0; ...
0 1 0 0 0 0 0 0; ...  %c3
0 0 1 1 0 0 0 0; ...  %c4 or c5
0 0 1 1 0 0 0 0; ...
0 0 1 1 0 0 0 0; ...
0 0 1 1 0 0 0 0; ...
0 0 0 0 1 0 0 0; ...  %c6
0 0 0 0 1 0 0 0; ...
0 0 0 0 1 0 0 0; ...
0 0 0 0 0 1 0 0; ...  %c7
0 0 0 0 0 0 1 0; ...  %c8
0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 1 0; ...
0 0 0 0 0 0 0 1; ...  %c9
1 0 0 0 0 0 0 0; ...  %c1 or c2
1 1 0 0 0 0 0 0; ...  %c1 or c2 or c3
    ]';
net = feedforwardnet(8,'traingdm'); %one hidden layer with 8 neurons, trained with traingdm
net = configure(net,p,t);
net.layers{2}.transferFcn = 'logsig';   %sigmoid function in output layer
net.layers{1}.transferFcn = 'logsig';   %sigmoid function in hidden layer
net.performFcn = 'mse';
net = init(net);
net.trainParam.epochs = 100000;     %no. of epochs are not my concern hence a large number
net.trainParam.lr = 0.7;            %obtained from the paper attached
net.trainParam.mc = 0.9;            %obtained from the paper attached
net.trainParam.max_fail = 100000;   
net.trainParam.min_grad = 0.00015;  %is this stopping criteria same as mse?
net = train(net,p,t);
view(net);

Let me know if something else needs to be specified. Regards.

1 Comment

Greg Heath on 25 Apr 2015

% Target columns should sum to 1

% If targets are mutually exclusive there is only one "1"

% init(net) unnecessary because of configure

NO MITIGATION FOR OVERTRAINING AN OVERFIT NET

max_epoch is HUGE
msegoal not specified ==> default of 0
no validation stopping 
no regularization (trainbr)

Hope this helps.

Greg


Answer 1

Greg Heath on 21 Mar 2015

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/184317-neural-network-toolbox-backpropagation-stopping-criteria#answer_172183

If you are going to use MATLAB, I suggest using as many defaults as possible.

 1. Use PATTERNNET for classification
 2. To see the default settings, type into the command line WITHOUT AN ENDING SEMICOLON
 net = patternnet   % default H = 10
 3. If a vector can belong to m of c classes,
    a. The c-dimensional unit target vector should contain 
        i. m positive components that sum to 1
       ii. c-m components of value 0
 4. Typically, the only things that need to be varied are
    a. H, the number of hidden nodes
    b. The initial weights
 5. The best way to do this is
    a. Initialize the RNG
    b. Use an outer loop over number of hidden nodes
    c. Use an inner loop over random weight initializations
    d. For example

       Ntrials = 10
       rng('default')
       j = 0
       for h = Hmin:dH:Hmax
          j = j+1
          ...
          for i = 1:Ntrials
             ...
          end
       end
 6. See the NEWSGROUP and ANSWERS for examples. Search with 
 greg patternnet Ntrials

Hope this helps

Thank you for formally accepting my answer

Greg

8 Comments

Greg Heath on 21 Mar 2015

What I was trying to say was your approach to classification doesn't seem to be promising. My recommendation was to use a more standard approach with a default heavy version of patternnet as I have demonstrated in numerous examples. However, first check out the documentation and trivial examples in

 help patternnet
 doc  patternnet

Nonetheless, I did run your code. However, I obtained gruesome results. When I looked at your data a little more closely, I see a major part of the problem (which will exist regardless of whether a classification function like patternnet or a regression function like fitnet or feedforwardnet is used):

Your class representations are severely unbalanced

For example

 sum(t')      = [ 14  14  2  4  4  3  1  5  1 ]
 sum(sum(t')) = 48
The easiest way to deal with this is to add replicas of the 7 smaller classes so that the number of samples per class is approximately equal. My posted examples of the BIOID data classification are good examples. In fact, adding a SMALL amount of 9-dimensional random noise (jitter) might even help.

In the past 30+ years I have never dealt with a classification target matrix with columns containing more than 1 nonzero entry. I think the best way to deal with those is also to make replicas instead of having fractional targets.

Then, the classifier has a more clear definition of the 9 classes.
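
As a rough illustration of the replication idea, here is a minimal sketch in plain MATLAB (no toolbox calls). The toy data, the variable names and the 0.01 jitter scale are all made up for the example; they are not taken from the data in the question.

```matlab
% Hypothetical toy data: 2 features, 6 samples, class labels 1..3.
% Class 1 dominates, classes 2 and 3 have one sample each.
x = [0.1 0.2 0.3 0.4 1.0 2.0;
     0.5 0.4 0.6 0.5 1.5 2.5];
labels = [1 1 1 1 2 3];

counts  = accumarray(labels(:), 1)';   % samples per class: [4 1 1]
nTarget = max(counts);                 % bring every class up to the largest count

xBal = [];
labelsBal = [];
for c = 1:numel(counts)
    idx  = find(labels == c);             % columns belonging to class c
    reps = ceil(nTarget / numel(idx));    % number of copies needed
    idx  = repmat(idx, 1, reps);
    idx  = idx(1:nTarget);                % trim to exactly nTarget samples
    xBal = [xBal, x(:, idx)];
    labelsBal = [labelsBal, c * ones(1, nTarget)];
end

% Optional small jitter so that the replicas are not exact duplicates.
% The 0.01 scale is an arbitrary illustrative choice.
xJit = xBal + 0.01 * randn(size(xBal));
```

After the loop every class has the same number of columns, so the classifier no longer sees one class far more often than the others.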

I will answer some of your questions in the next comment.

Greg Heath on 21 Mar 2015

% Neural Network Toolbox - Backpropagation stopping criteria
%
% Asked by Haider Ali about 3 hours ago
%
% I am using Neural Network Toolbox to classify a data of 12 alarms
% into 9 classes with one hidden layer containing 8 neurons. I wanted
% to know:

How did you determine H = 8?

% 1. What equations does training algorithm traingdm use to update
% the weights and bias? Are these the same as given below (etta is
% learning rate i.e. 0.7 and alpha is momentum coefficient i.e. 0.9):
% where delta_j for output layer is:
% while for hidden layer it is:
% These equations are taken directly from the paper attached.

 help traingdm
 doc traingdm
 type traingdm
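
For orientation, the update that the traingdm documentation describes (written there, as far as I recall, as dX = mc*dXprev + lr*(1-mc)*dperf/dX) can be put as follows. The symbols are my own shorthand: X collects all weights and biases, and g_k is the descent direction, i.e. the negative gradient of the performance with respect to X at epoch k. Treat this as a sketch and confirm it against `type traingdm` in your release.

```latex
\Delta X_k = mc \,\Delta X_{k-1} + lr\,(1 - mc)\, g_k ,
\qquad
X_{k+1} = X_k + \Delta X_k
```

Note the factor (1 - mc) on the gradient term. The common textbook form, \Delta w = -\eta\,\partial E/\partial w + \alpha\,\Delta w_{\text{prev}}, has no such factor, so a learning rate of 0.7 with momentum 0.9 taken from a paper does not necessarily give the same step size in traingdm.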

% 2. What does the stopping criteria net.trainParam.goal mean?

Training stops if the error function is <= goal.

% Which field to update if I want my stopping criteria to be mean square
% error equal to 0.0001?

net.trainParam.goal = 0.0001

%Do I need to update net.trainParam.min_grad to 0.0001 for this?

No. However, I use

 MSEgoal                 = 0.01*mean(var(t',1)) % or 0.005
 net.trainParam.goal     = MSEgoal
 net.trainParam.min_grad = MSEgoal/100
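
The reasoning behind that goal can be checked in plain MATLAB. mean(var(t',1)) is the MSE of a naive model that always outputs the mean of each target row, so 1% of it means at most 1% of the target variance is left unexplained. The toy target matrix below is hypothetical, and MSE00 and minGrad are just names chosen for this sketch.

```matlab
% Hypothetical 3-class one-hot target matrix, one column per sample.
t = [1 1 0 0 0 0;
     0 0 1 1 0 0;
     0 0 0 0 1 1];

% MSE of the naive constant model (output = mean of each target row).
% var(...,1) normalizes by N rather than N-1.
MSE00 = mean(var(t', 1));

% Goal: leave at most 1% of the target variance unexplained.
MSEgoal = 0.01 * MSE00;

% Gradient threshold set two orders of magnitude below the goal.
minGrad = MSEgoal / 100;
```

A goal scaled this way adapts to the targets, whereas a fixed number such as 0.0001 can be either trivially easy or unreachable depending on the target variance.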

% 3. How are the weights being updated in traingdm? Is it batch
% updation (like after every epoch) or is it updation after every
% input pattern of every epoch?

The default is batch. However, you can use adaptation if you wish. See the documentation:

 help/doc train
 help/doc adapt
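
To make the batch versus per-pattern distinction concrete, here is a toy sketch in plain MATLAB (no toolbox calls). The one-weight linear neuron, the data and the learning rate are made up for illustration; this is not how train or adapt are implemented internally.

```matlab
% Toy linear neuron y = w*x fitted to 3 patterns (hypothetical data).
x  = [1 2 3];
t  = [2 4 6];
lr = 0.05;

% Batch mode: ONE weight update per epoch, using the gradient of the
% mean squared error taken over all patterns.
wBatch    = 0;
gradBatch = mean(-2 * (t - wBatch * x) .* x);
wBatch    = wBatch - lr * gradBatch;

% Incremental mode: one weight update after EVERY pattern, using the
% gradient of that single pattern's squared error.
wInc = 0;
for k = 1:numel(x)
    gradK = -2 * (t(k) - wInc * x(k)) * x(k);
    wInc  = wInc - lr * gradK;
end
```

After one epoch the two weights differ (about 0.933 for batch, 1.892 for incremental), because the incremental loop has already moved the weight three times.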

% 4. I have 41 training input patterns. How many of those are use
% for training process and how many for recall process.

 total     = design + test
 design = training +validation
 The default ratios are  Ntrn/Nval/Ntst = 0.7/0.15/0.15

% What if I want all 41 of them to be used only for training process?

 Typically, not a great idea. Search the NN literature for overfitting,
 overtraining and generalization.
 net = patternnet;              % For classification
 net.trainFcn = 'trainbr';      % If Ntrn = N
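
For the 41 patterns in the question, the default 0.7/0.15/0.15 ratios work out roughly as below. This is plain arithmetic, assuming the validation and test counts are rounded and the remainder goes to training; the toolbox's exact rounding may differ by a sample.

```matlab
N      = 41;                    % number of patterns in the question
ratios = [0.70 0.15 0.15];      % default train/validation/test ratios

Nval = round(ratios(2) * N);    % validation samples
Ntst = round(ratios(3) * N);    % test samples
Ntrn = N - Nval - Ntst;         % the remainder is used for training
```

So only about 29 of the 41 patterns actually drive the weight updates by default. If all of them really must be used for training, the data division function can be changed, for example net.divideFcn = 'dividetrain', in which case validation stopping is no longer available.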

Haider Ali on 30 Mar 2015

Hi Greg,

There is only one hidden layer containing 8 neurons. The author has not mentioned the train/validate/test ratio.

I am now using the Iris Data Set to train my NN using Back Propagation (just for my own understanding and testing). The code is below:

      clear all;
close all;
clc;
p = [
5.1,3.5,1.4,0.2;    %iris data set
4.9,3.0,1.4,0.2;
4.7,3.2,1.3,0.2;
4.6,3.1,1.5,0.2;
5.0,3.6,1.4,0.2;
5.4,3.9,1.7,0.4;
4.6,3.4,1.4,0.3;
5.0,3.4,1.5,0.2;
4.4,2.9,1.4,0.2;
4.9,3.1,1.5,0.1;
5.4,3.7,1.5,0.2;
4.8,3.4,1.6,0.2;
4.8,3.0,1.4,0.1;
4.3,3.0,1.1,0.1;
5.8,4.0,1.2,0.2;
5.7,4.4,1.5,0.4;
5.4,3.9,1.3,0.4;
5.1,3.5,1.4,0.3;
5.7,3.8,1.7,0.3;
5.1,3.8,1.5,0.3;
5.4,3.4,1.7,0.2;
5.1,3.7,1.5,0.4;
4.6,3.6,1.0,0.2;
5.1,3.3,1.7,0.5;
4.8,3.4,1.9,0.2;
5.0,3.0,1.6,0.2;
5.0,3.4,1.6,0.4;
5.2,3.5,1.5,0.2;
5.2,3.4,1.4,0.2;
4.7,3.2,1.6,0.2;
4.8,3.1,1.6,0.2;
5.4,3.4,1.5,0.4;
5.2,4.1,1.5,0.1;
5.5,4.2,1.4,0.2;
4.9,3.1,1.5,0.1;
5.0,3.2,1.2,0.2;
5.5,3.5,1.3,0.2;
4.9,3.1,1.5,0.1;
4.4,3.0,1.3,0.2;
5.1,3.4,1.5,0.2;
5.0,3.5,1.3,0.3;
4.5,2.3,1.3,0.3;
4.4,3.2,1.3,0.2;
5.0,3.5,1.6,0.6;
5.1,3.8,1.9,0.4;
4.8,3.0,1.4,0.3;
5.1,3.8,1.6,0.2;
4.6,3.2,1.4,0.2;
5.3,3.7,1.5,0.2;
5.0,3.3,1.4,0.2;
7.0,3.2,4.7,1.4;
6.4,3.2,4.5,1.5;
6.9,3.1,4.9,1.5;
5.5,2.3,4.0,1.3;
6.5,2.8,4.6,1.5;
5.7,2.8,4.5,1.3;
6.3,3.3,4.7,1.6;
4.9,2.4,3.3,1.0;
6.6,2.9,4.6,1.3;
5.2,2.7,3.9,1.4;
5.0,2.0,3.5,1.0;
5.9,3.0,4.2,1.5;
6.0,2.2,4.0,1.0;
6.1,2.9,4.7,1.4;
5.6,2.9,3.6,1.3;
6.7,3.1,4.4,1.4;
5.6,3.0,4.5,1.5;
5.8,2.7,4.1,1.0;
6.2,2.2,4.5,1.5;
5.6,2.5,3.9,1.1;
5.9,3.2,4.8,1.8;
6.1,2.8,4.0,1.3;
6.3,2.5,4.9,1.5;
6.1,2.8,4.7,1.2;
6.4,2.9,4.3,1.3;
6.6,3.0,4.4,1.4;
6.8,2.8,4.8,1.4;
6.7,3.0,5.0,1.7;
6.0,2.9,4.5,1.5;
5.7,2.6,3.5,1.0;
5.5,2.4,3.8,1.1;
5.5,2.4,3.7,1.0;
5.8,2.7,3.9,1.2;
6.0,2.7,5.1,1.6;
5.4,3.0,4.5,1.5;
6.0,3.4,4.5,1.6;
6.7,3.1,4.7,1.5;
6.3,2.3,4.4,1.3;
5.6,3.0,4.1,1.3;
5.5,2.5,4.0,1.3;
5.5,2.6,4.4,1.2;
6.1,3.0,4.6,1.4;
5.8,2.6,4.0,1.2;
5.0,2.3,3.3,1.0;
5.6,2.7,4.2,1.3;
5.7,3.0,4.2,1.2;
5.7,2.9,4.2,1.3;
6.2,2.9,4.3,1.3;
5.1,2.5,3.0,1.1;
5.7,2.8,4.1,1.3;
6.3,3.3,6.0,2.5;
5.8,2.7,5.1,1.9;
7.1,3.0,5.9,2.1;
6.3,2.9,5.6,1.8;
6.5,3.0,5.8,2.2;
7.6,3.0,6.6,2.1;
4.9,2.5,4.5,1.7;
7.3,2.9,6.3,1.8;
6.7,2.5,5.8,1.8;
7.2,3.6,6.1,2.5;
6.5,3.2,5.1,2.0;
6.4,2.7,5.3,1.9;
6.8,3.0,5.5,2.1;
5.7,2.5,5.0,2.0;
5.8,2.8,5.1,2.4;
6.4,3.2,5.3,2.3;
6.5,3.0,5.5,1.8;
7.7,3.8,6.7,2.2;
7.7,2.6,6.9,2.3;
6.0,2.2,5.0,1.5;
6.9,3.2,5.7,2.3;
5.6,2.8,4.9,2.0;
7.7,2.8,6.7,2.0;
6.3,2.7,4.9,1.8;
6.7,3.3,5.7,2.1;
7.2,3.2,6.0,1.8;
6.2,2.8,4.8,1.8;
6.1,3.0,4.9,1.8;
6.4,2.8,5.6,2.1;
7.2,3.0,5.8,1.6;
7.4,2.8,6.1,1.9;
7.9,3.8,6.4,2.0;
6.4,2.8,5.6,2.2;
6.3,2.8,5.1,1.5;
6.1,2.6,5.6,1.4;
7.7,3.0,6.1,2.3;
6.3,3.4,5.6,2.4;
6.4,3.1,5.5,1.8;
6.0,3.0,4.8,1.8;
6.9,3.1,5.4,2.1;
6.7,3.1,5.6,2.4;
6.9,3.1,5.1,2.3;
5.8,2.7,5.1,1.9;
6.8,3.2,5.9,2.3;
6.7,3.3,5.7,2.5;
6.7,3.0,5.2,2.3;
6.3,2.5,5.0,1.9;
6.5,3.0,5.2,2.0;
6.2,3.4,5.4,2.3;
5.9,3.0,5.1,1.8;
]';
t = [
0;    %assign 0 to output neuron for Iris-setosa
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0;
0.5;      %assign 0.5 to output neuron for Iris-versicolor
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
0.5;
1;          %assign 1 to output neuron for Iris-virginica
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
1;
]';
net = feedforwardnet(3,'traingd'); %one hidden layer with 3 neurons, trained with traingd
net = configure(net,p,t);
net.layers{2}.transferFcn = 'logsig';   %sigmoid function in output layer
net.layers{1}.transferFcn = 'logsig';   %sigmoid function in hidden layer
net.performFcn = 'mse';
net = init(net);
net.trainParam.epochs = 10000;     
net.trainParam.lr = 0.7;            %learning rate
net.trainParam.goal = 0.01;         %mse
net = train(net,p,t);
view(net);

The problem is that I am not getting the desired output for the first class (for which the output should be close to zero). When I input a vector from the first class to the trained net, the output is close to 0.5 (but it should be close to zero).

This is the output for the first vector of the first class:

output = net([5.1,3.5,1.4,0.2]')

output =

0.5003

This output should be close to zero (because I have assigned 0 to first class), but it is coming out to be 0.5. This is the case for all the inputs of first class. For the second and third class, the outputs are fine i.e. close to 0.5 for class 2 and close to 1.0 for class 3.

Can you please run this code and tell me what I am doing wrong?

(I think it might be issue of the bias input because all the outputs for class 1 are being offset by 0.5.)

Regards.

Greg Heath on 25 Apr 2015

%GEH1: LOUSY TARGET CODING

%GEH2: traingd instead of traingdm

% GEH3: Logsig output INVALID for default mapminmax [-1 1] scaling
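
A sketch of what GEH1 and GEH3 point at, in plain MATLAB with hand-written stand-ins rather than toolbox calls. The helper names logsigFcn and unmap are mine, and the sketch assumes the default output processing scales the target range [0, 1] onto [-1, 1] and reverses that scaling on the network output.

```matlab
% GEH1: code the 3 classes as unit vectors (one row per class)
% instead of a single output with values 0 / 0.5 / 1.
labels  = [ones(1,50), 2*ones(1,50), 3*ones(1,50)];  % class index per iris sample
I3      = eye(3);
tOneHot = I3(:, labels);          % 3-by-150, every column sums to 1

% GEH3: a logsig output layer combined with [-1, 1] target scaling.
logsigFcn = @(n) 1 ./ (1 + exp(-n));   % logistic sigmoid, range (0, 1)
tmin = 0;  tmax = 1;                   % range of the original targets
unmap = @(y) (y + 1) / 2 * (tmax - tmin) + tmin;  % reverse of [-1, 1] min-max scaling

n     = [-50 0 50];        % very negative, zero and very positive net input
yNorm = logsigFcn(n);      % output in the scaled space: about 0, 0.5, 1
yOut  = unmap(yNorm);      % output in target units: about 0.5, 0.75, 1
```

Under these assumptions the network output can never fall below 0.5, whatever the weights are, which is consistent with the 0.5003 reported above for the class coded as 0.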

Hope this helps

Greg
