Neural network training fails when target values are small. Mapminmax issue?

When I try to train a network with very small target values, training stops at epoch 0 (i.e., it never really begins) because the gradient is already below the minimum. I understand that very small targets could imply a very small gradient, but the mapminmax function is active, and it should map the targets into [-1,1], avoiding exactly this kind of problem. So what's going on?
Here's some code:
First I define a really small sine wave:
in = [0:0.1:10];
out = sin(in)/1e10;
then I create and configure a network
net = fitnet([15]);
net = configure(net,in,out);
The mapminmax function seems to be active and properly configured:
net.outputs{1,2}.processSettings{1,2}
ans =
name: 'mapminmax'
xrows: 1
xmax: 9.9957e-11
xmin: -9.9992e-11
xrange: 1.9995e-10
yrows: 1
ymax: 1
ymin: -1
yrange: 2
no_change: 0
gain: 1.0003e+10
xoffset: -9.9992e-11
but the training fails (it stops at epoch 0):
[net,tr] = train(net,in,out);
tr.stop
ans =
Minimum gradient reached.
tr.num_epochs
ans =
0
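To see what trips the stop condition, you can compare the initial gradient against the minimum-gradient threshold; a minimal sketch (assuming the R2014a-era Neural Network Toolbox API from the question):

```matlab
% Reproduce the failure and inspect why training stops at epoch 0.
in  = 0:0.1:10;
out = sin(in)/1e10;            % targets on the order of 1e-10

net = fitnet(15);
net = configure(net, in, out);
[net, tr] = train(net, in, out);

% The performance (MSE) and its gradient are computed on the
% ORIGINAL target scale, so both come out astronomically small:
fprintf('initial gradient:   %g\n', tr.gradient(1));
fprintf('min_grad threshold: %g\n', net.trainParam.min_grad);  % 1e-7 by default
% gradient < min_grad  =>  "Minimum gradient reached." at epoch 0
```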
The learning completely failed, as the plot of the network output (omitted here) showed.
But if I manually use mapminmax, everything works well:
net = configure(net,in,mapminmax(out,-1,1));
[net,tr] = train(net,in,mapminmax(out,-1,1));
tr.stop
ans =
Minimum gradient reached.
tr.num_epochs
ans =
377
And the network actually learned the sine function (again, output plot omitted here).
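For reference, when normalizing targets by hand like this, you also need mapminmax's settings struct to map the network output back to the original target scale; a minimal sketch:

```matlab
% Manual normalization workaround: keep the mapminmax settings struct
% so the network output can be mapped back to the original scale.
in  = 0:0.1:10;
out = sin(in)/1e10;

[outN, ps] = mapminmax(out, -1, 1);   % ps stores the gain/offset

net = fitnet(15);
net = configure(net, in, outN);
[net, tr] = train(net, in, outN);

yN = net(in);                          % output on the [-1,1] scale
y  = mapminmax('reverse', yN, ps);     % back to the 1e-10 scale
```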
Any ideas?

 Accepted Answer

You have to change the defaults for BOTH the MSE goal AND the Minimum gradient. They are on the scale of the UNNORMALIZED data. For simple problems I tend to use the average BIASED target variance estimate to get
MSEgoal = mean(var(target',1))/100
MinGrad = MSEgoal/100
On more serious problems I consider BOTH the UNBIASED mean target variance estimate for O-dimensional targets AND the loss of degrees of freedom, because the same data is used BOTH to estimate the Nw unknown weights AND to estimate the performance:
MSEgoal = 0.01*max(0,Ndof)*mean(var(target',0))/Ntrneq
where
Ntrneq = Ntrn*O % number of training equations
Ndof = Ntrneq - Nw % number of degrees of freedom
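For this thread's example, a sketch of how those formulas might be applied (assuming `in`/`out` and `net = fitnet(15)` from the question; `numWeightElements` supplies Nw):

```matlab
[O, Ntrn] = size(out);              % O outputs, Ntrn training samples
Nw     = net.numWeightElements;     % total number of weights and biases
Ntrneq = Ntrn*O;                    % number of training equations
Ndof   = Ntrneq - Nw;               % degrees of freedom

MSEgoal = 0.01*max(0,Ndof)*mean(var(out',0))/Ntrneq;
MinGrad = MSEgoal/100;

net.trainParam.goal     = MSEgoal;  % scale the MSE goal to the data
net.trainParam.min_grad = MinGrad;  % and the minimum-gradient stop
[net, tr] = train(net, in, out);
```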
For details search both the NEWSGROUP and ANSWERS using
greg MSEgoal MinGrad
Hope this helps
Thank you for formally accepting my answer
Greg

4 Comments

Hi Greg, thank you so much for your reply.
I clearly understand your point. I have already tried tuning min_grad with some good results, but what I'm trying to understand is whether this is the major bug it seems to be or not.
When you say "mse goal and gradient goal are on the scale of the unnormalized data," do you mean that mapminmax parameters and settings affect only the simulation of the net and not the training? That would look to me like a really major issue in the Neural Network Toolbox.
According to the documentation, the implementation of mapminmax should fully automate the pre/post-processing stages and make the "magnitude" of inputs and targets completely irrelevant. Since the network always maps inputs and targets into a proper interval ([-1,1] if the hyperbolic tangent is used as activation function), it should not be able to distinguish, for example, these two problems:
1.
inputs = [0:0.1:10];
targets = sin(inputs);
2.
inputs = [0:0.1:10];
targets = sin(inputs)/1e6;
Am I right, or am I missing something?
I made 3 experiments. If necessary I can post the code.
1. I trained two nets on the same data with identical initialization and no early stopping. The only difference between the nets is that the first has mapminmax active and configured on both input and output layers, while the second has no pre/post-processing functions. The training resulted in different weights and biases. This should prove that mapminmax affects the training in some way.
2. Simulating a network with mapminmax (parameters retrieved from the net itself) by manually computing the correct linear combination of processed inputs with weights and biases yields exactly the output produced by calling the "sim" function. So mapminmax affects the simulation of the net in the correct (and expected) way.
3. I defined a parametric problem like this
eps = 1;
in = 0:0.1:10;
out = sin(in)*eps;
Then I trained two networks. The first one (Net1) had mapminmax active and was trained in the regular way:
net = configure(net,in,out);
net = train(net,in,out);
The second one (Net2) had no active processing functions, and I manually pre-processed inputs and targets in both training and simulation.
Then I computed for both the quantity
C = (sqrt(MSE)/mean(targets))*100
which represents the error of the net as a percentage of the mean of the targets.
I repeated the experiment gradually lowering eps. It turned out that:
a. for eps = 1, C(Net1) and C(Net2) are really close, substantially the same (over many runs);
b. as eps decreases, C(Net2) remains constant while C(Net1) skyrockets. For instance:
  • eps = 1 => C(Net1) = C(Net2) = 0.01%
  • eps = 0.001 => C(Net2) = 0.01% and C(Net1) = 36%
This should prove that the way mapminmax affects the training is not correct. Manually preprocessing inputs and targets makes the net insensitive to the magnitude of the numbers, as it should be.
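Since I offered to post the code, here is a condensed sketch of experiment 3 (hypothetical reconstruction; the actual runs averaged many initializations, and C follows my definition above with mean(targets) in the denominator):

```matlab
% Experiment 3: compare relative error C for a net trained with
% mapminmax active (Net1) vs. manual pre/post-processing (Net2).
for eps = [1 0.1 0.01 0.001]
    in  = 0:0.1:10;
    out = sin(in)*eps;

    % Net1: default processing (mapminmax active on input and output)
    net1 = configure(fitnet(15), in, out);
    net1 = train(net1, in, out);
    C1 = 100*sqrt(mean((net1(in) - out).^2))/mean(out);

    % Net2: no processing functions; targets normalized by hand
    net2 = fitnet(15);
    net2.inputs{1}.processFcns  = {};
    net2.outputs{2}.processFcns = {};
    [outN, ps] = mapminmax(out, -1, 1);
    net2 = configure(net2, in, outN);
    net2 = train(net2, in, outN);
    y2 = mapminmax('reverse', net2(in), ps);   % back to original scale
    C2 = 100*sqrt(mean((y2 - out).^2))/mean(out);

    fprintf('eps = %g: C(Net1) = %.2f%%, C(Net2) = %.2f%%\n', eps, C1, C2);
end
```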
Now, granted that tuning training parameters such as the gradient and MSE goals is certainly useful, is this mapminmax issue the major bug it looks like to me, or am I missing something?
You are right.
THIS IS A BUG.
I alerted MATLAB before, and whatever fixed value they had for MSEgoal was changed to 0. I don't recall whether they changed MinGrad or not.
Regardless, the use of my own MSEgoal and MinGrad was prompted because of the dissatisfaction with the MATLAB defaults.
By the way, if you are using NARNET or NARXNET the values should probably be scaled with 0.005 or 0.001 instead of 0.01 because closing the loop requires that openloop performance be exceptionally good.
Greg
Thank you very much for your help. By the way, in R2014a the default MSE goal and min_grad are 0 and 1e-7. I'll try your formula for these parameters. Thank you again!


More Answers (0)
Asked on 28 Aug 2016. Last commented by bah on 24 Oct 2022.
