Why are the results of forward and predict very different in deep learning?
When I use a "dlnetwork" deep neural network model to make predictions, the results of the two functions are very different. I understand that the predict function freezes the batchNormalizationLayer and dropout layers, while forward does not freeze those parameters; forward is the pass used during the training phase.


Comparing the first 10 outputs of the two calls (shown in the two screenshots above), they differ by orders of magnitude. Where does the problem come from?
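For reference, a minimal way to reproduce the comparison (assuming dlnet is a dlnetwork and dlX is a formatted dlarray; the variable names below are just placeholders, not taken from the post) is:

% Compare the training-mode and inference-mode passes of the same network.
% Assumes dlnet is a dlnetwork and dlX is a formatted dlarray, e.g. dlarray(X,'SSCB').
YTrain = forward(dlnet,dlX);   % training behaviour: mini-batch statistics, dropout active
YInfer = predict(dlnet,dlX);   % inference behaviour: stored statistics, dropout disabled
max(abs(extractdata(YTrain) - extractdata(YInfer)),[],'all')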
Accepted Answer
Daniel Vieira on 5 Aug 2021
Edited: Daniel Vieira on 5 Aug 2021
I ran into this exact problem, and I think I found a solution; I'll find out whether it works when my model finishes training...
As others said before, the problem occurs because batchNorms behave differently in forward() and predict(). But there is still a problem here: if you trained your model (forward), it should have converged to a solution that works well in inference (predict), but it doesn't. Something is wrong in the training too.
What is wrong is that batchNorms don't update their parameters the same way other layers do through the adamupdate/rmspropupdate/sgdmupdate functions. They update through the State property of the dlnetwork object. Consider this code:
% WRONG: this version never updates the batch normalization statistics
[gradients,loss] = dlfeval(@modelGradients,dlnet,dlX,Ylabel);
[dlnet,otherOutputs] = rmspropupdate(dlnet,gradients,otherInputs);

function [gradients,loss] = modelGradients(dlnet,dlX,Ylabel)
    Y = forward(dlnet,dlX);   % forward's second output (the updated State) is never requested
    loss = myLoss(Y,Ylabel);
    gradients = dlgradient(loss,dlnet.Learnables);
end
The code above is wrong if you have batchNorms; it won't update them. The batchNorms are updated through the State property returned by forward and assigned back to dlnet:
[gradients,state,loss] = dlfeval(@modelGradients,dlnet,dlX,Ylabel);
dlnet.State = state; % THIS!!!
[dlnet,otherOutputs] = rmspropupdate(dlnet,gradients,otherInputs);

function [gradients,state,loss] = modelGradients(dlnet,dlX,Ylabel)
    [Y,state] = forward(dlnet,dlX); % THIS!!!
    loss = myLoss(Y,Ylabel);
    gradients = dlgradient(loss,dlnet.Learnables);
end
Now that dlnet has a State property updated at every forward() call, the batchNorms are updated and your model should converge to a solution that works for predict().
I would also like to call MathWorks' attention to the fact that this detail appears in the documentation in only ONE example, about GAN networks (despite the omnipresence of batchNorm layers in deep learning models), and is never mentioned explicitly.
1 Comment
Amanjit Dulai on 27 Oct 2021
To see how to perform this state update for a classification network, there is this example:
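The link to that example is not preserved here, but a rough sketch of such a training loop (not the linked documentation example; the loss function, optimizer state, and variable names below are illustrative) could look like:

averageSqGrad = [];                                    % rmsprop optimizer state
for iteration = 1:numIterations
    % dlX and T would normally be the next mini-batch, e.g. from a minibatchqueue
    [gradients,state,loss] = dlfeval(@modelGradients,dlnet,dlX,T);
    dlnet.State = state;                               % keep batchNorm statistics up to date
    [dlnet,averageSqGrad] = rmspropupdate(dlnet,gradients,averageSqGrad);
end

function [gradients,state,loss] = modelGradients(dlnet,dlX,T)
    [Y,state] = forward(dlnet,dlX);
    loss = crossentropy(Y,T);                          % classification loss (network ends in softmax)
    gradients = dlgradient(loss,dlnet.Learnables);
end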
More Answers (3)
vaibhav mishra on 30 Jun 2020
Hi there,
In my opinion, you are using BatchNorm in training but not in testing, so you can't expect the same results from both. You need to use batchnorm in testing as well, with the same parameters as in training.
Luc VIGNAUD on 29 Jun 2021
Thank you for raising this question. I did observe this issue playing with GANs, and the difference indeed comes from the batchNorm. I ended up using instanceNorm instead, but the question remains and should be answered by the MATLAB team ...
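For illustration, swapping the layer in a layer array might look like this (the architecture below is just a placeholder, not the model from this answer):

layers = [
    imageInputLayer([28 28 1],'Normalization','none')
    convolution2dLayer(3,16,'Padding','same')
    instanceNormalizationLayer    % used in place of batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer];
dlnet = dlnetwork(layerGraph(layers));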