
How to extract partial derivatives of some specific layer in the back-propagation of a deep learning model?

Say I have a deep learning model, and after training I call this model net.
When I input some images into net, I want the partial derivatives $\partial h / \partial \theta$, where $h$ are the outputs of the relu1 layer and $\theta$ are the parameters of all trainable weights of the layers before relu1.
You can see that h (i.e. the output of relu1) will have a size of . I write the size of the training weights before relu1 as , where would be the set of all trainable parameters of the layers before relu1. Therefore should have the size of .
How can I get $\partial h / \partial \theta$ in the code? Many thanks!
My current code
%% Load Data
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
imds = imageDatastore(digitDatasetPath, ...
'IncludeSubfolders',true, ...
numTrainFiles = 50;
[imdsTrain,imdsValidation] = splitEachLabel(imds,numTrainFiles,'randomize');
%% Define Network Architecture
inputSize = [28 28 1];
numClasses = 10;
layers = [
%% Train Network
options = trainingOptions('sgdm', ...
'MaxEpochs',4, ...
'ValidationData',imdsValidation, ...
'ValidationFrequency',30, ...
'Verbose',false, ...
net = trainNetwork(imdsTrain,layers,options);

Answers (1)

Dinesh Yadav on 26 Nov 2019
Kindly go through the following link and examples in it.
After the reluLayer command you can use dlgradient to compute partial derivatives on the outputs of relu layer.
Hope it helps.
Dinesh Yadav on 27 Nov 2019
I don't think there is a way to do it with dlgradient without using loops. If you want to do it without loops, you will have to write your own custom gradient function.
SC on 27 Nov 2019
I think something like the jacobian() would help.
jacobian() works for the simple rosenbrock() case, but I don't think it works for the deep learning objects...
I will use a for-loop with dlgradient() then. However, I will need to loop 10000+ times, since the output size of my desired layer is over 10000, so the loop will recompute many reusable back-propagation values and take much longer than the theoretical computational time. Thank you for your help.
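For reference, the loop-based approach discussed above could be sketched roughly as follows. This is only a minimal illustration under several assumptions: the network is a small stand-in (the layer list in the question is not shown, so the single strided convolution here is hypothetical), the model is rebuilt as a dlnetwork rather than taken from trainNetwork, and layerJacobian is a helper name made up for this sketch. It calls dlgradient once per output element with 'RetainData' set to true so that the same trace can be reused inside one dlfeval call.

```matlab
% Hypothetical stand-in network; NOT the architecture from the question.
layers = [
    imageInputLayer([28 28 1],'Normalization','none','Name','input')
    convolution2dLayer(5,2,'Stride',4,'Name','conv1')
    reluLayer('Name','relu1')];
dlnet = dlnetwork(layerGraph(layers));

% One random input image, labelled as spatial-spatial-channel-batch.
dlX = dlarray(rand(28,28,1,1,'single'),'SSCB');

% dlgradient must run inside a function evaluated by dlfeval.
J = dlfeval(@layerJacobian,dlnet,dlX,'relu1');
disp(size(J))   % rows: elements of h, columns: learnable parameters

% Local function: keep it at the end of the script, or save it as
% layerJacobian.m (hypothetical helper name).
function J = layerJacobian(dlnet,dlX,layerName)
    % Forward pass up to the requested layer, flattened to a vector.
    h = forward(dlnet,dlX,'Outputs',layerName);
    h = stripdims(h);
    h = h(:);

    numOut    = numel(h);
    numParams = sum(cellfun(@numel,dlnet.Learnables.Value));
    J = zeros(numOut,numParams,'single');

    for i = 1:numOut
        % One backward pass per scalar output; RetainData keeps the
        % trace alive so dlgradient can be called again.
        g = dlgradient(h(i),dlnet.Learnables,'RetainData',true);
        % Flatten every gradient array and concatenate into one row.
        gi = cellfun(@(v) extractdata(v(:)),g.Value,'UniformOutput',false);
        J(i,:) = cell2mat(gi)';
    end
end
```

In this sketch every learnable in the stand-in network sits before relu1. For a deeper network, one would presumably pass only the rows of dlnet.Learnables whose Layer entry names a layer before relu1. The cost remains one backward pass per output element, which is the inefficiency noted above.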
