How to create Custom Regression Output Layer with multiple inputs for training sequence-to-sequence LSTM model?

For the neural network architecture I am using for my problem, I would like to define a regression output layer with a custom loss function. For this, I need the regression layer to have two inputs: one from a fully connected layer and the other from the sequenceInputLayer. However, I am not able to achieve that. How do I get around this?
Following is the definition of the custom layer:
classdef customLossLayerMultiInput < nnet.layer.RegressionLayer & nnet.layer.Acceleratable
    % Custom regression layer with an RMSE + physics-residue loss and additional properties.
    properties
        node_properties
        numFeature
    end
    methods
        function layer = customLossLayerMultiInput(name, node_properties, numFeature)
            % Constructor
            layer.Name = name;
            layer.Description = 'Physics-Informed loss function for LSTM training';
            layer.node_properties = node_properties;
            layer.numFeature = numFeature;
        end
        function loss = forwardLoss(layer, Y, T, varargin)
            % Calculate the forward loss
            % Reshape predictions and targets
            Y = reshape(Y, [], 1);
            T = reshape(T, [], 1);
            X1 = varargin{1};
            X2 = varargin{2};
            % Sequence input data
            sequence_input_data = reshape(X1, [], layer.numFeature);
            % Calculate mean residue
            mean_residue = PI_BEM_Residue(Y, T, sequence_input_data, layer.node_properties);
            % Calculate RMSE loss
            rmse_loss = rmse(Y, T);
            % Total loss
            loss = mean_residue + rmse_loss;
        end
    end
end
And this is the network architecture:
layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer') % Define the sequence input layer and name it
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer') % Define the LSTM layer and name it
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer') % Define the fully connected layer and name it
    dropoutLayer(x.dropout_rate, 'Name', 'dropoutLayer') % Define the dropout layer and name it
    customLossLayerMultiInput(LayerName, node_properties, numFeatures)
];
% Create a layer graph
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph, "inputLayer", strcat(LayerName,'\in2'));
For this setup, I am getting an error
Error using nnet.cnn.LayerGraph>iValidateLayerName
Layer 'RegressionLayer_Node2\in2' does not exist.

 Accepted Answer

Unfortunately, it's not possible to define a custom multi-input loss layer.
The possible options are:
  1. If Y, X1 and X2 have compatible sizes, you can concatenate them before customLossLayerMultiInput and pass them in as a single input to the loss.
  2. Use a dlnetwork and a custom training loop. In this case you can write a much more flexible loss function rather than a custom loss layer; however, you need to write the training loop yourself, following an example like this.
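A bare-bones sketch of the custom-training-loop pattern in option 2 (placeholder layer sizes, batch variables, and loss, not your model):

% Minimal dlnetwork custom training loop sketch (placeholder names).
net = dlnetwork([
    featureInputLayer(4)
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(1)
]);
averageGrad = []; averageSqGrad = [];
for iteration = 1:numIterations
    X = dlarray(single(Xbatch), "CB");   % mini-batch inputs (channels x batch)
    T = dlarray(single(Tbatch), "CB");   % mini-batch targets
    % dlfeval traces the computation so dlgradient can differentiate it.
    [loss, gradients] = dlfeval(@modelLoss, net, X, T);
    [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, ...
        averageGrad, averageSqGrad, iteration);
end

function [loss, gradients] = modelLoss(net, X, T)
    Y = forward(net, X);
    loss = mse(Y, T);   % any dlarray-based loss can go here
    gradients = dlgradient(loss, net.Learnables);
end

The key point is that the loss can be any function of the network output, as long as it is computed with dlarray operations.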

7 Comments

Hello Ben,
Thanks for your response. Luckily, I could modify my data so that the sizes of Y, X1 and X2 are compatible. Following is the code I am using to generate this network:
LayerName = strcat('RegressionLayer_Node', num2str(node_properties.nodenumber));
layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer') % Define the sequence input layer and name it
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer') % Define the LSTM layer and name it
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer') % Define the fully connected layer and name it
    dropoutLayer(x.dropout_rate, 'Name', 'dropoutLayer') % Define the dropout layer and name it
    concatenationLayer(1, 2, 'Name', 'concat')
    customLossLayer(LayerName, node_properties, numFeatures, X_train)
];
% Create a layer graph
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph, 'inputLayer', 'concat/in2');
figure, plot(lgraph)
This is what the network architecture looks like.
And I am training this network using the following code:
% Define the training options
options = trainingOptions('adam', ...
    'MaxEpochs', 30, ...
    'MiniBatchSize', 1, ...
    'InitialLearnRate', x.learning_rate, ...
    'SequenceLength', 'longest', ...
    'Shuffle', 'once', ...
    'L2Regularization', 0.001, ...
    'ValidationData', {X_val, Y_val}, ...
    'ValidationFrequency', 10, ...
    'Verbose', false, ...
    'ExecutionEnvironment', 'auto');
% Train the LSTM network
net = trainNetwork(X_train, Y_train, lgraph, options);
Here the variables X_train and Y_train are N-by-1 cell arrays. Each cell in X_train has size 4x4000, and the corresponding cell in Y_train has size 1x4000. I am trying to concatenate these arrays using the concatenation layer, but during training I get the following error:
Error using trainNetwork
Invalid training data. For cell array input, responses must be an N-by-1 cell array of sequences, where N is the number of sequences. The spatial and
channel dimensions of the sequences must be the same as the output size of the last layer (5).
Would you please help me figure out how to tackle this?
It would appear that the custom regression layer is outputting data with 5 channels rather than 1 as in your Y_train.
Note that you can't concatenate Y_train anywhere except in the custom loss layer - the X_train are passed as the network inputs, the Y_train correspond to the T (target) input of the custom loss layer.
Hey Ben,
As you can see in the network architecture, the input to the custom regression layer (RegressionLayer_Node2) is the concatenation of the outputs of the dropout layer and the sequence input layer; therefore the dimension of the input to the custom regression layer is five (4 from the input layer + 1 from the dropout layer), whereas my target has dimension equal to one. I need 4 input parameters to define the custom loss function. Following is the definition of my custom regression layer:
classdef physicsInformedLossLayer < nnet.layer.RegressionLayer & nnet.layer.Acceleratable
    % Custom regression layer with a physics-informed loss and additional properties.
    properties
        node_properties
    end
    methods
        function layer = physicsInformedLossLayer(name, node_properties)
            % Constructor
            layer.Name = name;
            layer.Description = 'Physics-Informed loss function for NN training';
            layer.node_properties = node_properties;
        end
        function loss = forwardLoss(layer, Y, T)
            % Calculate the forward loss
            Vx = Y(1);
            Vy = Y(2);
            Azimuth = Y(4);
            Y_op = Y(end);
            nodenumber = layer.node_properties.nodenumber;
            theta = layer.node_properties.theta;
            Airfoils = layer.node_properties.Airfoils;
            TipRad = layer.node_properties.TipRad;
            HubRad = layer.node_properties.HubRad;
            BlSpn = layer.node_properties.BlSpn;
            Solid = layer.node_properties.Solid;
            chi0 = layer.node_properties.chi0;
            % phi from the predicted output
            phi = Y_op + theta;
            [Residue] = CallStateResidual(phi, theta, Vx, Vy, nodenumber, Airfoils, TipRad, HubRad, BlSpn, Solid, Azimuth, chi0);
            loss = (Y_op - T)^2 + Residue;
        end
    end
end
The output of the regression layer still has the dimension equal to one. How do I make this work?
Ah, sorry - there appears to be an expectation that Y and T have the same number of features: because the Y input to the loss has 5 features, the software infers that T should have 5 features too.
I agree that seems confusing. You could "pad" T with redundant features so that it has 5 features and just take T(1,:) in the loss layer. Or you could convert to a dlnetwork and use custom training, as this is more flexible, allowing you to define a loss function without any layer restrictions.
Sorry for the confusion.
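To sketch the padding workaround (an illustration only, assuming Y_train is the N-by-1 cell array of 1x4000 targets described above):

% Pad each 1x4000 target with 4 dummy rows so T has 5 features,
% matching the 5-feature Y that reaches the loss layer.
Y_train_padded = cellfun(@(y) [y; zeros(4, size(y,2))], Y_train, ...
    'UniformOutput', false);

Inside forwardLoss, keep only the real row, e.g. T = T(1,:), before computing the loss; the dummy rows are never used.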
Hello Ben, thanks for all the suggestions so far! I agree that a dlnetwork provides more flexibility, but the training process is much slower and the convergence rate is not great. Following is the code I am using to define the custom training loop:
layers = [
    featureInputLayer(size(X_train,2), 'Name', 'featureInput')
    fullyConnectedLayer(opt_param.numneuron1, 'Name', 'fc1')
    fullyConnectedLayer(opt_param.numneuron2, 'Name', 'fc2')
    fullyConnectedLayer(1, 'Name', 'fc3')
    dropoutLayer(opt_param.dropout_rate)
];
% Create a dlnetwork
net = dlnetwork(layers);
% Define the training options
miniBatchSize = 30000;
numEpochs = 50;
numObservations = numel(Y_train);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
numIterations = numEpochs * numIterationsPerEpoch;
% Adam parameters
averageGrad = [];
averageSqGrad = [];
% Initialize training progress monitor
monitor = trainingProgressMonitor(Metrics="Loss", Info="Epoch", XLabel="Iteration");
% Network training
iteration = 0;
epoch = 0;
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;
    % Shuffle data.
    idx = randperm(numel(Y_train));
    X_epoch = X_train(idx,:);
    Y_epoch = Y_train(idx);
    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;
        % Read mini-batch of data.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = X_epoch(idx,:);
        T = Y_epoch(idx);
        % Convert mini-batch of data to a dlarray.
        X = dlarray(single(X));
        T = dlarray(single(T));
        node_properties_array = dlarray(node_properties_array);
        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss, net, X, T, node_properties_array);
        % Update the network parameters using the Adam optimizer.
        [net,averageGrad,averageSqGrad] = adamupdate(net, gradients, averageGrad, averageSqGrad, iteration);
        % Update the training progress monitor.
        recordMetrics(monitor, iteration, Loss=loss);
        updateInfo(monitor, Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end
Y_val_pred = predict(net, dlarray(X_val',"CB"));
val_loss = mean(mean((Y_val' - Y_val_pred).^2));
And the modelLoss function is:
function [loss,gradients] = modelLoss(net,X,T,node_properties_array)
    X = dlarray(X', "CB");
    % Forward data through network.
    [Y,~] = forward(net, X);
    X = extractdata(X);
    Y = extractdata(Y); % note: extractdata breaks the dlarray trace
    node_properties_array = extractdata(node_properties_array);
    % Initialize node properties
    nodenumber = node_properties_array(1);
    theta = node_properties_array(2);
    TipRad = node_properties_array(3);
    HubRad = node_properties_array(4);
    BlSpn = node_properties_array(5);
    Solid = node_properties_array(6);
    chi0 = node_properties_array(7);
    load Airfoils.mat
    batch_size = numel(Y);
    Residue = zeros(1, batch_size);
    Residue = dlarray(Residue);
    for ii = 1:batch_size
        Vx = X(1,ii);
        Vy = X(2,ii);
        Azimuth = X(4,ii);
        phi = Y(ii) + theta;
        input_array = [phi theta Vx Vy nodenumber TipRad HubRad BlSpn Solid Azimuth chi0];
        [Residue(1,ii)] = CallStateResidual_ANN_mex(double(input_array), Airfoils);
    end
    % data_loss = mean((Y-T').^2);
    physics_loss = mean(abs(Residue))/100;
    % Train the model with physics_loss alone
    loss = sqrt(physics_loss);
    % Calculate gradients of loss with respect to learnable parameters.
    gradients = dlgradient(loss, net.Learnables);
end
However, whenever I try to run this function using physics_loss alone, I get the following error:
Error using dlarray/dlgradient
Value to differentiate is not traced. It must be a traced real dlarray scalar. Use dlgradient inside a function called by dlfeval to trace the
variables.
I think this could be happening because the MATLAB function I am using to calculate the Residue term takes double inputs, since MATLAB code generation does not support dlarrays. But if I don't use the MEX version, the code is really slow to execute. I would really appreciate it if you could help me find a solution to this.
It's true that for dlgradient(loss,learnables) to work, the loss must be computed using only dlarray methods on the learnables; otherwise we can't compute the derivatives automatically.
Since Residue depends on Y, which is the output of forward(net,X), you can't apply a MEX function to Y and still get a traced output that we can compute gradients of. The extractdata calls break the tracing that is used to compute automatic derivatives.
At a glance it looks like CallStateResidual_ANN_mex is not vectorized over the batch size, so that's the first thing I'd suggest, though it's hard to know if that's plausible since I can't see the implementation of that function.
After that, note that dlaccelerate can help optimize modelLoss if it's being called multiple times, so long as the only inputs to modelLoss that vary frequently are dlarrays.
It's worth noting that the computation in modelLoss would also have to happen inside trainNetwork if this were allowed as a custom loss layer, so it's not really avoidable. I would expect dlaccelerate to make up for most of the speed difference between trainNetwork and custom training loops. As for convergence, this should be the same if your custom training loop implements everything your trainingOptions does in trainNetwork - note that trainingOptions includes non-zero L2Regularization by default.
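For reference, wiring in dlaccelerate is a small change to the code above (a sketch based on the documented pattern; modelLoss and its arguments are from the earlier training loop):

% Accelerate the loss function once, outside the training loop.
accfun = dlaccelerate(@modelLoss);

% Inside the loop, evaluate the accelerated function via dlfeval as before.
[loss, gradients] = dlfeval(accfun, net, X, T, node_properties_array);

The first call traces and caches the computation; subsequent calls with the same-shaped inputs reuse the cached trace, which is where the speedup comes from.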
Hello Ben,
Thanks for the suggestions, I didn't know about the dlaccelerate functionality. Vectorising the CallStateResidual_ANN would be a difficult task but I will give it a try. Hopefully this will improve the computation time.
Thanks again!
