How to create Custom Regression Output Layer with multiple inputs for training sequence-to-sequence LSTM model?

For the neural network architecture I am using for my problem, I would like to define a regression output layer with a custom loss function. For this, I need the regression layer to have two inputs: one from a fully connected layer and the other from the sequenceInputLayer. However, I am not able to achieve that. How do I get around this?
Following is the definition of the custom layer:
classdef customLossLayerMultiInput < nnet.layer.RegressionLayer & nnet.layer.Acceleratable
    % Custom regression layer with an RMSE + physics-residue loss and additional properties.
    properties
        node_properties
        numFeature
    end
    methods
        function layer = customLossLayerMultiInput(name, node_properties, numFeature)
            % Constructor
            layer.Name = name;
            layer.Description = 'Physics-Informed loss function for LSTM training';
            layer.node_properties = node_properties;
            layer.numFeature = numFeature;
        end
        function loss = forwardLoss(layer, Y, T, varargin)
            % Calculate the forward loss
            % Reshape predictions and targets
            Y = reshape(Y, [], 1);
            T = reshape(T, [], 1);
            X1 = varargin{1};
            X2 = varargin{2};
            % Sequence input data
            sequence_input_data = reshape(X1, [], layer.numFeature);
            % Calculate mean residue
            mean_residue = PI_BEM_Residue(Y, T, sequence_input_data, layer.node_properties);
            % Calculate RMSE loss
            rmse_loss = rmse(Y, T);
            % Total loss
            loss = mean_residue + rmse_loss;
        end
    end
end
And this is the network architecture:
layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer') % Define the sequence input layer and name it
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer') % Define the LSTM layer and name it
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer') % Define the fully connected layer and name it
    dropoutLayer(x.dropout_rate, 'Name', 'dropoutLayer') % Define the dropout layer and name it
    customLossLayerMultiInput(LayerName, node_properties, numFeatures)
];
% Create a layer graph
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph, "inputLayer", strcat(LayerName,'\in2'));
For this setup, I am getting an error
Error using nnet.cnn.LayerGraph>iValidateLayerName
Layer 'RegressionLayer_Node2\in2' does not exist.

 Accepted Answer

Unfortunately, it's not possible to define a custom multi-input loss layer.
The possible options are:
  1. If Y, X1 and X2 have compatible sizes, you can concatenate them before customLossLayerMultiInput and pass them in as a single input to the loss.
  2. Use a dlnetwork and a custom training loop. In this case you can write a much more flexible loss function rather than a custom loss layer; however, you need to write the training loop yourself, following an example like this.
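A bare-bones sketch of the custom-training-loop pattern in option 2 (placeholder layer sizes, batch variables, and loss, not your model):

% Minimal dlnetwork custom training loop sketch (placeholder names).
net = dlnetwork([
    featureInputLayer(4)
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(1)
]);
averageGrad = []; averageSqGrad = [];
for iteration = 1:numIterations
    X = dlarray(single(Xbatch), "CB");   % mini-batch inputs (channels x batch)
    T = dlarray(single(Tbatch), "CB");   % mini-batch targets
    % dlfeval traces the computation so dlgradient can differentiate it.
    [loss, gradients] = dlfeval(@modelLoss, net, X, T);
    [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, ...
        averageGrad, averageSqGrad, iteration);
end

function [loss, gradients] = modelLoss(net, X, T)
    Y = forward(net, X);
    loss = mse(Y, T);   % any dlarray-based loss can go here
    gradients = dlgradient(loss, net.Learnables);
end

The key point is that the loss can be any function of the network output, as long as it is computed with dlarray operations.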

7 Comments

Hello Ben,
Thanks for your response. Luckily, I could modify my data so that the sizes of Y, X1 and X2 are compatible. Following is the code I am using to generate this network:
LayerName = strcat('RegressionLayer_Node', num2str(node_properties.nodenumber));
layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer') % Define the sequence input layer and name it
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer') % Define the LSTM layer and name it
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer') % Define the fully connected layer and name it
    dropoutLayer(x.dropout_rate, 'Name', 'dropoutLayer') % Define the dropout layer and name it
    concatenationLayer(1, 2, 'Name', 'concat')
    customLossLayer(LayerName, node_properties, numFeatures, X_train)
];
% Create a layer graph
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph, 'inputLayer', 'concat/in2');
figure, plot(lgraph)
This is what the network architecture looks like.
And I am training this network using the following code:
% Define the training options
options = trainingOptions('adam', ...
    'MaxEpochs', 30, ...
    'MiniBatchSize', 1, ...
    'InitialLearnRate', x.learning_rate, ...
    'SequenceLength', 'longest', ...
    'Shuffle', 'once', ...
    'L2Regularization', 0.001, ...
    'ValidationData', {X_val, Y_val}, ...
    'ValidationFrequency', 10, ...
    'Verbose', false, ...
    'ExecutionEnvironment', 'auto');
% Train the LSTM network
net = trainNetwork(X_train, Y_train, lgraph, options);
Here the variables X_train and Y_train are N-by-1 cell arrays. Each cell in X_train has size 4x4000, and the corresponding cell in Y_train has size 1x4000. I am trying to concatenate these arrays using the concatenation layer, but during training I get the following error:
Error using trainNetwork
Invalid training data. For cell array input, responses must be an N-by-1 cell array of sequences, where N is the number of sequences. The spatial and
channel dimensions of the sequences must be the same as the output size of the last layer (5).
Would you please help me figure out how to tackle this?
It would appear that the custom regression layer is outputting data with 5 channels rather than 1 as in your Y_train.
Note that you can't concatenate Y_train anywhere except in the custom loss layer - the X_train are passed as the network inputs, the Y_train correspond to the T (target) input of the custom loss layer.
Hey Ben,
As you can see in the network architecture, the input to the custom regression layer (RegressionLayer_Node2) is the concatenation of the outputs of the dropout layer and the sequence input layer; therefore the dimension of the input to the custom regression layer is five (4 from the input layer + 1 from the dropout layer), whereas my target has dimension equal to one. I need 4 input parameters to define the custom loss function. Following is the definition of my custom regression layer:
classdef physicsInformedLossLayer < nnet.layer.RegressionLayer & nnet.layer.Acceleratable
    % Custom regression layer with a physics-informed loss and additional properties.
    properties
        node_properties
    end
    methods
        function layer = physicsInformedLossLayer(name, node_properties)
            % Constructor
            layer.Name = name;
            layer.Description = 'Physics-Informed loss function for NN training';
            layer.node_properties = node_properties;
        end
        function loss = forwardLoss(layer, Y, T)
            % Calculate the forward loss
            Vx = Y(1);
            Vy = Y(2);
            Azimuth = Y(4);
            Y_op = Y(end);
            nodenumber = layer.node_properties.nodenumber;
            theta = layer.node_properties.theta;
            Airfoils = layer.node_properties.Airfoils;
            TipRad = layer.node_properties.TipRad;
            HubRad = layer.node_properties.HubRad;
            BlSpn = layer.node_properties.BlSpn;
            Solid = layer.node_properties.Solid;
            chi0 = layer.node_properties.chi0;
            % phi from the predicted output
            phi = Y_op + theta;
            [Residue] = CallStateResidual(phi, theta, Vx, Vy, nodenumber, Airfoils, TipRad, HubRad, BlSpn, Solid, Azimuth, chi0);
            loss = (Y_op - T)^2 + Residue;
        end
    end
end
The output of the regression layer still has the dimension equal to one. How do I make this work?
Ah, sorry - there appears to be an expectation that Y and T have the same number of features: because the Y input to the loss has 5 features, the software infers that T should have 5 features too.
I agree that seems confusing. You could "pad" T with redundant features so that it has 5 features and just take T(1,:) in the loss layer. Or you could convert to a dlnetwork and use custom training, as this is more flexible, allowing you to define a loss function without any layer restrictions.
Sorry for the confusion.
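To sketch the padding workaround (an illustration only, assuming Y_train is the N-by-1 cell array of 1x4000 targets described above):

% Pad each 1x4000 target with 4 dummy rows so T has 5 features,
% matching the 5-feature Y that reaches the loss layer.
Y_train_padded = cellfun(@(y) [y; zeros(4, size(y,2))], Y_train, ...
    'UniformOutput', false);

Inside forwardLoss, keep only the real row, e.g. T = T(1,:), before computing the loss; the dummy rows are never used.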
Hello Ben, thanks for all the suggestions so far! I agree that a dlnetwork provides more flexibility, but the training process is much slower and the convergence rate is not great. Following is the code I am using to define the custom training loop:
layers = [
    featureInputLayer(size(X_train,2), 'Name', 'featureInput')
    fullyConnectedLayer(opt_param.numneuron1, 'Name', 'fc1')
    fullyConnectedLayer(opt_param.numneuron2, 'Name', 'fc2')
    fullyConnectedLayer(1, 'Name', 'fc3')
    dropoutLayer(opt_param.dropout_rate)
];
% Create a dlnetwork
net = dlnetwork(layers);
% Define the training options
miniBatchSize = 30000;
numEpochs = 50;
numObservations = numel(Y_train);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
numIterations = numEpochs * numIterationsPerEpoch;
% Adam parameters
averageGrad = [];
averageSqGrad = [];
% Initialize training progress monitor
monitor = trainingProgressMonitor(Metrics="Loss", Info="Epoch", XLabel="Iteration");
% Network training
iteration = 0;
epoch = 0;
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;
    % Shuffle data.
    idx = randperm(numel(Y_train));
    X_epoch = X_train(idx,:);
    Y_epoch = Y_train(idx);
    i = 0;
    while i < numIterationsPerEpoch && ~monitor.Stop
        i = i + 1;
        iteration = iteration + 1;
        % Read mini-batch of data.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X = X_epoch(idx,:);
        T = Y_epoch(idx);
        % Convert mini-batch of data to a dlarray.
        X = dlarray(single(X));
        T = dlarray(single(T));
        node_properties_array = dlarray(node_properties_array);
        % Evaluate the model loss and gradients using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss, net, X, T, node_properties_array);
        % Update the network parameters using the Adam optimizer.
        [net,averageGrad,averageSqGrad] = adamupdate(net, gradients, averageGrad, averageSqGrad, iteration);
        % Update the training progress monitor.
        recordMetrics(monitor, iteration, Loss=loss);
        updateInfo(monitor, Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration/numIterations;
    end
end
Y_val_pred = predict(net, dlarray(X_val',"CB"));
val_loss = mean(mean((Y_val' - Y_val_pred).^2));
And the modelLoss function is:
function [loss,gradients] = modelLoss(net,X,T,node_properties_array)
    X = dlarray(X', "CB");
    % Forward data through network.
    [Y,~] = forward(net, X);
    X = extractdata(X);
    Y = extractdata(Y); % note: extractdata breaks the dlarray trace
    node_properties_array = extractdata(node_properties_array);
    % Initialize node properties
    nodenumber = node_properties_array(1);
    theta = node_properties_array(2);
    TipRad = node_properties_array(3);
    HubRad = node_properties_array(4);
    BlSpn = node_properties_array(5);
    Solid = node_properties_array(6);
    chi0 = node_properties_array(7);
    load Airfoils.mat
    batch_size = numel(Y);
    Residue = zeros(1, batch_size);
    Residue = dlarray(Residue);
    for ii = 1:batch_size
        Vx = X(1,ii);
        Vy = X(2,ii);
        Azimuth = X(4,ii);
        phi = Y(ii) + theta;
        input_array = [phi theta Vx Vy nodenumber TipRad HubRad BlSpn Solid Azimuth chi0];
        [Residue(1,ii)] = CallStateResidual_ANN_mex(double(input_array), Airfoils);
    end
    % data_loss = mean((Y-T').^2);
    physics_loss = mean(abs(Residue))/100;
    % Train the model with physics_loss alone
    loss = sqrt(physics_loss);
    % Calculate gradients of loss with respect to learnable parameters.
    gradients = dlgradient(loss, net.Learnables);
end
However, whenever I try to run this function using physics_loss alone, I get the following error:
Error using dlarray/dlgradient
Value to differentiate is not traced. It must be a traced real dlarray scalar. Use dlgradient inside a function called by dlfeval to trace the
variables.
I think this could be happening because the MATLAB function I am using to calculate the Residue term takes double inputs, since MATLAB code generation does not support dlarrays. But if I don't use the MEX version, the code is really slow to execute. I would really appreciate it if you could help me find a solution to this.
It's true that for dlgradient(loss,learnables) to work, the loss must be computed using only dlarray methods on the learnables; otherwise we can't compute the derivatives automatically.
Since Residue depends on Y, which is the output of forward(net,X), you can't apply a MEX function to Y and still get a traced output that we can compute gradients of. The extractdata calls break the tracing that is used to compute automatic derivatives.
At a glance it looks like CallStateResidual_ANN_mex is not vectorized over the batch size, so that's the first thing I'd suggest, though it's hard to know if that's plausible since I can't see the implementation of that function.
After that, note that dlaccelerate can help optimize modelLoss if it's being called multiple times, so long as the only inputs to modelLoss that vary frequently are dlarrays.
It's worth noting that the computation in modelLoss would also have to happen inside trainNetwork if this were allowed as a custom loss layer, so it's not really avoidable. I would expect dlaccelerate to make up for most of the speed difference between trainNetwork and custom training loops. As for convergence, this should be the same if your custom training loop implements everything your trainingOptions does in trainNetwork - note that trainingOptions includes non-zero L2Regularization by default.
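For reference, wiring in dlaccelerate is a small change to the code above (a sketch based on the documented pattern; modelLoss and its arguments are from the earlier training loop):

% Accelerate the loss function once, outside the training loop.
accfun = dlaccelerate(@modelLoss);

% Inside the loop, evaluate the accelerated function via dlfeval as before.
[loss, gradients] = dlfeval(accfun, net, X, T, node_properties_array);

The first call traces and caches the computation; subsequent calls with the same-shaped inputs reuse the cached trace, which is where the speedup comes from.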
Hello Ben,
Thanks for the suggestions, I didn't know about the dlaccelerate functionality. Vectorising the CallStateResidual_ANN would be a difficult task but I will give it a try. Hopefully this will improve the computation time.
Thanks again!
