Projection of LSTM layer vs GRU layer

Silvia on 28 May 2024
Commented: Silvia on 10 Jun 2024
I am training two RNNs, one with an LSTM layer and the other with a GRU layer. The two architectures are:
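numFeatures = 1;        % number of features per time step
numHiddenUnits = 32;

% Network with a standard LSTM layer
layersLSTM = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, OutputMode="sequence")
    fullyConnectedLayer(numFeatures)    % one output per feature at each time step
    ];

% Same network with a GRU layer in place of the LSTM layer
layersGRU = [
    sequenceInputLayer(numFeatures)
    gruLayer(numHiddenUnits, OutputMode="sequence")
    fullyConnectedLayer(numFeatures)
    ];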
When I train the projected version of the GRU network, the validation RMSE and loss do not follow the training RMSE and loss.
This is the first time this has happened. I have never had this problem with the LSTM network (neither with the lstmLayer architecture nor with the lstmProjectedLayer one), nor when training the GRU network without projection: in those cases, the validation metrics followed the training metrics properly. What could be causing this?
I also have a second question:
Following the two MATLAB examples, I set the OutputProjectorSize and InputProjectorSize parameters to:
  • 75% of the number of hidden units and 25% of the input size, respectively, for the LSTM
  • 25% of the number of hidden units and 75% of the input size, respectively, for the GRU
So for the GRU it is the other way around. Is there a reason behind this choice?
Thank you in advance!

Answers (1)

Maksym Tymchenko on 3 Jun 2024
I am glad to see that you are using our new projection features.
I'll start by answering the second question.
As far as I can see, both examples use exactly the same definitions for OutputProjectorSize and InputProjectorSize in the section "Compare Network Projection Sizes":
  • An output projector size of 25% of the number of hidden units.
  • An input projector size of 75% of the input size.
These are reasonable sizes to choose because they result in the lstmProjectedLayer having fewer learnable parameters than an lstmLayer with the same number of hidden units. Note that it is possible to choose values that make a projected layer larger than the original layer. To avoid this, use the function compressNetworkUsingProjection, which determines these parameter sizes automatically based on the amount of compression you specify.
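To make the parameter-count comparison concrete, here is a rough MATLAB sketch that counts learnable parameters by hand for the sizes in the question. It assumes the weight shapes written in the comments (for example, that a projected layer stores an InputProjector of size inputSize-by-InputProjectorSize and an OutputProjector of size NumHiddenUnits-by-OutputProjectorSize) and a single bias vector per gate; please check these against the Algorithms sections of the layer pages before relying on the numbers. The projector sizes follow the 25%/75% rule, with the input projector size rounded up to 1 as a hypothetical choice, since 75% of one input feature is not an integer.

```matlab
% Learnable-parameter counts computed by hand (no toolbox calls).
% ASSUMED shapes: H = NumHiddenUnits, d = input size,
% Qo = OutputProjectorSize, Qi = InputProjectorSize.
H  = 32;   % numHiddenUnits from the question
d  = 1;    % numFeatures from the question
Qo = 8;    % 25% of H
Qi = 1;    % hypothetical: 0.75*d rounded up to 1 (assumption)

% lstmLayer: InputWeights 4H-by-d, RecurrentWeights 4H-by-H, Bias 4H-by-1
lstmFull = @(H, d) 4*H*d + 4*H*H + 4*H;
% lstmProjectedLayer: InputWeights 4H-by-Qi, InputProjector d-by-Qi,
% RecurrentWeights 4H-by-Qo, OutputProjector H-by-Qo, Bias 4H-by-1
lstmProj = @(H, d, Qo, Qi) 4*H*Qi + d*Qi + 4*H*Qo + H*Qo + 4*H;

% GRU versions: 3 gates instead of 4 (assumes a 3H-by-1 bias)
gruFull = @(H, d) 3*H*d + 3*H*H + 3*H;
gruProj = @(H, d, Qo, Qi) 3*H*Qi + d*Qi + 3*H*Qo + H*Qo + 3*H;

nLstm     = lstmFull(H, d);
nLstmProj = lstmProj(H, d, Qo, Qi);
nGru      = gruFull(H, d);
nGruProj  = gruProj(H, d, Qo, Qi);

fprintf('LSTM: %d -> %d learnables\n', nLstm, nLstmProj);
fprintf('GRU:  %d -> %d learnables\n', nGru, nGruProj);
```

Under these assumptions the script reports 4352 vs 1537 learnables for the LSTM layer and 3264 vs 1217 for the GRU layer. The whole saving comes from the recurrent side; with a single input feature, the input side actually grows by one parameter.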
Alternatively, if you want to create the projected layers from scratch, follow the Tips in the descriptions of the OutputProjectorSize and InputProjectorSize parameters. They state that, to ensure that the projected layer requires fewer learnable parameters than the corresponding non-projected layer:
  1. For an lstmProjectedLayer: set the OutputProjectorSize property to a value less than 4*NumHiddenUnits/5, and set the InputProjectorSize property to a value less than 4*NumHiddenUnits*inputSize/(4*NumHiddenUnits+inputSize)
  2. For a gruProjectedLayer: set the OutputProjectorSize property to a value less than 3*NumHiddenUnits/4, and set the InputProjectorSize property to a value less than 3*NumHiddenUnits*inputSize/(3*NumHiddenUnits+inputSize)
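Plugging the configuration from the question (numHiddenUnits = 32, one input feature) into these two Tips gives the upper bounds below. This is only arithmetic on the formulas quoted above; the variable names are illustrative.

```matlab
% Upper bounds from the Tips: projection saves parameters only below these.
H = 32;   % numHiddenUnits from the question
d = 1;    % numFeatures (input size) from the question

lstmMaxQo = 4*H/5;              % OutputProjectorSize bound, lstmProjectedLayer
lstmMaxQi = 4*H*d/(4*H + d);    % InputProjectorSize bound, lstmProjectedLayer
gruMaxQo  = 3*H/4;              % OutputProjectorSize bound, gruProjectedLayer
gruMaxQi  = 3*H*d/(3*H + d);    % InputProjectorSize bound, gruProjectedLayer

fprintf('LSTM: Qo < %.4f, Qi < %.4f\n', lstmMaxQo, lstmMaxQi);
fprintf('GRU:  Qo < %.4f, Qi < %.4f\n', gruMaxQo, gruMaxQi);
```

With a single input feature, both InputProjectorSize bounds fall just below 1, so, assuming projector sizes must be positive integers, input projection cannot reduce the parameter count for this network, and any compression has to come from OutputProjectorSize (bounds of 25.6 for the LSTM and 24 for the GRU).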
These formulas can be derived by expressing the total number of learnable parameters as a function of the number of hidden units and the input size. For more information, see the Algorithms section of the lstmProjectedLayer and gruProjectedLayer documentation pages.
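A sketch of that derivation, assuming the projected layer routes the input and recurrent weights through the two projectors and keeps the same bias as the non-projected layer. The notation is introduced here for illustration: g is the number of gates (g = 4 for LSTM, g = 3 for GRU), H = NumHiddenUnits, d = inputSize, Q_o = OutputProjectorSize, Q_i = InputProjectorSize.

```latex
\begin{aligned}
N_{\text{full}} &= gHd + gH^2 + gH \\
N_{\text{proj}} &= (gH + d)\,Q_i + (gH + H)\,Q_o + gH \\[4pt]
(gH + d)\,Q_i < gHd &\iff Q_i < \frac{gHd}{gH + d} \\
(g + 1)H\,Q_o < gH^2 &\iff Q_o < \frac{gH}{g + 1}
\end{aligned}
```

Setting g = 4 gives the lstmProjectedLayer bounds 4H/5 and 4Hd/(4H + d); setting g = 3 gives the gruProjectedLayer bounds 3H/4 and 3Hd/(3H + d). Because the bias terms are identical, meeting both bounds is sufficient for N_proj < N_full.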
Regarding your first question, I would need the full reproduction steps, including the script and dataset used, to investigate the issue. Please feel free to share these as an attachment to this post. Alternatively, you can open a technical support request with the reproduction steps.
Silvia on 10 Jun 2024
Thank you for the detailed explanations and the interesting insight into the compressNetworkUsingProjection function!
Unfortunately, I cannot share the code or the datasets for data-privacy reasons.
But thank you again for your help!
Silvia


Categories

Deep Learning Toolbox

Release

R2024a
