Does LSTM training reset after resuming learning?
I have a database called processed_data, which contains cells structured like this:
0.980999999999767	0.945912306864893	1
1.46300000000338	0.926617136227153	1
0.511999999995169	0.868790509137634	2
1.00600000000122	0.978074194186882	1
0.995999999999185	0.884817478795566	2
1.12400000000343	0.740093883803231	2
1.35399999999936	0.418494628137842	2
0.653999999994994	0.399199457500103	2
1.00600000000122	0.438938088213894	2
0.999000000003434	0.566427539286267	2
The first column is the number of seconds elapsed since the previous row, the second column is the (normalized) value at that time, and the third column holds the categorical labels 1 and 2. The first two columns are the predictors, and the third column is the target.
Each cell represents one day, and I have around 215 days' worth of data, each day with a varying number of observations. The goal is to create an LSTM model that, based on the predictors, predicts whether the value will increase (2) or decrease (1) in the future. During training, I keep each day separate: I stop the learning once the last batch of a given day has been processed, then load the next day's data and resume training.
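In code terms, the database is a cell array with one matrix per day, along these lines (the numbers below are placeholders, not my actual data):
% Layout sketch: one N-by-3 matrix per day, with columns
% [seconds since previous row, normalized value, class label].
processed_data = cell(215, 1);
processed_data{1} = [0.981 0.946 1;
                     1.463 0.927 1;
                     0.512 0.869 2];  % day 1: 3 observations
processed_data{2} = [1.006 0.978 1;
                     0.996 0.885 2];  % day 2: 2 observations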
The problem is that when training resumes on the next day's data, the network behaves as if it had been completely reset and were learning from scratch. Every run produces exactly the same accuracy values (except in the first iteration), with only slight changes in the loss, as if the network is not learning at all. Here is an example output for day 1:
1. day
    Iteration    Epoch    TimeElapsed    LearnRate    TrainingLoss    TrainingAccuracy
    _________    _____    ___________    _________    ____________    ________________
            1        1       00:00:00        0.001         0.69781              40.625
           50        1       00:00:00        0.001         0.65881              64.844
          100        1       00:00:00        0.001         0.70176              50.781
          117        1       00:00:00        0.001         0.63057              69.531
Training stopped: Max epochs completed
...
...
...
1. day
    Iteration    Epoch    TimeElapsed    LearnRate    TrainingLoss    TrainingAccuracy
    _________    _____    ___________    _________    ____________    ________________
            1        1       00:00:00        0.001         0.70017              41.406
           50        1       00:00:00        0.001         0.65913              64.844
          100        1       00:00:00        0.001          0.6985              50.781
          117        1       00:00:00        0.001         0.62994              69.531
Training stopped: Max epochs completed
...
...
...
1. day
    Iteration    Epoch    TimeElapsed    LearnRate    TrainingLoss    TrainingAccuracy
    _________    _____    ___________    _________    ____________    ________________
            1        1       00:00:00        0.001         0.69753              42.188
           50        1       00:00:00        0.001          0.6619              64.844
          100        1       00:00:00        0.001         0.70356              50.781
          117        1       00:00:00        0.001          0.6291              69.531
Training stopped: Max epochs completed
...
...
...
Here is my code snippet:
%% Define training options 
train_opts = trainingOptions( ...
    "adam", ...
    InitialLearnRate = 0.001, ...
    MiniBatchSize = 128, ...
    Plots = "none", ...
    Verbose = true, ...
    MaxEpochs = 1, ...
    Shuffle = "never", ...
    Metrics = "accuracy" ...
    );
%% Define network.
net = dlnetwork;
temp_net = [
    sequenceInputLayer(2,"Name","input")
    lstmLayer(256,"Name","lstm","OutputMode","last")
    dropoutLayer(0.5,"Name","dropout")
    fullyConnectedLayer(2,"Name","output")
    softmaxLayer];
net = addLayers(net, temp_net);
net = initialize(net);
% clean up helper variable
clear temp_net;
%% Load the data for each day and train the network.
num_of_epochs = 30;
train_data_length = round(length(processed_data) * 0.9);
train_data = processed_data(1: train_data_length);
for epoch = 1:num_of_epochs
    for day = 1:train_data_length
        if train_opts.Verbose
            disp(day + ". day")
        end
        train_X = train_data{day}(:, 1:2);            % predictors from the 90% training split
        train_X = dlarray(train_X, "BCT");
        train_Y = categorical(train_data{day}(:, 3)); % targets from the training split
        net = trainnet(train_X, train_Y, net, "crossentropy", train_opts);
    end
end
Answers (1)
Karan Singh on 25 Feb 2025
I think the issue is that each call to trainnet (here with "MaxEpochs = 1") starts a fresh training session. The network's learned weights are preserved between calls, because you pass the trained network back in, but the rest of the training state is reinitialized: the solver state (Adam's moment estimates) and, by default, the LSTM's hidden and cell states are reset at the start of every new session.
So, to answer your question:
The LSTM's learned weights are carried over, but the training "state" (the optimizer state and the internal sequence states) is reset each time you resume training.
This is the expected behavior when calling MATLAB's built-in training routines in a loop like this. If you want the optimizer state to persist across days, you need a custom training loop that carries it along.
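For example, here is a minimal sketch of such a loop built around dlfeval and adamupdate (untested; it reuses the "net" and "train_data" variables and the "BCT" formatting from your snippet, so adapt as needed). The key point is that averageGrad and averageSqGrad, Adam's moment estimates, are passed back into every update instead of being reset by a new trainnet call:
learnRate = 0.001;
numEpochs = 30;
averageGrad = [];     % Adam 1st-moment estimate, persists across days
averageSqGrad = [];   % Adam 2nd-moment estimate, persists across days
iteration = 0;
for epoch = 1:numEpochs
    for day = 1:numel(train_data)
        % For simplicity, one mini-batch per day.
        X = dlarray(train_data{day}(:, 1:2), "BCT");
        T = onehotencode(categorical(train_data{day}(:, 3)).', 1); % 2-by-N one-hot targets
        iteration = iteration + 1;
        [loss, gradients] = dlfeval(@modelLoss, net, X, T);
        % adamupdate takes the previous moment estimates and returns the
        % updated ones, so the optimizer state survives the day boundary.
        [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, ...
            averageGrad, averageSqGrad, iteration, learnRate);
    end
end
function [loss, gradients] = modelLoss(net, X, T)
    Y = forward(net, X);   % "forward" (not "predict") keeps dropout active
    loss = crossentropy(Y, T);
    gradients = dlgradient(loss, net.Learnables);
end
If you also want mini-batching and shuffling within a day, wrap each day's data in a minibatchqueue. Note that this sketch still does not carry the LSTM's hidden and cell states across batches; for that you would additionally have to manage the network's State property yourself.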
Karan