Does LSTM training reset after resuming learning?
I have a database called processed_data, which contains cells structured like this:
0.980999999999767	0.945912306864893	1
1.46300000000338	0.926617136227153	1
0.511999999995169	0.868790509137634	2
1.00600000000122	0.978074194186882	1
0.995999999999185	0.884817478795566	2
1.12400000000343	0.740093883803231	2
1.35399999999936	0.418494628137842	2
0.653999999994994	0.399199457500103	2
1.00600000000122	0.438938088213894	2
0.999000000003434	0.566427539286267	2
The first column is the number of seconds elapsed since the previous row, the second column is the (normalized) value at that time, and the third column holds the categorical labels 1 and 2. The first two columns are the predictors, and the third column is the target.
Each cell represents one day, and I have around 215 days' worth of data, each day with a varying number of observations. The goal is to create an LSTM model that, based on the predictors, predicts whether the value will increase (2) or decrease (1) in the future. During training, I keep each day separate: I stop the learning once the last batch of a given day has been processed, then load the next day's data and resume training.
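In code terms, the database is a cell array with one matrix per day, along these lines (the numbers below are placeholders, not my actual data):
% Layout sketch: one N-by-3 matrix per day, with columns
% [seconds since previous row, normalized value, class label].
processed_data = cell(215, 1);
processed_data{1} = [0.981 0.946 1;
                     1.463 0.927 1;
                     0.512 0.869 2];  % day 1: 3 observations
processed_data{2} = [1.006 0.978 1;
                     0.996 0.885 2];  % day 2: 2 observations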
The problem is that when training resumes on the next day's data, the network behaves as if it had been completely reset and were learning from scratch. Every run produces exactly the same accuracy values (except in the first iteration), with only slight changes in the loss, as if the network is not learning at all. Here is an example output for day 1:
1. day
    Iteration    Epoch    TimeElapsed    LearnRate    TrainingLoss    TrainingAccuracy
    _________    _____    ___________    _________    ____________    ________________
            1        1       00:00:00        0.001         0.69781              40.625
           50        1       00:00:00        0.001         0.65881              64.844
          100        1       00:00:00        0.001         0.70176              50.781
          117        1       00:00:00        0.001         0.63057              69.531
Training stopped: Max epochs completed
...
...
...
1. day
    Iteration    Epoch    TimeElapsed    LearnRate    TrainingLoss    TrainingAccuracy
    _________    _____    ___________    _________    ____________    ________________
            1        1       00:00:00        0.001         0.70017              41.406
           50        1       00:00:00        0.001         0.65913              64.844
          100        1       00:00:00        0.001          0.6985              50.781
          117        1       00:00:00        0.001         0.62994              69.531
Training stopped: Max epochs completed
...
...
...
1. day
    Iteration    Epoch    TimeElapsed    LearnRate    TrainingLoss    TrainingAccuracy
    _________    _____    ___________    _________    ____________    ________________
            1        1       00:00:00        0.001         0.69753              42.188
           50        1       00:00:00        0.001          0.6619              64.844
          100        1       00:00:00        0.001         0.70356              50.781
          117        1       00:00:00        0.001          0.6291              69.531
Training stopped: Max epochs completed
...
...
...
Here is my code snippet:
%% Define training options 
train_opts = trainingOptions( ...
    "adam", ...
    InitialLearnRate = 0.001, ...
    MiniBatchSize = 128, ...
    Plots = "none", ...
    Verbose = true, ...
    MaxEpochs = 1, ...
    Shuffle = "never", ...
    Metrics = "accuracy" ...
    );
%% Define network.
net = dlnetwork;
temp_net = [
    sequenceInputLayer(2,"Name","input")
    lstmLayer(256,"Name","lstm","OutputMode","last")
    dropoutLayer(0.5,"Name","dropout")
    fullyConnectedLayer(2,"Name","output")
    softmaxLayer];
net = addLayers(net, temp_net);
net = initialize(net);
% clean up helper variable
clear temp_net;
%% Load the data for each day and train the network.
num_of_epochs = 30;
train_data_length = round(length(processed_data) * 0.9);
train_data = processed_data(1: train_data_length);
for epoch = 1:num_of_epochs
    for day = 1:train_data_length
        if train_opts.Verbose
            disp(day + ". day")
        end
        train_X = train_data{day}(:, 1:2);            % predictors from the 90% training split
        train_X = dlarray(train_X, "BCT");
        train_Y = categorical(train_data{day}(:, 3)); % targets from the training split
        net = trainnet(train_X, train_Y, net, "crossentropy", train_opts);
    end
end
Answers (1)
Karan Singh on 25 Feb 2025
I think the issue is that each call to trainnet (here with "MaxEpochs = 1") starts a fresh training session. The network's learned weights are preserved between calls, because you pass the trained network back in, but the rest of the training state is reinitialized: the solver state (Adam's moment estimates) and, by default, the LSTM's hidden and cell states are reset at the start of every new session.
So, to answer your question:
The LSTM's learned weights are carried over, but the training "state" (the optimizer state and the internal sequence states) is reset each time you resume training.
This is the expected behavior when calling MATLAB's built-in training routines in a loop like this. If you want the optimizer state to persist across days, you need a custom training loop that carries it along.
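For example, here is a minimal sketch of such a loop built around dlfeval and adamupdate (untested; it reuses the "net" and "train_data" variables and the "BCT" formatting from your snippet, so adapt as needed). The key point is that averageGrad and averageSqGrad, Adam's moment estimates, are passed back into every update instead of being reset by a new trainnet call:
learnRate = 0.001;
numEpochs = 30;
averageGrad = [];     % Adam 1st-moment estimate, persists across days
averageSqGrad = [];   % Adam 2nd-moment estimate, persists across days
iteration = 0;
for epoch = 1:numEpochs
    for day = 1:numel(train_data)
        % For simplicity, one mini-batch per day.
        X = dlarray(train_data{day}(:, 1:2), "BCT");
        T = onehotencode(categorical(train_data{day}(:, 3)).', 1); % 2-by-N one-hot targets
        iteration = iteration + 1;
        [loss, gradients] = dlfeval(@modelLoss, net, X, T);
        % adamupdate takes the previous moment estimates and returns the
        % updated ones, so the optimizer state survives the day boundary.
        [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, ...
            averageGrad, averageSqGrad, iteration, learnRate);
    end
end
function [loss, gradients] = modelLoss(net, X, T)
    Y = forward(net, X);   % "forward" (not "predict") keeps dropout active
    loss = crossentropy(Y, T);
    gradients = dlgradient(loss, net.Learnables);
end
If you also want mini-batching and shuffling within a day, wrap each day's data in a minibatchqueue. Note that this sketch still does not carry the LSTM's hidden and cell states across batches; for that you would additionally have to manage the network's State property yourself.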
Karan