LSTM network time series prediction error occurs at the initial time step

Question

0 votes

I have trained an LSTM network for time series regression. After training, I want to evaluate its performance on the test dataset. The prediction for a single sample (extracted from the mini-batch results) is shown below:

The prediction shows a transient response at the start of the sequence. I think this is caused by the zero initial states (CellState and HiddenState) of the LSTM network. How can I resolve this zero-state problem when predicting time series?

2 Comments

xingxingcui on 4 Apr 2026

@Chuguang Pan Could you share your original files, including the code that produced this figure? Please include the minimal code needed to reproduce it.

Chuguang Pan on 5 Apr 2026

@xingxingcui. Thanks for your reply. The minimal working code is shown below. Note that the training data are generated synthetically, and the test results show that the LSTM prediction also has a transient regime, which may be induced by a "cold start" (zero initial states) of the LSTM network.

 
% Example Data for Time-Series Regression 
fs = 1e3; % signal samplerate
t = 0:1/fs:120; % sample time
predictorData = [0.1 + sin(2*pi*0.5*t)
                 2 + cos(2*pi*2*t)
                 1 + 2*sin(2*pi*t)
                 sin(2*pi*10*t)
                 cos(2*pi*5*t)
                 ];
targetData = [1e6 * cos(2*pi*0.1*t) + 1e7
              1e6 * sin(2*pi*0.1*t) + 1e7];
% Slice the long time series into non-overlapping fixed-length windows
sampleLen = 2048; % window length
numIn = size(predictorData,1); % input sequence dimension
numOut = size(targetData,1); % output sequence dimension
seqLenIn = size(predictorData,2); % input sequence length
seqLenOut = size(targetData,2); % output sequence length
% Data normalization
predictorDataN = normalize(predictorData,2,"zscore","std");
targetDataN = normalize(targetData,2,"zscore","std");
predictorArray = reshape(predictorDataN(:,1:end-mod(seqLenIn,sampleLen)),numIn,sampleLen,[]);
targetArray = reshape(targetDataN(:,1:end-mod(seqLenOut,sampleLen)),numOut,sampleLen,[]);
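
The slicing step above can be checked in isolation. The following is a minimal sketch, not part of the original post, that uses a hypothetical index signal `x` in place of the real data to show how many windows the reshape produces and that window `k`, column `j` corresponds to original sample `(k-1)*sampleLen + j`. In other words, the windows are consecutive pieces of one long sequence.

```matlab
% Sketch: inspect the non-overlapping windowing used in the post.
% `x` is a hypothetical stand-in signal whose value equals its sample index.
fs = 1e3;
sampleLen = 2048;
seqLen = 120*fs + 1;                 % number of samples in t = 0:1/fs:120
x = 1:seqLen;

numDropped = mod(seqLen,sampleLen);  % trailing samples that do not fill a window
W = reshape(x(:,1:end-numDropped),1,sampleLen,[]);
numWindows = size(W,3);

% Window k, column j holds original sample (k-1)*sampleLen + j.
k = 2; j = 1;
firstSampleOfWindow2 = W(1,j,k);
fprintf("%d windows, %d samples dropped\n",numWindows,numDropped);
```

Because each window continues exactly where the previous one ended, the final LSTM state of one window is the natural starting state for the next.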

Training Time Series Regression Model using LSTM network

% datastore construction
trainXds = arrayDatastore(predictorArray,"IterationDimension",3); % "CTB"
trainTds = arrayDatastore(targetArray,"IterationDimension",3);
dsTrain = combine(trainXds,trainTds,"ReadOrder","associated");
% Specifying training options
numEpochs = 120;
miniBatchSize = 12;
initLR = 0.002;
mbqTrain = minibatchqueue(dsTrain,2,"MiniBatchFcn",@preprocessMiniBatch,...
    "MiniBatchSize",miniBatchSize,"OutputAsDlarray",[true,true],...
    "MiniBatchFormat",["CTB","CTB"],"OutputCast",["single","single"],...
    "PartialMiniBatch","return","OutputEnvironment","auto");

% Train Model
layers = [sequenceInputLayer(numIn,"Normalization","none","MinLength",sampleLen,"Name","Input")
          lstmLayer(40,"OutputMode","sequence")
          layerNormalizationLayer("Name","LN")
          fullyConnectedLayer(numOut,"Name","Output")];
net = dlnetwork(layers);
trailingAvg = [];
trailingAvgSq = [];
numObservations = dsTrain.numpartitions;
numIterationsPerEpoch = ceil(numObservations / miniBatchSize);
numIterations = numIterationsPerEpoch * numEpochs;
epoch = 0;
iteration = 0;
figure;
an = animatedline("Color","b","LineWidth",2);
while epoch < numEpochs
    epoch = epoch + 1;
    % Shuffle data.
    shuffle(mbqTrain);
    % Loop over mini-batches
    while hasdata(mbqTrain)
        iteration = iteration + 1;
        [X,T] = next(mbqTrain);
        % states is returned here but never written back to net
        [loss,gradients,states] = dlfeval(@modelLoss,net,X,T);
        [net,trailingAvg,trailingAvgSq] = adamupdate(net,gradients,trailingAvg,trailingAvgSq,iteration,initLR);
        if mod(epoch,10) == 0
            initLR = initLR * 0.98; % learn rate decay
        end
        an.addpoints(iteration,extractdata(gather(loss)));
        drawnow;
    end
end

Test training performance

reset(mbqTrain);
while hasdata(mbqTrain)
    [trainX,trainT] = next(mbqTrain);
    predY = predict(net,trainX);
    plotIdx = 3;
    plotT = reshape(extractdata(gather(trainT(:,plotIdx,:))),numOut,[]);
    plotY = reshape(extractdata(gather(predY(:,plotIdx,:))),numOut,[]);
    plot((1:sampleLen)/fs,[plotT(1,:);plotY(1,:)]);
    legend(["true","prediction"]);
end

Helper Functions

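function [X,T] = preprocessMiniBatch(xdata,tdata)
    % Stack the per-window cell contents along the third (batch) dimension.
    X = cat(3,xdata{:});
    T = cat(3,tdata{:});
end
function [loss,gradients,states] = modelLoss(net,X,T)
    % Forward pass, L2 loss, and gradients with respect to the learnables.
    [Y,states] = forward(net,X);
    loss = l2loss(Y,T,"NormalizationFactor","batch-size");
    gradients = dlgradient(loss,net.Learnables);
end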


Answer 1

Ritam on 16 Apr 2026

0 votes

I noticed from the code that your “modelLoss” function returns states, but you never write them back to the network. You reshape the long sequence into [C × T × numWindows] and feed each window as a separate sequence. That means each window begins from the initial state (zeros) unless you explicitly carry the state across windows.
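
To make the cold-start effect concrete, here is a minimal sketch that is not taken from the question. It uses a small hypothetical network with random (untrained) weights and random input, and compares the output for a second window when that window is processed on its own against the output for the same samples when the network has already seen the first window.

```matlab
% Sketch: a window processed in isolation starts from zero LSTM states.
% The network and data below are hypothetical and only illustrate the effect.
rng(0);
numIn = 5; numOut = 2; sampleLen = 256;
layers = [sequenceInputLayer(numIn)
          lstmLayer(8,"OutputMode","sequence")
          fullyConnectedLayer(numOut)];
net = dlnetwork(layers);

X = randn(numIn,2*sampleLen,"single");      % two consecutive windows

% Reference: one pass over the whole sequence, so the state evolves continuously.
Yfull = extractdata(predict(net,dlarray(X,"CT")));

% Second window alone: the LSTM starts again from its initial (zero) states.
Ycold = extractdata(predict(net,dlarray(X(:,sampleLen+1:end),"CT")));

% Largest discrepancy across output channels at each time step of the window.
err = max(abs(Ycold - Yfull(:,sampleLen+1:end)),[],1);
fprintf("discrepancy at step 1: %.3g, at last step: %.3g\n",err(1),err(end));
```

Plotting `err` against the time step shows where the two predictions disagree. How quickly the discrepancy shrinks depends on the weights, so this sketch makes no claim about the length of the transient in a trained network.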

In the documentation example "Time Series Forecasting Using Deep Learning" (MATLAB & Simulink), the network state is explicitly updated in each iteration. Doing the same may resolve the issue that you are encountering.
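
One way to carry the state across windows at prediction time is sketched below, under stated assumptions: the network and data are hypothetical stand-ins (random weights, random input), the windows are processed one at a time in their original temporal order, and the state returned by `predict` is written back to `net.State` before the next window. With the state carried over, the windowed prediction should match a single pass over the full sequence up to floating-point error.

```matlab
% Sketch: stateful prediction over consecutive windows.
% Hypothetical small network and random data; not the poster's trained model.
rng(0);
numIn = 5; numOut = 2; sampleLen = 256; numWindows = 4;
layers = [sequenceInputLayer(numIn)
          lstmLayer(8,"OutputMode","sequence")
          fullyConnectedLayer(numOut)];
net = dlnetwork(layers);

X = randn(numIn,sampleLen*numWindows,"single");
windows = reshape(X,numIn,sampleLen,[]);    % same slicing scheme as in the question

% Reference: one pass over the whole sequence.
Yfull = extractdata(predict(net,dlarray(X,"CT")));

% Windowed prediction with the state carried from one window to the next.
net = resetState(net);                      % start the sequence from the initial state
Ycarried = zeros(numOut,sampleLen,numWindows,"single");
for k = 1:numWindows
    [Yk,state] = predict(net,dlarray(windows(:,:,k),"CT"));
    net.State = state;                      % write the updated state back
    Ycarried(:,:,k) = extractdata(Yk);
end
Ycarried = reshape(Ycarried,numOut,[]);

maxDiff = max(abs(Ycarried - Yfull),[],"all");
fprintf("max difference to full-sequence prediction: %.3g\n",maxDiff);
```

This only applies when the windows are visited in order. In the question's training loop the windows are shuffled and batched, so the state returned by `forward` there does not correspond to the preceding window and carrying it over would not be meaningful without restructuring the loop.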

1 Comment

Chuguang Pan on 17 Apr 2026

@Ritam. Thanks for your answer.
