LSTM network time series prediction error occurs at the initial time step

I have trained an LSTM network for time series regression. After training, I want to evaluate its performance on the test dataset. The test result for one single sample (extracted from the mini-batch results) is shown as follows:
The prediction has a transient response at the start of the sequence. I think this is caused by the zero initial states (CellState and HiddenState) of the LSTM network. How can I resolve this zero-state problem when predicting time series?

2 Comments

@Chuguang Pan Could you share your original files, including the code that produced this figure? Please include the minimal code needed to reproduce it.
@xingxingcui Thanks for your reply. The minimal working code is shown below. Note that the data used for training are arbitrary synthetic signals, and the test results show that the LSTM network's prediction also has a transient regime, which may be caused by the "cold start" (zero initial states) of the LSTM network.
% Example Data for Time-Series Regression
fs = 1e3; % signal samplerate
t = 0:1/fs:120; % sample time
predictorData = [0.1 + sin(2*pi*0.5*t)
2 + cos(2*pi*2*t)
1 + 2*sin(2*pi*t)
sin(2*pi*10*t)
cos(2*pi*5*t)
];
targetData = [1e6 * cos(2*pi*0.1*t) + 1e7
1e6 * sin(2*pi*0.1*t) + 1e7];
% Slice the long time series into non-overlapping fixed-length windows
sampleLen = 2048; % window length in samples
numIn = size(predictorData,1); % input sequence dimension
numOut = size(targetData,1); % output sequence dimension
seqLenIn = size(predictorData,2); % input sequence length
seqLenOut = size(targetData,2); % output sequence length
% Data normalization
predictorDataN = normalize(predictorData,2,"zscore","std");
targetDataN = normalize(targetData,2,"zscore","std");
predictorArray = reshape(predictorDataN(:,1:end-mod(seqLenIn,sampleLen)),numIn,sampleLen,[]);
targetArray = reshape(targetDataN(:,1:end-mod(seqLenOut,sampleLen)),numOut,sampleLen,[]);
Training Time Series Regression Model using LSTM network
% datastore construction
trainXds = arrayDatastore(predictorArray,"IterationDimension",3); % "CTB"
trainTds = arrayDatastore(targetArray,"IterationDimension",3);
dsTrain = combine(trainXds,trainTds,"ReadOrder","associated");
% Specifying training options
numEpochs = 120;
miniBatchSize = 12;
initLR = 0.002;
mbqTrain = minibatchqueue(dsTrain,2,"MiniBatchFcn",@preprocessMiniBatch,...
"MiniBatchSize",miniBatchSize,"OutputAsDlarray",[true,true],...
"MiniBatchFormat",["CTB","CTB"],"OutputCast",["single","single"],...
"PartialMiniBatch","return","OutputEnvironment","auto");
% Train Model
layers = [sequenceInputLayer(numIn,"Normalization","none","MinLength",sampleLen,"Name","Input")
lstmLayer(40,"OutputMode","sequence")
layerNormalizationLayer("Name","LN")
fullyConnectedLayer(numOut,"Name","Output")];
net = dlnetwork(layers);
trailingAvg = [];
trailingAvgSq = [];
numObservations = size(predictorArray,3); % number of windows (observations)
numIterationsPerEpoch = ceil(numObservations / miniBatchSize);
numIterations = numIterationsPerEpoch * numEpochs;
epoch = 0;
iteration = 0;
figure;
an = animatedline("Color","b","LineWidth",2);
while epoch < numEpochs
    epoch = epoch + 1;
    % Shuffle data.
    shuffle(mbqTrain);
    % Loop over mini-batches
    while hasdata(mbqTrain)
        iteration = iteration + 1;
        [X,T] = next(mbqTrain);
        % states is returned here but never assigned to net.State
        [loss,gradients,states] = dlfeval(@modelLoss,net,X,T);
        [net,trailingAvg,trailingAvgSq] = adamupdate(net,gradients,trailingAvg,trailingAvgSq,iteration,initLR);
        % learn rate decay: applied on every iteration of each 10th epoch
        if mod(epoch,10) == 0
            initLR = initLR * 0.98;
        end
        an.addpoints(iteration,extractdata(gather(loss)));
        drawnow;
    end
end
Test training performance
reset(mbqTrain);
while hasdata(mbqTrain)
    [trainX,trainT] = next(mbqTrain);
    predY = predict(net,trainX);
    % dlarray stores "CTB" data in C-by-B-by-T order, so dimension 2 is the batch
    plotIdx = 3; % observation within the mini-batch to plot
    plotT = reshape(extractdata(gather(trainT(:,plotIdx,:))),numOut,[]);
    plotY = reshape(extractdata(gather(predY(:,plotIdx,:))),numOut,[]);
    plot((1:sampleLen)/fs,[plotT(1,:);plotY(1,:)]);
    legend(["true","prediction"]);
end
Helper Functions
function [X,T] = preprocessMiniBatch(xdata,tdata)
    % Stack the per-observation C-by-T arrays along the third (batch) dimension
    X = cat(3,xdata{:});
    T = cat(3,tdata{:});
end
function [loss,gradients,states] = modelLoss(net,X,T)
    [Y,states] = forward(net,X);
    loss = l2loss(Y,T,"NormalizationFactor","batch-size");
    gradients = dlgradient(loss,net.Learnables);
end


Answers (1)

Your "modelLoss" function returns states, but the training loop never writes them back to the network. You are also reshaping the long sequence into [C × T × numWindows] and feeding each window as a separate sequence. That means each window begins with an implicit reset to the initial (zero) hidden and cell states, unless you explicitly carry the state across windows. This reset is the likely source of the transient at the start of each predicted window.
In the documentation example "Time Series Forecasting Using Deep Learning", the model state is explicitly updated in each iteration. Applying the same approach may resolve the issue that you are encountering.
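As a rough illustration of carrying the state across windows at prediction time, here is a minimal sketch. It uses a small untrained stand-in network and random data rather than the model from the question, and it leaves out the layer normalization layer to keep the comparison simple. The variable names (Xwin, netS, Ycold, Ywarm, errCold, errWarm) are made up for this example. It assumes the windows are processed in chronological order, one sequence at a time.

```matlab
% Sketch: cold-start vs. stateful prediction over consecutive windows.
% Stand-in network and random data (assumptions, not the asker's model).
rng(0);
numIn = 5; numOut = 2; sampleLen = 64; numWindows = 4;

layers = [sequenceInputLayer(numIn,"Normalization","none")
    lstmLayer(8,"OutputMode","sequence")
    fullyConnectedLayer(numOut)];
net = dlnetwork(layers);                          % untrained, randomly initialized

X = randn(numIn,sampleLen*numWindows,"single");   % one long sequence, C-by-T
Xwin = reshape(X,numIn,sampleLen,numWindows);     % chronological, non-overlapping windows

% (a) Stateless: every window starts from the zero state, as in the question.
Ycold = zeros(numOut,sampleLen,numWindows,"single");
for k = 1:numWindows
    Yk = predict(net,dlarray(Xwin(:,:,k),"CTB"));
    Ycold(:,:,k) = reshape(extractdata(Yk),numOut,sampleLen);
end

% (b) Stateful: reset once, then write the returned state back after each window.
netS = resetState(net);
Ywarm = zeros(numOut,sampleLen,numWindows,"single");
for k = 1:numWindows
    [Yk,state] = predict(netS,dlarray(Xwin(:,:,k),"CTB"));
    netS.State = state;                           % carry hidden and cell state forward
    Ywarm(:,:,k) = reshape(extractdata(Yk),numOut,sampleLen);
end

% Reference: predict the whole sequence in a single call.
Yfull = reshape(extractdata(predict(net,dlarray(X,"CTB"))),numOut,[]);
errWarm = max(abs(reshape(Ywarm,numOut,[]) - Yfull),[],"all");
errCold = max(abs(reshape(Ycold,numOut,[]) - Yfull),[],"all");
```

In this sketch the stateful result should agree with the single-call prediction up to floating-point tolerance, while the stateless result differs from it in every window after the first. Note that a network trained only on shuffled windows that each start from zero state has never seen nonzero initial states, so the training loop may need the same kind of state handling for this to help.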

Release: R2025a

Asked: 3 Apr 2026

Commented: 17 Apr 2026
