NAR network outputting the previous (t-1) value. Why?

Hello,
I have a problem with the MATLAB NAR network that I have also noticed with other tools (more on that later). I'll explain it in layman's terms:
Basically, I load a time series into MATLAB (for example, a stock's closing price over, say, five years) and want the NAR network (with, say, delay = 5) to learn the relationship between the previous five closing prices and the next closing price.
The NAR learns this one step at a time: it looks at the previous 5 prices and relates them to the current price (which the network is also shown). It advances through all five years of data this way, learning by example (current price vs. the last 5 prices).
That all seems fine. However, for data the NAR has already seen, whenever I give the network 5 previous prices, it should output the next price (which it has also seen during training). This is what I would expect of any network (unless I'm completely wrong here).
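To make the setup concrete, here is a minimal sketch in base MATLAB (no toolbox) of how a delay of d = 5 turns a price series into input/target pairs. It only mimics conceptually what preparets does for an open-loop NAR; the toy series p and the variable names are illustrative assumptions, not the toolbox internals.

```matlab
% Toy price series (illustrative values only)
p = [10 11 12 11 13 14 13 15 16 15];
d = 5;                         % number of feedback delays
n = numel(p) - d;              % number of training examples

X = zeros(n, d);               % row k holds p(k), ..., p(k+d-1)
T = zeros(n, 1);               % target for row k is p(k+d)
for k = 1:n
    X(k, :) = p(k:k+d-1);      % the d previous prices
    T(k)    = p(k+d);          % the price the network should output
end

disp([X T])                    % each row: 5 lagged prices, then the target
```

Each row is one "learning by example" step: five past prices in, the following price as the target.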
But instead, the NAR outputs the previous (t-1) price. So, basically (where p = price and f is the trained network):
I expect: f(p(t-5), p(t-4), p(t-3), p(t-2), p(t-1)) ≈ p(t)
but the NAR gives me: f(p(t-5), p(t-4), p(t-3), p(t-2), p(t-1)) ≈ p(t-1)
(this is with data the NAR has already seen)
Why is this?
I also built an Elman network using Encog and got basically the same results. I tried a Deep Belief Network using Accord.NET: same thing. I tried standard feedforward, Jordan, SVM, RBF, etc. Nothing works. Why?
They all act like naive predictors.
Independently of the code and data above, I've tried a simple time series (1, 2, 3, 4, 5, ..., 2000) and all the networks learn it perfectly, but not stock prices.
I've also tried using deltas, log, sqrt, etc., with no luck (on stock data).
I've tried several delays (d = 5, 7, 10, 20, 30, 40, 50, 100); only d = 50 gave something that was not exactly a naive predictor, but its results were significantly off even on the training data.
All of these experiments used only training data.
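One likely explanation is that daily prices behave close to a random walk, p(t) = p(t-1) + noise. For such a series the MSE-optimal one-step forecast is p(t-1) itself, so any model trained to minimise squared error is pushed towards the naive predictor. The sketch below assumes a simulated random walk in place of real prices and uses a linear AR(5) fitted by least squares as a stand-in for the NAR (it is not the NAR itself).

```matlab
rng(0);                                   % reproducible simulation
N = 2000;
p = 100 + cumsum(randn(N, 1));            % simulated random-walk "price"
d = 5;

% Lag matrix: column j holds p(t-j) for t = d+1..N
X = zeros(N - d, d);
for j = 1:d
    X(:, j) = p(d+1-j : N-j);
end
y = p(d+1:N);                             % targets p(t)

A = [ones(N - d, 1) X];                   % intercept + 5 lags
b = A \ y;                                % least-squares AR(5) coefficients

mseFit   = mean((y - A*b).^2);            % in-sample error of AR(5)
mseNaive = mean((y - X(:, 1)).^2);        % naive predictor: p(t-1)

fprintf('lag-1 coefficient: %.3f\n', b(2));
fprintf('MSE ratio (AR(5) / naive): %.4f\n', mseFit / mseNaive);
```

The fitted lag-1 coefficient should come out close to 1 and the others close to 0, and the in-sample MSE barely improves on the naive predictor, which is the "t-1" behaviour described above.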
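The ramp 1, 2, 3, ... is different: it is a deterministic function of its own past (p(t) = 2p(t-1) - p(t-2)), so a model can reproduce it exactly and beat the naive predictor, which is always off by one. A minimal least-squares illustration, again a linear stand-in for the network:

```matlab
p = (1:2000)';                      % the deterministic ramp
X = [p(2:end-1) p(1:end-2)];        % p(t-1) and p(t-2)
y = p(3:end);                       % target p(t)

b = X \ y;                          % should be very close to [2; -1]
errFit   = max(abs(y - X*b));       % essentially zero
errNaive = max(abs(y - X(:, 1)));   % naive predictor p(t-1) is off by exactly 1

fprintf('coefficients: %.4f %.4f\n', b);
fprintf('max error, AR(2): %.2e, naive: %g\n', errFit, errNaive);
```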
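Transforming to deltas changes the target but not the underlying issue. A quick diagnostic is to compare the lag-1 autocorrelation of the levels with that of the differences. The sketch below assumes a simulated random walk standing in for the KO series: the levels are almost perfectly autocorrelated (so "predict yesterday" looks good), while the differences are close to uncorrelated, leaving little for any model to learn.

```matlab
rng(1);
p  = 50 + cumsum(randn(1500, 1));      % simulated price levels
dp = diff(p);                          % day-to-day changes ("deltas")

% Lag-1 sample correlation: each series against itself shifted by one step
cLevel = corrcoef(p(1:end-1),  p(2:end));
cDiff  = corrcoef(dp(1:end-1), dp(2:end));
rLevel = cLevel(1, 2);
rDiff  = cDiff(1, 2);

fprintf('lag-1 autocorrelation, levels: %.3f\n', rLevel);
fprintf('lag-1 autocorrelation, deltas: %.3f\n', rDiff);
```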
Why? Is stock price data "unlearnable"?
I've seen this question asked in other places, but never with a satisfactory answer.
As a side note, all MATLAB code was generated using nnstart.
Thanks!

21 Comments

You will have to post either one or more data examples or links to them.
Greg
Ok.
This is what I have done:
1) Download a CSV file from Yahoo Finance for Coca-Cola (KO) from 2010-08-02 to 2016-11-22
2) Import the CSV into a SQL Server 2008 R2 database table via SSIS. The table now has 1591 rows and 7 columns (Date, Open, High, Low, Close, Volume, Adj Close)
3) Import a subset of the data into MATLAB like so:
conn = database.ODBCConnection('xxxxxx','yyyyyy','zzzzzz');   % data source, user, password (placeholders)
fromtarget = '''2011-11-14''';
totarget   = '''2016-11-10''';
setdbprefs('DataReturnFormat','cellarray');
% Build the query with sprintf (strcat strips trailing spaces from char arguments)
sqlquery = sprintf('select [Adj Close] from [dddddd].[dbo].[KO] WHERE [Date] BETWEEN %s and %s ORDER BY [Date]', fromtarget, totarget);
curs = exec(conn,sqlquery);
curs = fetch(curs);
inputtmp = curs.Data;     % N-by-1 cell array of prices
targets = inputtmp.';     % 1-by-N cell row, one time step per cell
close(curs);
close(conn);
clearvars inputtmp;
I now have a 1x1257 cell array.
4) Type nnstart and press Enter
5) Select the Time Series app
6) Select NAR and click Next
7) Select targets from the Targets drop-down and click Next
8) Select 15% validation and 15% testing and click Next
9) Select 20 hidden neurons (no particular reason) and delays = 5, and click Next
10) Leave the Levenberg-Marquardt (trainlm) training algorithm selected and click Train. Off she goes. Click Next
11) Click Next again
12) Click Next again
13) Select all Save Data checkboxes except MATLAB struct and click Save Results
14) Click Simple Script
15) Click Finish
16) Change line 53 of the generated code to
figure, plotresponse(t(end-40:end),y(end-40:end))
17) Run the code and save the script as KO_NAR.m
I get a response plot. I edit the plot, change Targets to a line and remove the Errors. This is what I get:
Clearly, the NAR response is t-1.
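Rather than judging the shift by eye, it can be measured: compute the MSE between the model output and the targets shifted by k steps and see which k fits best. A best match at k = 1 means the output tracks the previous target. In the sketch, tn and yn are assumed to be numeric row vectors; with the script's variables they could be obtained with cell2mat(t) and cell2mat(y). Here a simulated random walk and a deliberately lagged "output" stand in for them.

```matlab
rng(2);
tn = 100 + cumsum(randn(1, 500));               % stand-in targets
yn = [tn(1) tn(1:end-1)] + 0.1*randn(1, 500);   % stand-in output lagging by one step

maxLag = 3;
lagMse = zeros(1, maxLag + 1);
for k = 0:maxLag
    % compare y(i) with t(i-k) over the overlapping range
    lagMse(k + 1) = mean((yn(1+k:end) - tn(1:end-k)).^2);
end
[~, best] = min(lagMse);
bestLag = best - 1;

disp(lagMse)
fprintf('output best matches t shifted by %d step(s)\n', bestLag);
```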
(continued...)
Here is the full code:
% Solve an Autoregression Time-Series Problem with a NAR Neural Network
% Script generated by Neural Time Series app
% Created 06-Dec-2016 21:03:39
%
% This script assumes this variable is defined:
%
% targets - feedback time series.
T = targets;
% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainlm'; % Levenberg-Marquardt backpropagation.
% Create a Nonlinear Autoregressive Network
feedbackDelays = 1:5;
hiddenLayerSize = 20;
net = narnet(feedbackDelays,hiddenLayerSize,'open',trainFcn);
% Prepare the Data for Training and Simulation
% The function PREPARETS prepares timeseries data for a particular network,
% shifting time by the minimum amount to fill input states and layer
% states. Using PREPARETS allows you to keep your original time series data
% unchanged, while easily customizing it for networks with differing
% numbers of delays, with open loop or closed loop feedback modes.
[x,xi,ai,t] = preparets(net,{},{},T);
% Setup Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
[net,tr] = train(net,x,t,xi,ai);
% Test the Network
y = net(x,xi,ai);
e = gsubtract(t,y);
performance = perform(net,t,y)
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotregression(t,y)
figure, plotresponse(t(end-40:end),y(end-40:end))
%figure, ploterrcorr(e)
%figure, plotinerrcorr(x,e)
% Closed Loop Network
% Use this network to do multi-step prediction.
% The function CLOSELOOP replaces the feedback input with a direct
% connection from the output layer.
netc = closeloop(net);
netc.name = [net.name ' - Closed Loop'];
view(netc)
[xc,xic,aic,tc] = preparets(netc,{},{},T);
yc = netc(xc,xic,aic);
closedLoopPerformance = perform(net,tc,yc)
% Step-Ahead Prediction Network
% For some applications it helps to get the prediction a timestep early.
% The original network returns predicted y(t+1) at the same time it is
% given y(t+1). For some applications such as decision making, it would
% help to have predicted y(t+1) once y(t) is available, but before the
% actual y(t+1) occurs. The network can be made to return its output a
% timestep early by removing one delay so that its minimal tap delay is now
% 0 instead of 1. The new network returns the same outputs as the original
% network, but outputs are shifted left one timestep.
nets = removedelay(net);
nets.name = [net.name ' - Predict One Step Ahead'];
view(nets)
[xs,xis,ais,ts] = preparets(nets,{},{},T);
ys = nets(xs,xis,ais);
stepAheadPerformance = perform(nets,ts,ys)
If I do this exercise with an Encog Elman network, here is what I get:
Clearly, once again, I am getting t-1 from the Elman network. The same goes for just about every other network type I have tried...
Why? Any input, insights, opinions, thoughts are welcome.
What I am expecting is a plot with both traces similar, or at least somewhat similar, and aligned.
Thanks
You did not post any data or data links as requested.
It doesn't have to be your data.
MATLAB NARNET examples from
help nndatasets
doc nndatasets
are acceptable as long as they exemplify your problem.
In fact, the simpler the data, the clearer the answer to your problem.
Greg
Sorry.
Attached is a csv file with only the adjusted close price of Coca-Cola (KO) from 2011-11-14 to 2016-11-10 taken from Yahoo Finance.
In my example I loaded this from a SQL table; however, the data in the csv is the same.
Thank you
Sorry, I'm having Excel problems.
Please convert to *.txt or *.m
Thanks
Greg
The csv file is a text file which can be opened with Notepad or any other text editor.
I just changed the .csv extension to .txt
That does the trick too.
I'm following this question with great interest. I have read various papers on stock price forecasting with ANNs whose authors completely failed to notice that their charts are shifted by one day (the ANN's validation output resolves to the previous day's price). How could they miss that?
Examples:
https://arxiv.org/ftp/arxiv/papers/1502/1502.06434.pdf
https://www.duo.uio.no/bitstream/handle/10852/44765/aamodt-master.pdf?sequence=7 (page 65, 67, 69, 73, etc)
https://nseindia.com/content/research/FinalPaper206.pdf
The list goes on and on...
Whoa! That Oslo paper is full of such t-1 plots!
Is everyone experiencing the same behaviour?
Any comments are welcome, even if they are non-successes...
Molasar on 3 Jan 2017
Edited: Molasar on 3 Jan 2017
Anyone?
Has anyone checked (edited) their graphs in the way explained above and gotten a different result?
Sorry for the lack of responses. My computer and MATLAB installation do not work correctly, and I haven't had sufficient time to fix either.
Hope I don't need a new machine.
Greg
Likely this would require seeing your code to determine the issue. Please post a simple example of this not working, preferably with a sample data set shipped with MATLAB like:
load Data_GlobalIdx2
or
load stockreturns
@Greg: Ooops! Hope you can get your machine up and runnin'
@Brendan: I'll load the sample data you mention this evening and post results. As for the code, it is posted earlier in this thread (7 Dec 2016 at 3:39).
Thanks!
OK, I loaded the Data_GlobalIdx2 dataset and used column 6 of the Data array.
Same result. Here is my code (mostly generated automatically by MATLAB):
% Solve an Autoregression Time-Series Problem with a NAR Neural Network
% Script generated by Neural Time Series app
% Created 04-Jan-2017 20:23:24
%
% This script assumes this variable is defined:
%
% targets - feedback time series.
clearvars;
load Data_GlobalIdx2;
targets = Data(:,6).';   % keep only column 6, as a 1-by-N row (one sample per column)
T = tonndata(targets,true,false);
% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainlm'; % Levenberg-Marquardt backpropagation.
% Create a Nonlinear Autoregressive Network
feedbackDelays = 1:5;
hiddenLayerSize = 20;
net = narnet(feedbackDelays,hiddenLayerSize,'open',trainFcn);
% Prepare the Data for Training and Simulation
% The function PREPARETS prepares timeseries data for a particular network,
% shifting time by the minimum amount to fill input states and layer
% states. Using PREPARETS allows you to keep your original time series data
% unchanged, while easily customizing it for networks with differing
% numbers of delays, with open loop or closed loop feedback modes.
[x,xi,ai,t] = preparets(net,{},{},T);
% Setup Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
[net,tr] = train(net,x,t,xi,ai);
% Test the Network
y = net(x,xi,ai);
e = gsubtract(t,y);
performance = perform(net,t,y)
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotregression(t,y)
figure, plotresponse(t(end-40:end),y(end-40:end))
%figure, ploterrcorr(e)
%figure, plotinerrcorr(x,e)
% Closed Loop Network
% Use this network to do multi-step prediction.
% The function CLOSELOOP replaces the feedback input with a direct
% connection from the output layer.
netc = closeloop(net);
netc.name = [net.name ' - Closed Loop'];
view(netc)
[xc,xic,aic,tc] = preparets(netc,{},{},T);
yc = netc(xc,xic,aic);
closedLoopPerformance = perform(net,tc,yc)
% Step-Ahead Prediction Network
% For some applications it helps to get the prediction a timestep early.
% The original network returns predicted y(t+1) at the same time it is
% given y(t+1). For some applications such as decision making, it would
% help to have predicted y(t+1) once y(t) is available, but before the
% actual y(t+1) occurs. The network can be made to return its output a
% timestep early by removing one delay so that its minimal tap delay is now
% 0 instead of 1. The new network returns the same outputs as the original
% network, but outputs are shifted left one timestep.
nets = removedelay(net);
nets.name = [net.name ' - Predict One Step Ahead'];
view(nets)
[xs,xis,ais,ts] = preparets(nets,{},{},T);
ys = nets(xs,xis,ais);
stepAheadPerformance = perform(nets,ts,ys)
Response plot:
Still shifted. Why?
It is not that it is outputting the t-1 value; you can see there is a difference. A similar pattern with a lag is not the same as returning the previous value, so your original statement is misleading.
What is happening is that your response is influenced by the lagged prices, which is exactly what I would expect from an AR model when the series has positive autocorrelation at lag 1. What you've learned is that stock price patterns are not a deterministic function of previous stock prices. This result does not surprise me.
It's not that neural networks are not useful for stock data; it's just that such a simplistic model is not going to give you any useful information.
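To see why lag-1 autocorrelation produces this picture, consider the simplest case, a linear AR(1) forecast p̂(t) = c + φ·p(t-1). With φ close to 1 (typical for price levels), the forecast is essentially a copy of yesterday's price, which shows up as the one-step lag in the plots. The sketch fits such a model by least squares to a simulated random walk (an assumption; not the KO data).

```matlab
rng(3);
p = 20 + cumsum(0.5*randn(1000, 1));   % simulated prices
A = [ones(999, 1) p(1:end-1)];         % intercept and p(t-1)
coef = A \ p(2:end);                   % [c; phi]
pHat = A * coef;                       % one-step AR(1) forecasts

gapPrev   = mean(abs(pHat - p(1:end-1)));   % distance to the previous price
gapActual = mean(abs(pHat - p(2:end)));     % distance to the price being predicted
fprintf('c = %.3f, phi = %.4f\n', coef(1), coef(2));
fprintf('mean |forecast - previous| = %.3f, mean |forecast - actual| = %.3f\n', gapPrev, gapActual);
```

The forecast typically sits much closer to the previous price than to the price it is supposed to predict.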
Molasar on 7 Jan 2017
Edited: Molasar on 7 Jan 2017
I see what you mean; however, I don't think my original statement was really misleading, as the two plots posted earlier show an almost identical t-1 response.
I'll give this some thought. But it seems to me that a naive predictor will beat a NAR hands down on this kind of problem...
This raises the question: how, then, can I get any useful information out of a NAR, or any other ANN, with stock data? Almost certainly a topic for a new thread...
I would consider exogenous variables: possibly macro variables, volume/momentum, or even data derived from Twitter posts.
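One way to make that comparison explicit is to report the model's error relative to the naive predictor on data it has not seen, for example the ratio of RMSEs (often called Theil's U); values near or above 1 mean the model adds nothing over "tomorrow = today". Note that the generated script divides the data randomly by default (as far as I know narnet uses dividerand), so test points are interleaved with training points; setting net.divideFcn = 'divideblock' should give a chronological split instead. The sketch below assumes a simulated random walk and a linear AR(5) as a stand-in model, fitted on the first 70% and scored on the last 30%; with a trained NAR one would plug in its test-set outputs instead.

```matlab
rng(4);
N = 1500; d = 5;
p = 100 + cumsum(randn(N, 1));            % simulated prices

X = zeros(N - d, d);
for j = 1:d
    X(:, j) = p(d+1-j : N-j);             % column j = p(t-j)
end
y = p(d+1:N);
A = [ones(N - d, 1) X];

nTrain = round(0.7 * (N - d));            % chronological 70/30 split
b = A(1:nTrain, :) \ y(1:nTrain);         % fit on the earlier data only

yTest     = y(nTrain+1:end);
predModel = A(nTrain+1:end, :) * b;
predNaive = X(nTrain+1:end, 1);           % naive forecast p(t-1)

rmse = @(e) sqrt(mean(e.^2));
U = rmse(yTest - predModel) / rmse(yTest - predNaive);
fprintf('RMSE ratio (model / naive) on held-out data: %.3f\n', U);
```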
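As a rough illustration of what adding an exogenous input means, the sketch below extends a least-squares AR(5) with one extra regressor x(t-1). The data are synthetic and built so that x genuinely leads the price changes, an assumption made purely to show the mechanics; real volume or sentiment data may carry no such signal. In the Neural Network Toolbox the corresponding nonlinear model is narxnet, which takes input delays for x alongside the feedback delays.

```matlab
rng(5);
N = 1500; d = 5;
x  = randn(N, 1);                                % hypothetical exogenous signal
dp = [0; 0.8 * x(1:end-1)] + 0.6 * randn(N, 1);  % price changes partly driven by x(t-1)
p  = 100 + cumsum(dp);                           % synthetic prices

X = zeros(N - d, d);
for j = 1:d
    X(:, j) = p(d+1-j : N-j);                    % lagged prices p(t-j)
end
y  = p(d+1:N);
xL = x(d:N-1);                                   % exogenous input x(t-1)

A_ar  = [ones(N - d, 1) X];                      % price lags only
A_arx = [A_ar xL];                               % price lags + exogenous input

mseAR    = mean((y - A_ar  * (A_ar  \ y)).^2);
mseARX   = mean((y - A_arx * (A_arx \ y)).^2);
mseNaive = mean((y - X(:, 1)).^2);
fprintf('MSE naive %.3f | AR(5) %.3f | AR(5)+x %.3f\n', mseNaive, mseAR, mseARX);
```

Only the model that sees x improves clearly on the naive predictor, because the information about the next change is in x, not in the past prices.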


Answers (0)


Asked: 5 Dec 2016
Commented: 9 Jan 2017
