How to enhance the performance of for-loops and cell-arrays (related to statistical calculations)?

The following code calculates some performance measures for different periods. No error messages occurred, but the processing time is very long.
F = 'runoff.txt'; % name of the file
D = 'C:\Users\heute\model\results\model_standalone\'; % absolute or relative path of base directory
S = dir(fullfile(D,'results*'));
X = [S.isdir] & ~ismember({S.name},{'.','..'});
N = {S(X).name};
L = cell(size(N));
C = cell(size(N));
for k = 1:numel(N)
T = fullfile(D,N{k},F);
fid = fopen(T,'rt');
fmt = ['%s',repmat('%f',1,6)];
opt = {'HeaderLines',1,'CollectOutput',true};
Z = textscan(fid,fmt,opt{:});
fclose(fid);
L{k} = Z{1}; % timestamp
C{k} = Z{2}; % data
%
Qs = C{k}(:,6); % define the simulated runoff, as column 6 in each cell array
%
% define the periods for computing performance measures
sdatelim_neu = [datenum(2013,10,01,00,00,00) datenum(2016,10,01,00,00,00)];
dt = 1/24;
date = sdatelim_neu(1):dt:sdatelim_neu(2);
date_runoff = transpose(date);
%
sdatelim1 = [datenum(2014,05,01,00,00,00) datenum(2014,10,01,00,00,00)];
dt = 1/24;
sdate_sdatelim1 = sdatelim1(1):dt:sdatelim1(2);
%
sdatelim2 = [datenum(2015,05,01,00,00,00) datenum(2015,10,01,00,00,00)];
sdate_sdatelim2 = sdatelim2(1):dt:sdatelim2(2);
%
sdatelim3 = [datenum(2016,05,01,00,00,00) datenum(2016,10,01,00,00,00)];
sdate_sdatelim3 = sdatelim3(1):dt:sdatelim3(2);
%
% loop over the different periods
for s = 1:length(sdate_sdatelim1);
for a = 1:length(sdate_sdatelim2);
for b = 1:length(sdate_sdatelim3);
j = find(date_runoff >= sdate_sdatelim1(s) & date_runoff < sdate_sdatelim1(k)+dt) & find(date_runoff >= sdate_sdatelim2(a) & date_runoff < sdate_sdatelim2(a)+dt) & find(date_runoff >= sdate_sdatelim3(b) & date_runoff < sdate_sdatelim3(b)+dt);
f_1k = 1-cov(Qs(j) - Qo)/var(Qo); %NSE
f_2k = sqrt(mean((Qs(j) - Qo).^2)); %RMSE
f_3k = abs(mean(Qs(j)- Qo)); %BIAS
%Qo is the observed runoff -> imported from file
%
% write into matrix YA -> for use in further analysis
YA = [f_1k, f_2k, f_3k];
end
end
end
end
As a test case, I ran this code for two input files (each of them has 26280 rows in column 6). In the end, however, several thousand input files should be processed.
How can I reduce the computing time?
Or is there an error within the for-loop over the different periods? Or is this:
Qs = C{k}(:,6); % define the simulated runoff, as column 6 in each cell array
an inefficient command?
(I use Matlab R2012a)
  7 Comments
Glazio on 6 Jun 2017
@Walter Roberson: The code is trying to calculate the Root Mean Square Error, BIAS and NSE for each inputfile (runoff.txt) and should consider only certain periods for calculation.
The goal is a matrix YA which contains the performance measure combination for each runoff-file.
Glazio on 6 Jun 2017
@Stephen Cobeldick, thanks for your help.
What exactly do you mean by:
"It does not seem to be necessary store the data from all files, as you only seem to process the data from the current file." ?
In the end, all results should be in YA.
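
One possible reading of that remark, as a rough sketch rather than anyone's actual code: only the three measures per file need to survive the loop, so the cell arrays L and C can be dropped and YA preallocated with one row per file. The synthetic series, nFiles, nSteps and the anonymous function measures are all made up for illustration. The NSE line uses the conventional sum-of-squares definition, which differs from the 1-cov(...)/var(...) expression in the question whenever the mean error is nonzero.

```matlab
% Sketch only: synthetic series replace the textscan output, and nFiles,
% nSteps and the offsets are made-up values.
nFiles = 3;                                % hypothetical number of result folders
nSteps = 1000;                             % hypothetical number of hourly samples
Qo = 5 + sin((1:nSteps)'/50);              % stand-in for the observed runoff
% Anonymous function returning [NSE, RMSE, |bias|] for one simulated series
measures = @(Qs,Qo) [1 - sum((Qs-Qo).^2)/sum((Qo-mean(Qo)).^2), ...
                     sqrt(mean((Qs-Qo).^2)), ...
                     abs(mean(Qs-Qo))];
YA = zeros(nFiles,3);                      % preallocate one result row per file
for k = 1:nFiles
    Qs = Qo + 0.1*k;                       % stand-in for column 6 of file k
    YA(k,:) = measures(Qs,Qo);             % keep the three numbers, drop the series
end
disp(YA)
```

With this layout the memory use stays flat no matter how many files are processed, because Qs is overwritten on every pass.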


Accepted Answer

dpb on 6 Jun 2017
Edited: dpb on 8 Jun 2017
S = 'runoff.txt';
O = 'runoff_observed.txt';
D = 'C:\Users\heute\model\results\model_standalone\';
d = dir(fullfile(D,'results*')); % list of directories
fmtS = ['%{dd.MM.yyyy-HH:mm}D' repmat('%*f',1,5) '%f']; % simulated format string
fmtO = ['%{dd.MM.yyyy HH:mm}D' '%f']; % observed format string
L=length(d); % number of subdirs
YA=zeros(L,3); % preallocate
for k = 1:L % iterate over subdirs
fid = fopen(fullfile(D,d(k).name,S),'rt'); % open simulated (d is a struct array, so index with parentheses)
Z=textscan(fid,fmtS,'headerlines',2,'collectoutput',1); % read simulated
fclose(fid);
dtS=Z{:,1}; % timestamp simulated (datetime)
Qs=Z{:,2}; % simulated data
fid = fopen(fullfile(D,d(k).name,O),'rt'); % open observed
Z=textscan(fid,fmtO,'headerlines',1,'collectoutput',1); % read observed
fclose(fid);
dtO=Z{:,1}; % timestamp observed (datetime)
Qo=Z{:,2}; % observed data
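% define the periods for computing performance measures
yr1=2014; yr2=2016; % years to compute over
ix=isbetween(dtO,datetime(yr1,05,01),datetime(yr1,10,01)); % first year
for yr=yr1+1:yr2 % subsequent years
ix=ix | isbetween(dtO,datetime(yr,05,01),datetime(yr,10,01));
end
% performance measures as defined in the question; assumes Qs and Qo share the same time base
f_1k = 1-cov(Qs(ix) - Qo(ix))/var(Qo(ix)); % NSE
f_2k = sqrt(mean((Qs(ix) - Qo(ix)).^2)); % RMSE
f_3k = abs(mean(Qs(ix) - Qo(ix))); % BIAS
YA(k,:) = [f_1k, f_2k, f_3k];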
end
ADDENDUM
Cleaned up to incorporate changes from the conversation below, except for opening the reference file; handle that as needed. The code above should then return the L records in the output array.
ERRATUM
NB: Remove the (1) index from the year reference yr in the loop to get the subsequent years after the first; I inadvertently left it in when I copied the line.
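
For readers on releases without datetime and isbetween (the question mentions R2012a), the same period mask can be sketched with serial date numbers alone. This is only an illustration under assumptions: the hourly axis t is rebuilt here from the limits given in the question instead of being read from a file, and the upper bound is taken as exclusive, which is a choice made for this sketch and not something the thread specifies.

```matlab
% Hourly axis built from an integer hour count, so every midnight is an
% exact integer date number (assumed limits: 1 Oct 2013 to 1 Oct 2016).
t0 = datenum(2013,10,1);
nHours = 24*(datenum(2016,10,1) - t0);
t = t0 + (0:nHours)'/24;
% One logical mask for 1 May to 1 Oct of each year, built with a short
% loop over years instead of nested loops over every hour.
ix = false(size(t));
for yr = 2014:2016
    ix = ix | (t >= datenum(yr,5,1) & t < datenum(yr,10,1));
end
nSel = nnz(ix);                            % number of selected hourly samples
fprintf('%d of %d samples selected\n', nSel, numel(t));
```

The mask can then index the simulated and observed series directly, for example Qs(ix) and Qo(ix), which replaces the find calls and the three nested loops of the question with a handful of vectorized comparisons.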
  15 Comments
dpb on 8 Jun 2017
Well, not if you didn't make the fix I noted above: it will run, but it will process the first year three times, as it did before.
MORAL: ALWAYS debug thoroughly; running w/o error doesn't guarantee correctness!
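
As an illustration of that kind of check (a sketch with an assumed hourly axis, not code from this thread), the years actually covered by the mask can be listed with datevec. The variable names ixBad and ixGood are hypothetical; ixBad reproduces the slip of reusing the first year inside the loop, which runs without any error.

```matlab
% Assumed hourly axis from 1 Jan 2014 over three years' worth of hours
t = datenum(2014,1,1) + (0:3*365*24)'/24;
yrs = 2014:2016;
ixBad  = false(size(t));
ixGood = false(size(t));
for yr = yrs
    % slip: always the first year, so the same window is OR-ed three times
    ixBad  = ixBad  | (t >= datenum(yrs(1),5,1) & t < datenum(yrs(1),10,1));
    % intended: the window of the current loop year
    ixGood = ixGood | (t >= datenum(yr,5,1) & t < datenum(yr,10,1));
end
v = datevec(t(ixBad));   yearsBad  = unique(v(:,1))';   % years seen by the faulty mask
v = datevec(t(ixGood));  yearsGood = unique(v(:,1))';   % years seen by the fixed mask
assert(isequal(yearsGood, yrs), 'mask does not cover every requested year');
disp(yearsBad)
```

Both masks are built without complaint, but only the second one passes the assert; the first covers a single year, which is exactly the kind of silent error a quick check like this catches.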
