Improve speed to max calculation for max daily output

6 views (last 30 days)
Hi there,
I have done a good few variations of this and this code is the simpliest but not the most time efficient and it seems that the max calculation is the slowest.
[nrow,ncol] = size(test_data);
test_output = cell(nrow,ncol);
for i = 1:nrow;
for j = 1:ncol;
inpdata = double(test_data{i,j});
% Build array of time components
dv = datevec(inpdata(:,1)) ;
% Find all timestamps where the HH value (col 4) is 0 and 6
time_markers = find(dv(:,4)==6 | dv(:,4)==0);
% Preallocate output array
daily_max = zeros(length(test_data),4);
for n = 1:2:length(time_markers)-2;
daily_max(n,1) = inpdata(time_markers(n+1,1),1);
daily_max(n,2) = mean(inpdata(time_markers(n,1):time_markers(n+1,1),2));
daily_max(n,3) = mean(inpdata(time_markers(n,1):time_markers(n+1,1),3));
daily_max(n,4) = max(inpdata(time_markers(n,1):time_markers(n+1,1),4));
end
% Remove any extra cells in output file
daily_max(daily_max(:,1)==0,:) = [];
% Plug this into the final output
test_output{i,j} = daily_max;
end
end
Any ideas of improving its performance? I know there are some unnecessary lines that I need to fine tune, but the main issue slowing the performance is the calling of the max, particularly, the mean function. The dv variable (first 10 rows) looks like this, if it helps understand what I want done:
1980 1 1 6 0 0
1980 1 1 12 0 0
1980 1 1 18 0 0
1980 1 2 0 0 0
1980 1 2 6 0 0
1980 1 2 12 0 0
1980 1 2 18 0 0
1980 1 3 0 0 0
1980 1 3 6 0 0
1980 1 3 12 0 0
and I want anything between 6 and the consecutive 0 to count as a day. The data has been formatted already so that there are no missing timesteps.
  2 Comments
Peter Perkins
Peter Perkins on 6 Aug 2015
Mashtine, you're using a triple-nested loop. That almost certainly is not the way to go.
You should attach a short example of your input data, what you want as the result, and an explanation of the calculations to create that result.
mashtine
mashtine on 6 Aug 2015
Hi Peter,
Thanks for helping with this. I have attached the first 100 rows of test_data{1,1} where the first 10 are as follows:
723181.25 3.2871747 0.34067032 5.4055758
723181.50 1.2680068 0.20843017 2.9019344
723181.75 2.8943899 0.22514595 4.5895572
723182 7.9142933 0.57301307 10.883902
723182.25 14.368793 1.0057755 18.923864
723182.50 17.886234 1.2893854 23.480335
723182.75 19.482813 1.2635506 24.982267
723183 15.509732 0.95686787 19.885626
723183.25 9.5619335 0.71554273 13.053712
723183.50 9.5422649 0.72752434 13.077939
Col 1 has the timestamp of the data in datenum form and the other three columns contain the data. I would like to get the daily mean for col 2 and 3, and the daily max for col 4. When you datevec the first column, I get the following:
1980 1 1 6 0 0
1980 1 1 12 0 0
1980 1 1 18 0 0
1980 1 2 0 0 0
1980 1 2 6 0 0
1980 1 2 12 0 0
1980 1 2 18 0 0
1980 1 3 0 0 0
1980 1 3 6 0 0
1980 1 3 12 0 0
where col 4 has the HH time value and 6,12,18,0 respectively represent one day. I knew three loops would be the killer here! But I am not sure how to get around the first two. The original test_data variable is 121x97.

Sign in to comment.

Accepted Answer

Peter Perkins
Peter Perkins on 6 Aug 2015
There are lots of ways to do this. Here's one that assumes you have R2014b or later. If you only have R2013b or later, you can still use a table, but you'd have to use datenum and datestr rather than datetime, which was added in R2014b.
First load your numeric matrix and create a table, and then convert the datenum to a datetime:
>> load test_data2.mat
>> test_data = array2table(test_data,'VariableNames',{'Time' 'X' 'Y' 'Z'});
>> test_data.Time = datetime(test_data.Time,'ConvertFrom','datenum')
test_data =
Time X Y Z
____________________ ______ _______ ______
01-Jan-1980 06:00:00 3.2872 0.34067 5.4056
01-Jan-1980 12:00:00 1.268 0.20843 2.9019
01-Jan-1980 18:00:00 2.8944 0.22515 4.5896
02-Jan-1980 00:00:00 7.9143 0.57301 10.884
02-Jan-1980 06:00:00 14.369 1.0058 18.924
02-Jan-1980 12:00:00 17.886 1.2894 23.48
[snip]
Next, create a variable that defines the way you want to group the rows of that table. On the minus side, you want to group midnight of tomorrow with 6am, 12pm, and 6pm of today, so you can't just get the day number. On the plus side, your data are completely regular, so you can just take each consecutive group of four rows:
>> n = height(test_data);
>> test_data.Day = repelem(1:(n/4),4)'
test_data =
Time X Y Z Day
____________________ ______ _______ ______ ___
01-Jan-1980 06:00:00 3.2872 0.34067 5.4056 1
01-Jan-1980 12:00:00 1.268 0.20843 2.9019 1
01-Jan-1980 18:00:00 2.8944 0.22515 4.5896 1
02-Jan-1980 00:00:00 7.9143 0.57301 10.884 1
02-Jan-1980 06:00:00 14.369 1.0058 18.924 2
02-Jan-1980 12:00:00 17.886 1.2894 23.48 2
[snip]
Finally, do the grouped calculation on the table, and pretty up the result:
>> dailyStats = @(x,y,z) deal(mean(x),mean(y),max(z));
>> dailies = rowfun(dailyStats,test_data, ...
'GroupingVariable','Day', 'InputVariables',{'X' 'Y' 'Z'}, ...
'OutputVariableNames',{'meanX' 'meanY' 'maxZ'});
>> dailies.Properties.RowNames = {}; % don't need these
>> dailies.Day = dateshift(test_data.Time(1:4:n),'start','day');
>> dailies.Day.Format = 'dd-MMM-yyyy'
dailies =
Day GroupCount meanX meanY maxZ
___________ __________ ______ _______ ______
01-Jan-1980 4 3.841 0.33681 10.884
02-Jan-1980 4 16.812 1.1289 24.982
03-Jan-1980 4 9.4298 0.62444 17.041
04-Jan-1980 4 14.185 0.97222 24.212
05-Jan-1980 4 12.899 0.99925 21.861
06-Jan-1980 4 6.2882 0.53728 10.743
[snip]
Hope this helps.
  2 Comments
mashtine
mashtine on 7 Aug 2015
Edited: mashtine on 7 Aug 2015
Thanks a lot Peter. I had no idea that this functionality existed in Matlab nor did I see it anywhere. However, do you know how I can efficiently incorporate this into a loop to go through the original test data (array of 121x97)? The test_data2 that I upload is the first 100 columns of test_data{1,1}.
It seems like having to go through 121x97 with this may really slow things down.
Peter Perkins
Peter Perkins on 14 Sep 2015
I think you mean, "I have to do the same grouped calculation of 121*97 sets of data." If that's the case, it seems like you have two options:
  1. Loop over the data sets and do the calculation 121*97 times, or
  2. Somehow combine the separate data sets into one
I can't say how to do the latter, since I don't really know anything about your data.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!