Optimize the ordering of nested loops for speed

Suppose I have the following code. Which will be faster, version 1 or version 2? The only difference is the ordering of the two nested loops.
VERSION 1
% bigArray has dim: [npolv,nz,nsv]
% npolv=68961 > nsv=200 > nz=81
% zgrid is [nz,1], kgrid is [nsv,1]
for j=1:nz
for qq=1:nsv
% the output of fun is a vector of length npolv
bigArray(:,j,qq) = fun(zgrid(j),kgrid(qq));
end
end
or VERSION 2
% bigArray has dim: [npolv,nz,nsv]
% npolv=68961 > nsv=200 > nz=81
% zgrid is [nz,1], kgrid is [nsv,1]
for qq=1:nsv
for j=1:nz
% the output of fun is a vector of length npolv
bigArray(:,j,qq) = fun(zgrid(j),kgrid(qq));
end
end
Bruno Luong on 20 Jan 2019
I suspect that if you're asking such a question, the bottleneck has nothing to do with the looping but with calling the function inside the loop.
If you want to speed it up, you need to vectorize FUN so that it accepts N-D arrays. Changing the loop order won't do much.
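A minimal sketch of what vectorizing FUN could look like. The body of `fun` below is purely hypothetical (the question doesn't show it), as are `polGrid` (an assumed npolv-by-1 grid used inside it) and the reduced sizes. The point is only the shape handling: pass zgrid as a 1-by-nz row and kgrid as a 1-by-1-by-nsv "page" vector, and let implicit expansion (MATLAB R2016b or later; bsxfun does the same job on older releases) build the whole [npolv,nz,nsv] array in one call.

```matlab
% Hypothetical sizes, much smaller than the question's 68961 x 81 x 200
npolv = 1000; nz = 8; nsv = 20;
polGrid = linspace(0, 1, npolv)';   % hypothetical npolv-by-1 grid used inside fun
zgrid = linspace(0.5, 1.5, nz)';    % nz-by-1, as in the question
kgrid = linspace(1, 10, nsv)';      % nsv-by-1, as in the question

% Hypothetical fun written with element-wise operators only
fun = @(z, k) z .* polGrid.^2 + sqrt(k);

% Loop version, as in the question
bigArrayLoop = zeros(npolv, nz, nsv);
for qq = 1:nsv
for j = 1:nz
bigArrayLoop(:, j, qq) = fun(zgrid(j), kgrid(qq));
end
end

% Vectorized version: one call, z along dimension 2 and k along dimension 3.
% Implicit expansion combines npolv x 1, 1 x nz and 1 x 1 x nsv into npolv x nz x nsv.
zRow  = reshape(zgrid, 1, nz);
kPage = reshape(kgrid, 1, 1, nsv);
bigArrayVec = fun(zRow, kPage);
```

Whether this is possible depends entirely on what FUN does; if it already uses only element-wise operators, the change is mostly a matter of reshaping the inputs.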
Image Analyst on 20 Jan 2019
Or equally fast. As you said, Bruno, the speed has nothing to do with the looping; it's what's inside the loop that counts. Look at how much time the looping alone takes:
nsv=200;
nz=81;
% Empty loop bodies: this times the loop overhead alone
tic
for j=1:nz
for qq=1:nsv
;
end
end
toc
tic
for qq=1:nsv
for j=1:nz
;
end
end
toc
and the times for the for loops alone are so short they're not even noticeable:
Elapsed time is 0.000221 seconds.
Elapsed time is 0.000075 seconds.
They're in the microsecond range. There is no way a person would notice those absolute elapsed times, much less the difference between them. They're just too fast!
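To put that in perspective with the question's own sizes, a quick back-of-the-envelope check: the loops run only nz*nsv iterations, while bigArray holds over a billion doubles, so the time goes into computing and writing data rather than into the loop counters. Only the three sizes come from the question; the rest is arithmetic.

```matlab
% Sizes from the question
npolv = 68961; nz = 81; nsv = 200;

nCalls    = nz * nsv;              % loop iterations = calls to fun
nElements = npolv * nz * nsv;      % doubles stored in bigArray
gigabytes = nElements * 8 / 1e9;   % 8 bytes per double

fprintf('%d calls, %d elements, %.2f GB\n', nCalls, nElements, gigabytes);
% prints: 16200 calls, 1117168200 elements, 8.94 GB
```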


Accepted Answer

Image Analyst on 19 Jan 2019
Why not put tic before the loops and toc after them, and measure it yourself?
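A minimal tic/toc sketch for the two versions in the question, assuming a hypothetical stand-in for FUN (the real one isn't shown) and a reduced npolv so it runs quickly. Both versions fill the same array, so it also checks that the results agree.

```matlab
% Hypothetical stand-in for fun and a reduced npolv; nz and nsv as in the question
npolv = 500; nz = 81; nsv = 200;
polGrid = linspace(0, 1, npolv)';
fun = @(z, k) z * polGrid + k;      % returns an npolv-by-1 vector
zgrid = linspace(0, 1, nz)';
kgrid = linspace(0, 1, nsv)';

% VERSION 1: j outer, qq inner
A1 = zeros(npolv, nz, nsv);         % preallocate
tic
for j = 1:nz
for qq = 1:nsv
A1(:, j, qq) = fun(zgrid(j), kgrid(qq));
end
end
t1 = toc;

% VERSION 2: qq outer, j inner
A2 = zeros(npolv, nz, nsv);
tic
for qq = 1:nsv
for j = 1:nz
A2(:, j, qq) = fun(zgrid(j), kgrid(qq));
end
end
t2 = toc;

fprintf('Version 1: %.3f s, Version 2: %.3f s\n', t1, t2);
```

In principle version 2 is the more memory-friendly order, since with j innermost consecutive iterations write adjacent columns of the array. Each write is already a contiguous block of npolv elements, though, so any gain is likely small next to the cost of calling fun.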
Because MATLAB stores arrays in column-major order (the first index varies fastest in memory), you'll find it's best to put the leftmost index in the innermost loop and the rightmost index in the outermost loop:
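numRows = 300; numColumns = 400; slices = 50;   % example sizes
array = zeros(numRows, numColumns, slices);     % preallocate
for slice = 1 : slices          % rightmost index: outermost loop
for col = 1 : numColumns
for row = 1 : numRows           % leftmost index: innermost loop (contiguous in memory)
array(row, col, slice) = row + col + slice;   % placeholder: put your computation here
end
end
end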
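A small sketch of what column-major order means in practice: A(:) lists the elements in memory order, and sub2ind gives the memory position of a subscript. Stepping the first (row) index moves one element, while stepping the last (slice) index jumps a whole page, which is why the row loop belongs innermost. The 2-by-3-by-4 array is just an illustration.

```matlab
% A 2 x 3 x 4 array filled with the numbers 1..24 in memory order
A = reshape(1:24, [2 3 4]);

% A(:) is the order in which the elements sit in memory
disp(A(:)');

% Memory positions of the neighbors of A(1,1,1)
pRow   = sub2ind(size(A), 2, 1, 1);   % next row:    2 (1 element further)
pCol   = sub2ind(size(A), 1, 2, 1);   % next column: 3 (numRows = 2 elements further)
pSlice = sub2ind(size(A), 1, 1, 2);   % next slice:  7 (numRows*numColumns = 6 elements further)
```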
Matthew Kehoe on 27 Jul 2021
Edited: Matthew Kehoe on 27 Jul 2021
@Bruno Luong It appears to make a difference when the loop ranges (the array dimensions) are large.
% array is size(numRows,slices,numColumns); array2 is size(numRows,numColumns,slices)
% In array, slice and col are swapped, which decreases performance
slices = 50;
numColumns = 20;
numRows = 30;
array = zeros(numRows,slices,numColumns);
array2 = zeros(numRows,numColumns,slices);
tic
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array(row, slice, col) = array(row, slice, col) + 5;
end
end
end
end
toc
tic
% Ordered by right most indexes innermost
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array2(row, col, slice) = array2(row, col, slice) + 5;
end
end
end
end
toc
% Elapsed time of method 1 is 0.053500 seconds.
% Elapsed time of method 2 is 0.041477 seconds.
and, with larger dimensions:
% array is size(numRows,slices,numColumns); array2 is size(numRows,numColumns,slices)
% In array, slice and col are swapped, which decreases performance
slices = 50;
numColumns = 400;
numRows = 300;
array = zeros(numRows,slices,numColumns);
array2 = zeros(numRows,numColumns,slices);
tic
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array(row, slice, col) = array(row, slice, col) + 5;
end
end
end
end
toc
tic
% Ordered by right most indexes innermost
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array2(row, col, slice) = array2(row, col, slice) + 5;
end
end
end
end
toc
% Elapsed time of method 1 is 12.153343 seconds.
% Elapsed time of method 2 is 8.756736 seconds.
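The stride arithmetic behind these timings can be checked with sub2ind, using the second benchmark's sizes. In method 1 the middle loop runs over col, which is the last dimension of array, so each col step jumps numRows*slices elements; in method 2 the same step moves only numRows elements. This is a sketch of the memory distances only, not a timing.

```matlab
slices = 50; numColumns = 400; numRows = 300;
szArray  = [numRows, slices, numColumns];   % size of array  (method 1)
szArray2 = [numRows, numColumns, slices];   % size of array2 (method 2)

% Memory distance (in elements) between (row,slice,col) and (row,slice,col+1) in array
strideCol1 = sub2ind(szArray, 1, 1, 2) - sub2ind(szArray, 1, 1, 1);
% Memory distance between (row,col,slice) and (row,col+1,slice) in array2
strideCol2 = sub2ind(szArray2, 1, 2, 1) - sub2ind(szArray2, 1, 1, 1);

fprintf('method 1: %d elements (%d bytes), method 2: %d elements (%d bytes)\n', ...
strideCol1, 8*strideCol1, strideCol2, 8*strideCol2);
% prints: method 1: 15000 elements (120000 bytes), method 2: 300 elements (2400 bytes)
```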
Bruno Luong on 28 Jul 2021
Your code, run on the TMW (The MathWorks) server:
% array is size(numRows,slices,numColumns); array2 is size(numRows,numColumns,slices)
% In array, slice and col are swapped, which decreases performance
slices = 50;
numColumns = 400;
numRows = 300;
array = zeros(numRows,slices,numColumns);
array2 = zeros(numRows,numColumns,slices);
tic
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array(row, slice, col) = array(row, slice, col) + 5;
end
end
end
end
toc
Elapsed time is 13.637798 seconds.
tic
% Ordered by right most indexes innermost
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array2(row, col, slice) = array2(row, col, slice) + 5;
end
end
end
end
toc
Elapsed time is 12.603883 seconds.
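In line with the earlier point about vectorizing, a minimal sketch: each pass of the benchmark's triple loop just adds 5 to every element, which is a single whole-array statement with no explicit indexing. The small sizes and the name ntestsMax are hypothetical, chosen so the loop comparison runs quickly.

```matlab
slices = 5; numColumns = 4; numRows = 3; ntestsMax = 10;

% Triple-loop version, as in the benchmark
arrayLoop = zeros(numRows, numColumns, slices);
for ntests = 1:ntestsMax
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
arrayLoop(row, col, slice) = arrayLoop(row, col, slice) + 5;
end
end
end
end

% Vectorized version: one whole-array operation per test
arrayVec = zeros(numRows, numColumns, slices);
for ntests = 1:ntestsMax
arrayVec = arrayVec + 5;
end
```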


More Answers (1)

Mark McBroom on 19 Jan 2019
  1. Use the profiler (profile) to find the hot spots in the code.
  2. Preallocate bigArray. On my computer this reduced execution time by 70%.
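A minimal sketch of both suggestions, assuming a hypothetical stand-in for fun and reduced sizes. Growing an array inside a loop forces MATLAB to reallocate and copy it each time it gets bigger; calling zeros once up front avoids that. The profiler commands are shown as comments because profile viewer opens a window.

```matlab
% Hypothetical stand-in for fun and reduced sizes
npolv = 200; nz = 10; nsv = 30;
fun = @(z, k) (1:npolv)' * z + k;   % returns an npolv-by-1 vector
zgrid = (1:nz)'; kgrid = (1:nsv)';

% To find hot spots in MATLAB:
%   profile on
%   ... run the code ...
%   profile viewer

% Without preallocation: grown is enlarged whenever j or qq exceeds its current size
grown = [];
for qq = 1:nsv
for j = 1:nz
grown(1:npolv, j, qq) = fun(zgrid(j), kgrid(qq));
end
end

% With preallocation: the full array is allocated once
bigArray = zeros(npolv, nz, nsv);
for qq = 1:nsv
for j = 1:nz
bigArray(:, j, qq) = fun(zgrid(j), kgrid(qq));
end
end
```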
Alessandro D on 19 Jan 2019
Dear Mark, thanks for your answer. I didn't show it in the sample code in my question, but yes, I had preallocated bigArray.
