Performance of cell arrays, multi-dimensional arrays, structure arrays, and multiple variables

33 views (last 30 days)
I've found that the performance of cell Arrays, multi-dimensional arrays, and structure arrays are all far slower than the use of multiple independent variables. Thus, my optimized codes have been using super long if-statements and functions which completely write out which variables are needed or not. The code is thus faster than using the other methods, but the code is also quite messy. A reproduction of the independent variables approach using 'eval' also doesn't work because of the way eval is executed by MATLAB.
Is there another way to do this without writing out each variable? Perhaps there I could use cell arrays but pass an argument which results in MATLAB treating them in a faster way... This might not seem like a big deal, but my codes take dozens of hours to execute (itt>1000), which becomes many days when the other 'MATLAB-preferred' methods are used.
Here is a code showing this performance. Values of ytessel and xtessel can be changed as you please, but changing these values requires manually changing the code for the fastest method (independent variables).
%%model prep
ytessel = 4;
xtessel = 4;
nnum = 72*1;
ynum = nnum*ytessel;
xnum = nnum*xtessel;
itt = 5;
totemit = rand(ynum,xnum);
T = zeros(ynum,xnum);
for y=1:nnum
for x=1:nnum
dT_depot{y,x} = rand(ynum,xnum);
end
end
%%CELL ARRAYS
tic
for i=1:itt
for yt=1:ytessel
for xt=1:xtessel
A{yt,xt} = zeros(ynum,xnum);
end
end
for y=1:nnum
for x=1:nnum
dT = dT_depot{y,x};
for yt=1:ytessel
for xt=1:xtessel
A{yt,xt} = A{yt,xt} + dT.*totemit(y+(yt-1)*nnum,x+(xt-1)*nnum);
end
end
end
end
end
disp(sprintf('Time: %f (Cell Array Method)',toc));
%%4D Matrix
tic
for i=1:itt
C = zeros(ynum,xnum,ytessel,xtessel);
for y=1:nnum
for x=1:nnum
for yt=1:ytessel
for xt=1:xtessel
dT = dT_depot{y,x};
C(:,:,yt,xt) = C(:,:,yt,xt) + dT.*totemit(y+(yt-1)*nnum,x+(xt-1)*nnum);
end
end
end
end
end
disp(sprintf('Time: %f (4D Array Method)',toc));
%%Independent Variables
D11 = zeros(ynum,xnum); D12 = zeros(ynum,xnum); D13 = zeros(ynum,xnum); D14 = zeros(ynum,xnum);
D21 = zeros(ynum,xnum); D22 = zeros(ynum,xnum); D23 = zeros(ynum,xnum); D24 = zeros(ynum,xnum);
D31 = zeros(ynum,xnum); D32 = zeros(ynum,xnum); D33 = zeros(ynum,xnum); D34 = zeros(ynum,xnum);
D41 = zeros(ynum,xnum); D42 = zeros(ynum,xnum); D43 = zeros(ynum,xnum); D44 = zeros(ynum,xnum);
tic
for i=1:itt
D11(:) = 0; D12(:) = 0; D13(:) = 0; D14(:) = 0;
D21(:) = 0; D22(:) = 0; D23(:) = 0; D24(:) = 0;
D31(:) = 0; D32(:) = 0; D33(:) = 0; D34(:) = 0;
D41(:) = 0; D42(:) = 0; D43(:) = 0; D44(:) = 0;
for y=1:nnum
for x=1:nnum
curdepot = dT_depot{y,x};
D11 = D11 + curdepot.*totemit(y, x);
D21 = D21 + curdepot.*totemit(y+nnum, x);
D31 = D31 + curdepot.*totemit(y+nnum*2,x);
D41 = D41 + curdepot.*totemit(y+nnum*3,x);
D12 = D12 + curdepot.*totemit(y, x+nnum);
D22 = D22 + curdepot.*totemit(y+nnum, x+nnum);
D32 = D32 + curdepot.*totemit(y+nnum*2,x+nnum);
D42 = D42 + curdepot.*totemit(y+nnum*3,x+nnum);
D13 = D13 + curdepot.*totemit(y,x+nnum*2);
D23 = D23 + curdepot.*totemit(y+nnum,x+nnum*2);
D33 = D33 + curdepot.*totemit(y+nnum*2,x+nnum*2);
D43 = D43 + curdepot.*totemit(y+nnum*3,x+nnum*2);
D14 = D14 + curdepot.*totemit(y,x+nnum*3);
D24 = D24 + curdepot.*totemit(y+nnum,x+nnum*3);
D34 = D34 + curdepot.*totemit(y+nnum*2,x+nnum*3);
D44 = D44 + curdepot.*totemit(y+nnum*3,x+nnum*3);
end
end
end
disp(sprintf('Time: %f (Independent Variables Method)',toc));
%%STRUCT ARRAYS
for yt=1:ytessel
for xt=1:xtessel
DD(yt,xt).dat = zeros(ynum,xnum);
end
end
tic
for i=1:itt
for yt=1:ytessel
for xt=1:xtessel
DD(yt,xt).dat = zeros(ynum,xnum);
end
end
for y=1:nnum
for x=1:nnum
curdepot = dT_depot{y,x};
for yt=1:ytessel
for xt=1:xtessel
DD(yt,xt).dat = DD(yt,xt).dat + curdepot.*totemit(y+(yt-1)*nnum,x+(xt-1)*nnum);
end
end
end
end
end
disp(sprintf('Time: %f (Structure Arrays Method)',toc));
%%Eval Method
for yt=1:ytessel
for xt=1:xtessel
eval(['D' num2str(yt) num2str(xt) '=zeros(ynum,xnum);']);
end
end
tic
for i=1:itt
for yt=1:ytessel
for xt=1:xtessel
eval(['D' num2str(yt) num2str(xt) '(:)=0;']);
end
end
for y=1:nnum
for x=1:nnum
curdepot = dT_depot{y,x};
for yt=1:ytessel
for xt=1:xtessel
eval(['D' num2str(yt) num2str(xt) '=D' num2str(yt) num2str(xt) '+curdepot.*totemit(y+nnum*' num2str(yt-1) ',x+nnum*' num2str(xt-1) ');']);
end
end
end
end
end
disp(sprintf('Time: %f (Eval Method)',toc));
Output:
Time: 16.279780 (Cell Array Method)
Time: 79.975502 (4D Array Method)
Time: 7.808595 (4D Independent Variables Method)
Time: 20.406543 (Structure Arrays Method)
Time: 127.129048 (Eval Method)
  8 Comments
Jan
Jan on 8 Feb 2018
I know that MATLAB's for-loops are notoriously inefficient, ...
This is an old rumor from the times before the JIT was introduced: R6.5 in 2002.

Sign in to comment.

Accepted Answer

Matt J
Matt J on 8 Feb 2018
Edited: Matt J on 9 Feb 2018
Secondly, how would these loops be better vectorized without using far more memory?
As below. I reduced the test data size to something my laptop could handle better, but the improvement
Time: 0.813528 (Independent Variables Method)
Time: 6.088266 (4D Array Method with Loops)
Time: 0.164137 (Array Method Vectorized)
is pretty clear.
%%model prep
ytessel = 4;
xtessel = 4;
nnum = 36*1;
ynum = nnum*ytessel;
xnum = nnum*xtessel;
itt = 5;
totemit = rand(ynum,xnum);
for y=1:nnum
for x=1:nnum
dT_depot{y,x} = rand(ynum,xnum);
end
end
%%LOOPS
tic
for i=1:itt
C = zeros(ynum,xnum,ytessel,xtessel);
for y=1:nnum
for x=1:nnum
for yt=1:ytessel
for xt=1:xtessel
dT = dT_depot{y,x};
C(:,:,yt,xt) = C(:,:,yt,xt) + dT.*totemit(y+(yt-1)*nnum,x+(xt-1)*nnum);
end
end
end
end
end
disp(sprintf('Time: %f (4D Array Method with Loops)',toc));
dT=reshape(cell2mat(dT_depot(:).'), ynum*xnum,nnum*nnum);
tmp=mat2cell(totemit,ones(1,ytessel)*nnum, ones(1,xtessel)*nnum);
S=reshape(cell2mat(tmp(:).'),nnum*nnum,[]);
%%VECTORIZED
tic
for i=1:itt
C=dT*S;
end
disp(sprintf('Time: %f (Array Method Vectorized)',toc));
C=reshape(C,ynum,xnum,ytessel,xtessel);
In any case, it seems like my code demonstrates that the use of cell and multi-dimensional arrays is inherently slow, thus even if the for-loop were somehow removed the inefficiency would remain.
No, the reason you see poor performance from arrays is specifically because of the way you use a loop to accumulate into sub-arrays. Specifically, in right-hand side expressions like,
C(:,:,yt,xt) + dT.*totemit(y+(yt-1)*nnum,x+(xt-1)*nnum);
you re-extract a sub-array C(:,:,yt,xt) from C repeatedly (nnum^2 times) and cause fresh memory to be allocated for this sub-array each time you do so. Your approach of using "independent variables" has less of this overhead because it does less right-hand side array indexing, however it is still not as optimal as the fully vectorized approach.
  2 Comments
Matt J
Matt J on 9 Feb 2018
As a related point, notice how the "4D array with Loops" method improves when re-organized as follows
tic
for i=1:itt
C = zeros(ynum,xnum,ytessel,xtessel);
%Z=zeros(ynum,xnum,1,1);
for yt=1:ytessel
for xt=1:xtessel
TMP=0;%re-zero
for y=1:nnum
for x=1:nnum
dT = dT_depot{y,x};
TMP = TMP + dT.*totemit(y+(yt-1)*nnum,x+(xt-1)*nnum);
end
end
C(:,:,yt,xt) = TMP;
end
end
end
disp(sprintf('Time: %f (4D Array Method with Loops - Version 2)',toc));
The variable TMP eliminates the right-hand-side indexing in the accumulation step in a manner similar to your "independent variables" approach. The compute time reduction is,
Time: 6.088266 (4D Array Method with Loops)
Time: 1.769762 (4D Array Method with Loops - Version 2)
Christopher
Christopher on 9 Feb 2018
Wow, I appreciate this very much. Using the following code
dT = zeros(ynum*xnum,nnum*nnum);
for x=1:nnum
for y=1:nnum
dT(:,y+(x-1)*nnum) = rand(ynum*xnum,1);
end
end
tmp = mat2cell(totemit,ones(1,ytessel)*nnum, ones(1,xtessel)*nnum);
S = reshape(cell2mat(tmp(:).'),nnum*nnum,[]);
C = zeros(ynum*xnum,ytessel*xtessel);
tic
for i=1:itt
C = C+dT*S;
end
toc
Recalculating timings for ytessel=4, xtessel=4, itt=5, and nnum=72:
Time: 1.607546 (Matt J's Fully Vectorized)
Time: 14.911492 (Cell Array Loop Method)
Time: 8.121991 (Independent Variables Loop Method)
Time: 76.033369 (4D Array Loop Method)
Time: 15.636499 (Structure Arrays Loop Method)
Time: 124.471730 (Eval Loop Method)
and taking timings from before for ytessel=4, xtessel=2, itt=5, and nnum=144 and adding new timings:
Time: 325.490661 (Cell Array Loop Method)
Time: 493.344934 (4D Array Loop Method)
Time: 32.752160 (Independent Variables Loop Method)
Time: 319.870357 (Structure Arrays Loop Method)
Time: 12.203983 (Matt J's Fully Vectorized)
So here we are seeing about 2.5-5x speedups by full vectorization.

Sign in to comment.

More Answers (0)

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!