Access time of data in cell array vs matrix

Hello
I am doing operations (denoising, detection, etc.) on images, and it seems to be faster to store and access them in a cell array rather than a matrix. This came as a surprise to me. I give an example right below using convolutions.
What is the reason for this?
% create data
N_img = 1000;
img = rand(512,512,N_img);
for ii = size(img,3):-1:1
    img_cell{ii} = img(:,:,ii);
end
% comparison on convolutions
tic
out_cell = img_cell;
for ii = 1:N_img
    out_cell{ii} = imgaussfilt(img_cell{ii});
end
toc
% I get 1.6 s - 1.8 s
clear out_cell
tic
out_mat = img; % allocation inside or outside the timer does not change the trend
for ii = 1:N_img
    out_mat(:,:,ii) = imgaussfilt(img(:,:,ii));
end
toc
clear out_mat
% I get ~3.2 s
Sincerely,

1 Comment

I would take the imgaussfilt out of there to see the difference in access times more transparently:
% create data
N_img = 1000;
img = rand(512,512,N_img);
img_cell = num2cell(img,[1,2]);
tic
out_cell = img_cell;
for ii = 1:N_img
    out_cell{ii} = img_cell{ii};
end
toc
Elapsed time is 0.015706 seconds.
tic
out_mat = img;
for ii = 1:N_img
    out_mat(:,:,ii) = img(:,:,ii);
end
toc
Elapsed time is 1.547411 seconds.


 Accepted Answer

There is a useful command, "format debug". Unfortunately it does not work in a Live Script, so to run it here I have to use the hack of evalc:
cmd = "format debug, A = rand(2,3), B = rand(2,3), C{1} = A, C{2} = B, T1 = C{1}, T2 = C{2}"
evalc(cmd)
ans =
    'A =
     Structure address = 7febc99a6260
     m = 2
     n = 3
     pr = 7febccd84400
         0.8732    0.3611    0.2854
         0.4290    0.5620    0.9112
     B =
     Structure address = 7febc98cc8e0
     m = 2
     n = 3
     pr = 7febce7bf6a0
         0.9777    0.4152    0.7182
         0.0407    0.0746    0.2219
     C =
       1×1 cell array
         {2×3 double}
     C =
       1×2 cell array
         {2×3 double}    {2×3 double}
     T1 =
     Structure address = 7febc9047dc0
     m = 2
     n = 3
     pr = 7febccd84400
         0.8732    0.3611    0.2854
         0.4290    0.5620    0.9112
     T2 =
     Structure address = 7febc9044700
     m = 2
     n = 3
     pr = 7febce7bf6a0
         0.9777    0.4152    0.7182
         0.0407    0.0746    0.2219'
Notice that the pr (data pointer) of A is the same as the pr of T1 -- so storing an array into a cell array keeps the same data pointer (no data copying), and retrieving it from the cell array keeps the same data pointer.
Therefore if you use a per-slice cell array, then retrieving each slice involves only creation of a temporary variable header without copying the data. But if you use (:,:,i) indexing of a 3D matrix then MATLAB needs to create a new data block of the appropriate size and copy the slice of the array into it.
Now, when you create a cell array, each slot that has never had anything written into it uses an 8-byte pointer set to binary 0 and no further storage, and MATLAB knows to interpret that as "slot holds an empty double". If, however, you have ever assigned anything to the slot, then it needs the 8-byte pointer plus another 96 bytes of header information describing the size and class of the data, and then the actual block of memory. So for each used slot of a cell array, there are 104 bytes of overhead beyond the data storage.
Therefore storing a 3D array as a block takes 104 bytes plus storage for the total number of elements in the array, whereas storing the same array as a cell array per slice takes 104 bytes for the cell, plus 104 bytes times the number of slices, plus storage for the total number of elements in the array.
(Although, in some cases if the amount of memory per slice is small enough, each slice might end up stored inside a fixed-sized block... wasting the memory between the end of the used area and the end of the fixed-sized block.)
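A quick way to see this per-slot overhead for yourself is to compare what whos reports for the same data stored both ways. This is only a sketch: it assumes a 64-bit desktop MATLAB, and the exact header sizes can vary between releases.

```matlab
% Compare reported storage for the same data as a 3D block vs a cell per slice.
A = rand(4,4,10);            % 160 doubles = 1280 bytes of raw data
C = num2cell(A, [1 2]);      % same data, one 4x4 slice per cell
w = whos('A','C');
fprintf('3D block:   %d bytes\n', w(1).bytes);
fprintf('cell/slice: %d bytes\n', w(2).bytes);
% (w(2).bytes - w(1).bytes) / numel(C) approximates the per-slot overhead.
```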
Remember too that if you do store in a cell array per slice, then there is overhead time in setting up all of those slices. Your measurement code is only timing retrieval, not creation.
I have seen some hints that in the last couple of releases MATLAB has had a way to create references to sections inside an array instead of having to copy the data block. However, the hints have not been clear enough for me to grasp how such references are created or maintained, or what their restrictions are. For example, maybe they can only exist for data slices that happen to use all memory within particular offsets, such as a single 2D slice of a 3D array.

9 Comments

I had no idea that extracting a slice of a matrix with (:,:,i) when passing an argument would generate a copy of the slice. That seems pretty inefficient, and I do it all the time when vectorised operations on the image are not possible.
Is there a way to do better than this? I am under the impression that it should be possible to avoid this copy. I guess there is no documentation on the way to force use of a reference/pr to slices of 3D arrays, since you mention only hints.
Regarding the time for creation, it seems that it is faster to create cells (to my surprise again!)
% create data
N_img = 2000;
img = rand(512,512,N_img);
% time for matrix allocation & filling
tic
new_img = img + 1;
toc % 3.2 s
tic
clear new_img
toc % 0.9 s
% time for cell allocation & filling
tic
new_img_cell = cell(N_img,1);
for ii = 1:N_img
    new_img_cell{ii} = img(:,:,ii) + 1;
end
toc % 2.7 s
tic
clear new_img_cell
toc % 0.7 s
Evidence that copies of slices are made instead of internal pointers:
If referencing a contiguous slice just pointed into the existing array, then the pr of B and C would be the same.
cmd = "format debug; A = rand(2,3,7); B = A(:,:,3), C = A(:,:,3)"
evalc(cmd)
ans =
    'B =
     Structure address = 7f022715ef80
     m = 2
     n = 3
     pr = 7fb5e04e72e0
         0.5829    0.9179    0.0284
         0.0163    0.2152    0.4689
     C =
     Structure address = 7f022715d2a0
     m = 2
     n = 3
     pr = 7fb5d5e4ac20
         0.5829    0.9179    0.0284
         0.0163    0.2152    0.4689'
Matt J
Matt J on 20 Oct 2023
Edited: Matt J on 20 Oct 2023
Is there a way to do better than this? I am under the impression that it should be possible to avoid this copy.
You've already discovered how to avoid it: use cell arrays.
I guess there is no documentation on the way to force use of a reference/pr to slices of 3D arrays, since you mention only hints.
This can't be done in general because depending on the indexing, the slice pixels will not be contiguous in memory, e.g. with img(:,i,:) or img(:,:,1:2:end). The LAPACK libraries that Matlab uses for numeric array processing require contiguity, I believe.
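To illustrate the contiguity point, here is a small sketch: because MATLAB stores arrays column-major, a page slice img(:,:,k) occupies one consecutive run of memory, while img(:,k,:) does not. The array below is built so that each element's value equals its linear (memory) position.

```matlab
% Column-major layout: values equal their linear (memory) positions here.
img = reshape(1:24, [2 3 4]);
img(:,:,2)   % one page: values 7..12, a consecutive run in memory
img(:,2,:)   % one "column slice": values 3,4, 9,10, 15,16, 21,22 - strided
% A page slice could in principle be referenced in place; the strided slice
% would have to be gathered into a new contiguous block anyway.
```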
I have seen some hints that in the last couple of releases MATLAB has had a way to create references to sections inside an array instead of having to copy the data block
Doubtful. I've asked TMW people about it on multiple occasions and none of the responses I get are consistent with that. If there were a way to access slices by reference, I don't see any reason why the vectorized operation below would still be beaten by the for-loop, even in R2023b.
N = 1e7;
x = rand(1,N);
tic;
for i = 2:N
    x(i)./x(i-1);
end
toc
Elapsed time is 0.049742 seconds.
tic;
x(2:end)./x(1:end-1);
toc
Elapsed time is 0.103975 seconds.
New internal facilities might not be in general use. For example passing a slice reference might hypothetically only currently be implemented for calls into LAPACK, or something like that.
Unfortunately I do not recall the context in which I read about new facility. If I recall correctly, it was a context that mentioned that they existed, but did not explicitly use them in that context. The sort of mention you might find in a comment in an internal class describing a design decision for a block of code where you can't see any difference in the calls made for the two possibilities and so are left baffled about what is going on.
"... The LAPACK libraries that Matlab uses for numeric array processing require contiguity, I believe. ..."
For BLAS/LAPACK matrix functions, yes this is true. You can sort-of use different strides in the matrix functions by specifying a different number of physical rows vs the number of rows you are using, and there are some vector functions that can use arbitrary strides (e.g., dot product). But the input data still must be contiguous.
I have an FEX submission that could create variables that were contiguous slices of other variables by reference, but it only works with older versions of MATLAB. Sadly, I am unable to update this submission because TMW has removed the shared data copy linked list information from their mxArray headers.
The main drawback is that the original memory block must remain allocated during the entire time the slice variables exist. My FEX submission took care of this automatically off to the side by storing a shared data copy of the source in a persistent hidden variable, and deleting it automatically when all slices are deleted.
If I could do this, I'm sure TMW could do the same if they wanted to. That being said, TMW has introduced many page functions somewhat recently (pagemtimes, pagemldivide, etc.) that alleviate the need for this capability.
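As a sketch of what those page functions give you (pagemtimes has been available since R2020b), a batched matrix multiply over the third dimension avoids both the per-slice copies and a cell loop:

```matlab
% Batched multiply of 1000 pairs of small matrices, no slicing or cells.
A = rand(8, 8, 1000);
B = rand(8, 8, 1000);
C = pagemtimes(A, B);          % C(:,:,k) equals A(:,:,k)*B(:,:,k) for every k
% Equivalent loop, for comparison:
C2 = zeros(size(C));
for k = 1:size(A,3)
    C2(:,:,k) = A(:,:,k) * B(:,:,k);
end
max(abs(C(:) - C2(:)))         % agreement up to round-off
```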
@Matt J Indeed, cells are a good option when you only perform non-vectorised operations on frames. However, in my use case, some operations on the movie can be performed on the matrix movie as a whole (operations like normalisation: img=rand(512,512,1e3); img=img./mean(img,[1,2]);).
So either I use cells, and lose the benefit of the fast matrix calculations, or I use a matrix, and lose time on data copies when I am not able to vectorise on the third dimension. I will need to see which one is the most advantageous in my case.
Actually, @James Tursa's FEX function is exactly what I need, but our computers are on MATLAB 2021 to 2023.
Thank you for the information, I think I have a better idea of where I stand.
So either I use cells, and lose the benefit of the fast matrix calculations
It can be hard to tell when this will really matter, e.g.,
img=rand(512,512,1e3);
tic;
img=img./mean(img,[1,2]);
toc
Elapsed time is 0.242101 seconds.
img=num2cell(img,[1,2]);
tic;
for i = 1:numel(img)
    img{i} = img{i}./mean(img{i}(:));
end
toc
Elapsed time is 0.213300 seconds.
Regarding your example, it looks like I need to upgrade my MATLAB. On MATLAB 2020a I go from 0.1 s with a matrix to 2.5 s with cells running exactly your code.
And on the third dimension with 1e4 images, on MATLAB 2020a I go from 1 s to 22 s using the following code (reduced to 1e3 images to run here).
img=rand(512,512,1e3);
tic;
img=img-mean(img,3);
toc
Elapsed time is 0.460275 seconds.
img=num2cell(img,[1,2]);
tic;
mean_im = zeros(size(img{1}));
for i = 1:numel(img)
    mean_im = mean_im + img{i};
end
mean_im = mean_im/length(img);
for i = 1:numel(img)
    img{i} = img{i} - mean_im;
end
toc
Elapsed time is 0.326576 seconds.
Edit: upgraded to 2023b, and those 2 examples are still 20x faster vectorised than with cells on my PC, unlike when executing here in a browser.


More Answers (1)

Matt J
Matt J on 19 Oct 2023
Edited: Matt J on 19 Oct 2023
Because extracting from cell arrays involves no new memory allocation. And because N_img=1000 is still very small.

8 Comments

"Because extracting from cell arrays involves no new memory allocation."
Matt, could you provide a reference for this?
I wouldn't know where to find one at this moment, but it's easy to test with the code below. As you can see, accessing the cell results in no drop in available memory.
%%Array Access
clear
A=rand(512,512,512);
memory().MemAvailableAllArrays/2^30
B=A(:,:,:);
memory().MemAvailableAllArrays/2^30
ans =
16.7951
ans =
15.7946
%%Cell access
clear
C={rand(512,512,512)};
memory().MemAvailableAllArrays/2^30
D=C{1};
memory().MemAvailableAllArrays/2^30
ans =
16.7966
ans =
16.7966
I think in your example D is a reference to C{1}, and therefore there is no memory allocation.
With the following code you see that changing an element triggers the allocation:
%%Array Access
clear
A=rand(512,512,512);
memory().MemAvailableAllArrays/2^30
B=A(:,:,:);
memory().MemAvailableAllArrays/2^30
B(1)=1;
memory().MemAvailableAllArrays/2^30
ans =
13.0929
ans =
12.0909
ans =
12.0909
%%Cell access
clear
C={rand(512,512,512)};
memory().MemAvailableAllArrays/2^30
D=C{1};
memory().MemAvailableAllArrays/2^30
D(1)=1;
memory().MemAvailableAllArrays/2^30
ans =
13.0911
ans =
13.0911
ans =
12.0884
That's why I was adding a convolution, to be sure there would be no reference trick.
And it represents my use case: I get a frame, work on it, and store it back in my variable.
Matt J
Matt J on 19 Oct 2023
Edited: Matt J on 19 Oct 2023
@Matt Yes, but allocating memory for processing output is not what differentiates the two implementations. Obviously, memory needs to be allocated to hold the output of the processing in both cases. However, you should be able to see from my test that, only in the case where you access with img(:,:,1) is new memory allocated for the input as well. A new block of memory is created to hold a copy of img(:,:,1), even before imgaussfilt or any other processing step is done.
As an additional test, look at the following which also shows memory consumption as tracked over time with the Windows Task Manager. With cell arrays, the memory spike is half the size.
clear
A=rand(512,512,512*3);
B=A(:,:,:);
sum(B(:));
clear
%%Cell access
clear
C={rand(512,512,512*3)};
D=C{1};
sum(D(:))
clear
Oh ok!
Very impressive example, and indeed it explains why my usage of matrices is relatively slow.
@Dyuman Joshi "Matt, could you provide a reference for this?"
I doubt there is an official reference for this, but that is how it has always worked for cell, struct, and property extraction. You get shallow copies (shared data or references), not deep copies.
This is an application of copy-on-write which is a fundamental MATLAB memory mechanism.
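A minimal sketch of that copy-on-write behaviour (the comments describe what you would observe by watching process memory, e.g. with the memory command on Windows; the figures are approximate, not guaranteed):

```matlab
% Plain assignment shares the data block; the copy is deferred until a write.
A = rand(5000);    % ~200 MB allocated
B = A;             % shallow copy: B points at A's data, no new 200 MB yet
B(1) = 0;          % first write to B: MATLAB now duplicates the block
% Note that whos reports ~200 MB for each of A and B even before the write,
% but the process actually holds only one copy of the data until then.
```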


Release

R2022b

Asked:

on 19 Oct 2023

Edited:

on 23 Oct 2023
