Help me plan performance-critical code: using cell arrays versus matrices

Hello,
this is a general question about the internals of MATLAB performance, so it is by its nature a bit vague, but I've seen great discussions in the community so I hope to learn something.
The most condensed version of the question is: what is the best way, performance-wise, to store matrices whose size and number are not known in advance?
The background:
I am writing a program where I will have a function that spits out a 3-dimensional matrix with some small dimensions, say
for ii = 1:N
B_i = function_create_B_i; % where B_i has dimensions (k_i,k_i,p) with k_i say equal to 10 and p=5 or 6 at most
end
However, I will have N of these matrices, where N is chosen by the user, hence I cannot manually define all the B_1, B_2, ..., B_N beforehand. N will not be big, maybe 10-15 at most.
Then, I need to take these matrices and build a large matrix H that has some special structure. For example, one block of it would look like this:
A0 = [ eye(k_1),            -B_1(:,:,1).*wDE(2),   -B_1(:,:,1).*wDE(3); ...
      -B_2(:,:,1).*wFR(1),   eye(k_2),             -B_2(:,1:2,1).*wFR(3); ...
      -B_3(:,:,1).*wIT(1),  -B_3(:,1:2,1).*wIT(2),  eye(k_3) ...
     ];
I will need to go back and forth a lot, and considering everything else going on, I want the most performance-optimized code, as any savings will be magnified.
The problem:
I am trying to plan the best way to store the B_i matrices in order to populate the large matrix (of which A0 is just one block). If all the B_i had the same number of elements, i.e. k_1 = k_2 = ... = k, the fastest way would be to pre-allocate a large matrix, say BETA(k,k,p,N), and just index the fourth dimension to populate BETA.
%% Code with matrices
% if B_i has dimensions k,k,p for all i
BETA = zeros(k,k,p,N); % pre-allocate in the beginning
% then later as part of another loop
for ii= 1:N
BETA(:,:,:,ii) = function_create_B_i; % where B_i has dimensions (k_i,k_i,p) with k_i say equal to 10 and p=5 or 6 at most
end
However, I want to write the code so that k_1 = k_2 = ... = k is not a requirement. (Alternatively, in this case, B_i does not need to be 3-D; it can also be 2-D with all the matrices next to each other, and then BETA can be 3-D instead of 4-D, or even 2-D.)
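As a sketch of that 2-D-side-by-side idea (the offset bookkeeping and the stand-in for function_create_B_i are my own illustration, not part of the original code), the blocks can share one numeric array if you track column offsets:

```matlab
% Sketch: store all B_i side by side in one array, padded to max(k) rows,
% with a vector of column offsets to recover each block.
k    = [10 8 9];              % illustrative k_1..k_N, chosen by the user
p    = 5;  N = numel(k);
cols = [0, cumsum(k)];        % column offset of each block
BETA2 = zeros(max(k), cols(end), p);
for ii = 1:N
    B_i = rand(k(ii), k(ii), p);   % stand-in for function_create_B_i
    BETA2(1:k(ii), cols(ii)+1:cols(ii+1), :) = B_i;
end
% recover the valid part of block 2:
B_2 = BETA2(1:k(2), cols(2)+1:cols(3), :);
```

This keeps everything in one contiguous numeric array at the cost of the padding rows and the index bookkeeping.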
Potential solution
Naturally, my next idea is to use cells: have a cell array BETA = cell(N,1) and store the B_i in it. This is easy, but is it good performance-wise? Does it matter for the JIT compiler? In other programming languages, using containers that can store anything is typically a performance bottleneck. Is this also the case in MATLAB?
%% Code with cells
% if B_i has dimensions k_i,k_i,p and k_i ~= k_j for i~=j
BETA = cell(N,1);
for ii= 1:N
BETA{ii,1} = function_create_B_i; % where B_i has dimensions (k_i,k_i,p) with k_i say equal to 10 and p=5 or 6 at most
end
Note that I will do this over and over - overwriting the cells with new B_i, then accessing the elements to build the large matrix of which A0 is just one block, doing some calculation, and then feeding that calculation back into function_create_B_i. So the point is more about the internals of the compiler. For example, would it matter in MATLAB if, instead of the above code, I changed the values in the matrices inside the cells directly?
%% Code with cells but accessing the array within
% if B_i has dimensions k_i,k_i,p and k_i ~= k_j for i~=j
BETA = cell(N,1);
for ii= 1:N
BETA{ii,1}(:,:,:) = function_create_B_i; % in-place overwrite; requires BETA{ii} to already hold an array of matching size
end
Overall, is there a better way to go about this, or is this the best that can be done in terms of flexibility vs. performance? I was also thinking of taking the largest of the B_i matrices, making every B_i that size, and then keeping track of which elements to use:
% for example, if
B_1 = [2 2;
       1 3];
B_2 = [2 3 2;
       1 1 1;
       3 1 0];
% I can pad B_1 to
B_1 = [2 2 0;
       1 3 0;
       0 0 0];
% and just access B_1(1:k_1,1:k_1)
but this will make the code messy and hard to read and I am not sure if it is worth the effort.
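For what it's worth, the padded idea combines naturally with the pre-allocated 4-D array from the first code block: one numeric array plus a size vector, no cells at all. A minimal sketch (the variable names and the rand stand-in for function_create_B_i are illustrative):

```matlab
% Sketch: padded 4-D storage with a size vector k to track each block's
% valid region; avoids cell indirection entirely.
k    = [2 3 2];                    % k_i for each block, chosen by the user
p    = 5;  N = numel(k);  kmax = max(k);
BETA = zeros(kmax, kmax, p, N);    % padded pre-allocation, done once
for ii = 1:N
    BETA(1:k(ii), 1:k(ii), :, ii) = rand(k(ii), k(ii), p); % stand-in for function_create_B_i
end
B_2 = BETA(1:k(2), 1:k(2), :, 2);  % valid part of block 2
```

The trade-off is exactly the one described above: fast contiguous numeric storage versus the readability cost of carrying k around and slicing every access.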
  2 Comments
Rik on 23 Jun 2023
In my experience, flexibility always comes with a performance penalty. That is even true for your implicit choice: if you had picked C or assembly instead of MATLAB to write your program, it would (probably) be faster.
Structs/tables can be inefficient when accessing data, but cells should provide a nice middle ground between normal arrays and expensive containers.
If you really worry about performance, you could also consider writing a generator function that will create code for all reasonable numbers of B-arrays. You could still add a cell-based version to cover all use cases, but if the resulting code isn't extremely long, that could help with performance.
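A hypothetical sketch of that generator idea (the file layout, function names, and the blkdiag-only body are my assumptions; a real generator would emit the full weighted block structure rather than just a block diagonal):

```matlab
% Sketch: write a specialised assembly function for a fixed N to a file,
% so the hot loop contains no cell indexing at all.
function write_assemble_fcn(N)
    fid = fopen(sprintf('assemble_%d.m', N), 'w');
    fprintf(fid, 'function H = assemble_%d(B_1', N);
    fprintf(fid, ', B_%d', 2:N);           % repeats the format per element
    fprintf(fid, ')\n');
    % Emit one block-diagonal line as a placeholder for the real structure:
    fprintf(fid, 'H = blkdiag(B_1(:,:,1)');
    fprintf(fid, ', B_%d(:,:,1)', 2:N);
    fprintf(fid, ');\nend\n');
    fclose(fid);
end
```

Calling write_assemble_fcn(3) once produces assemble_3.m, which then takes plain arrays as inputs, so the generated code pays no container overhead at run time.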
Boris Blagov on 23 Jun 2023
Yes, I've run into this problem before; I did the cell trick and it was slower. I could actually have written out the use cases back then - I was just lazy and ended up paying the price later on. Here there are far too many use cases, so that is not possible at all.
I have never looked at generator functions, I will have a look, thank you for the hint.


Answers (1)

Dheeraj on 8 Sep 2023
Hi,
I understand that you're trying to find the best way, performance-wise, to store matrices when the size and number of the matrices are not known in advance.
When you have matrices of varying sizes and don't know their number in advance, using cell arrays in MATLAB is a reasonable and efficient approach. For a small number of cells (10-15 here), the indirection overhead of a cell array is usually minor compared to the cost of the numeric work on the contents, unlike in some languages where generic containers introduce significant overhead.
Your code block implementing cell arrays:
%% Code with cells
% if B_i has dimensions k_i,k_i,p and k_i ~= k_j for i~=j
BETA = cell(N,1);
for ii= 1:N
BETA{ii,1} = function_create_B_i; % where B_i has dimensions (k_i,k_i,p) with k_i say equal to 10 and p=5 or 6 at most
end
is a clean and maintainable way to handle matrices of varying sizes. It's also a good choice if you need to work with these matrices in a dynamic and flexible manner, since you can easily access and manipulate the individual matrices stored in the cell array.
Directly accessing the elements within the cell array as you mentioned:
%% Code with cells but accessing the array within
% if B_i has dimensions k_i,k_i,p and k_i ~= k_j for i~=j
BETA = cell(N,1);
for ii= 1:N
BETA{ii,1}(:,:,:) = function_create_B_i; % in-place overwrite; requires BETA{ii} to already hold an array of matching size
end
The above code can avoid reallocating the cell's contents when the new B_i has the same size as the old one, but note that it errors if BETA{ii} is still empty or has a different size, because the left- and right-hand sides of an indexed assignment must agree in size. In practice the performance difference is unlikely to be significant, and the code becomes less readable.
Resizing the matrices to a common size can make the code less readable and potentially more complex; it is only worth considering if the performance gain from working with equally sized matrices is substantial enough to justify the extra complexity.
In summary, using cell arrays as you initially suggested is a good choice for handling matrices of varying sizes when you don't know their number in advance.
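As a concrete illustration of using the cell layout in the assembly step (the offset bookkeeping, block sizes, and block-diagonal-only placement here are illustrative assumptions, not the asker's full H structure), you can pre-compute offsets once and write each block into a pre-allocated H:

```matlab
% Sketch: assemble a block matrix H from cell-stored blocks using
% pre-computed row/column offsets.
k    = [2 3 2];  N = numel(k);
BETA = arrayfun(@(ki) rand(ki, ki), k, 'UniformOutput', false); % stand-in blocks
off  = [0, cumsum(k)];            % offset of each block within H
H    = zeros(off(end));           % pre-allocate the full matrix once
for ii = 1:N                      % diagonal placement only, for brevity
    idx = off(ii)+1:off(ii+1);
    H(idx, idx) = BETA{ii};
end
```

To settle the performance question for your actual sizes, timeit(@() ...) on this loop versus a padded-array variant will give a more reliable answer than reasoning about the JIT.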
You can refer to the MATLAB documentation on cell arrays to better understand their capabilities.
Hope this helps!

Release: R2019b
