An alternative to cell array

Daniel Gourion on 8 Dec 2022
Commented: Stephen23 on 8 Dec 2022
I know that variables should not be named dynamically, so I have developed another solution for the following task:
I wrote a function (function M = structure(A,B,C,D)) that computes a 6 x 8 x p integer array M from integer arrays A, B, C, D of sizes 6 x 8 x p1, 6 x 8 x p2, and so on.
I need to call this function many times (about a thousand) with different inputs A, B, C, D, which were themselves computed the same way. They are easy to index but may be very large (p1, p2 can be as large as 10^7).
I tried using cell arrays and it worked for some small instances, but I am afraid that storing a large number of growing arrays in the same cell array will make the program inefficient for larger instances. Is there a more promising way to handle this problem? At first, I ran the program manually for each value of (A,B,C,D), but there are hundreds of such quadruplets, so this is tedious and inefficient.
  4 Comments
Daniel Gourion on 8 Dec 2022
Edited: Daniel Gourion on 8 Dec 2022
Thanks for your answer!
I have 128 GB of RAM and a 1 TB hard disk.
The variables computed so far, by running the program manually for each one, take up approximately 12 GB. They are hundreds of int8 variables of size 6 x 8 x p, with p ranging from 1 to 95e6.
I have not yet tried to load all these variables into MATLAB at the same time.
As described in my post, I am currently trying a cell array solution to go further, but it has not yet recomputed as many arrays as I previously computed manually (that will take a few days).
@Bora: thanks, saving one variable per file seems a reasonable option. I guess the advantage is that the file names would be indexed but not the variable names, is that correct?
Stephen23 on 8 Dec 2022
"I guess that the advantage is that the names of the files would be indexed but not the names of the variables, is it correct?"
Yes. You should name the files sequentially (or after test cases, or whatever makes sense for your data), but keep the variable names exactly the same in each file. Note that to make your code robust, you should LOAD into an output variable (which is a scalar structure) and access its fields:
S = load(..)
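A minimal sketch of that pattern, under assumptions that are not from the thread: the temporary folder, the file-name scheme M_0001.mat, M_0002.mat, ... and the small test arrays are all made up for illustration. Only the idea (indexed file names, one fixed variable name, LOAD into a structure) comes from the comment above.

```matlab
% Hypothetical folder and file-naming scheme, chosen only for illustration.
folder = tempname;
mkdir(folder);

% Save three small int8 arrays, one per file, always under the variable name M.
for k = 1:3
    M = k * ones(6, 8, 2*k, 'int8');
    fname = fullfile(folder, sprintf('M_%04d.mat', k));
    save(fname, 'M');
end

% Load file number 2 into a scalar structure and read its field M.
S = load(fullfile(folder, sprintf('M_%04d.mat', 2)));
M2 = S.M;
disp(size(M2))
```

Because the index lives in the file name, no variable name ever has to be generated dynamically. For the sizes mentioned in this thread, note that a variable larger than 2 GB can only be saved with the '-v7.3' option of save.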


Answers (1)

Steven Lord on 8 Dec 2022
You say that you need to call this function many times, but do you need the M output from each of those calls to exist in memory simultaneously? If so and p is as large as you say it is, I'm not sure you'll be able to find a machine with that much memory. Let's look at how large one of your M arrays is, assuming it's stored as an 8-bit integer (1 byte per element) and that p is on the order of 1e7.
bytes = 6*8*1e7;
gb = bytes/(1024^3)
gb = 0.4470
How much space do you need for all of them?
mem = 1000*gb
mem = 447.0348
Roughly speaking, that is half a terabyte, stored in contiguous chunks of about half a gigabyte each. That's just considering the M arrays and assuming they're 8-bit integers; add in A, B, C, and D, any temporary arrays you need to create inside your function, or make M a double array (8 bytes per element) and your task doesn't seem feasible on one machine.
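To make the int8 versus double comparison concrete, here is a small sketch that repeats the estimate for both storage classes. It assumes the same order of magnitude for p (1e7) as above; the variable names are only illustrative.

```matlab
% Estimated size in GiB of one 6 x 8 x p array for two storage classes.
p = 1e7;                          % order of magnitude assumed in this answer
numelM = 6 * 8 * p;               % number of elements in one array
gbInt8   = numelM * 1 / 1024^3;   % int8: 1 byte per element
gbDouble = numelM * 8 / 1024^3;   % double: 8 bytes per element
fprintf('int8: %.4f GiB, double: %.4f GiB\n', gbInt8, gbDouble);
```

A single double array of this size is eight times larger than its int8 counterpart, so keeping the data in an integer class matters before any other optimization.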
In that case you're probably going to need to make use of the Big Data functionality in MATLAB and/or the parallel computing capabilities of Parallel Computing Toolbox.
One assumption I've made in this post is that p is on the order of 1e7. You said that p1 and p2 were, but not p. If p is much smaller and all the M arrays are the same size, you may be able to use a 4-dimensional array.
M = zeros(2, 3, 4, 5);
A = reshape(1:24, [2 3 4]);
for k = 1:5
M(:, :, :, k) = A*2^k;
end
Now let's spot check: we should expect M(:, :, :, 3) to be 2^3 = 8 times A. Is it?
M(:, :, :, 3)./A
ans =
ans(:,:,1) =
     8     8     8
     8     8     8
ans(:,:,2) =
     8     8     8
     8     8     8
ans(:,:,3) =
     8     8     8
     8     8     8
ans(:,:,4) =
     8     8     8
     8     8     8
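If the 4-dimensional approach applies, the array can be preallocated directly in an integer class rather than as double. The sketch below assumes small placeholder sizes (p = 5 pages, n = 4 results) and a stand-in for each computed result; none of these values come from the thread.

```matlab
% Sizes here are small placeholders, not the real problem sizes.
p = 5;
n = 4;
M = zeros(6, 8, p, n, 'int8');        % preallocate as int8, not double
for k = 1:n
    Mk = k * ones(6, 8, p, 'int8');   % stand-in for one computed result
    M(:, :, :, k) = Mk;
end
info = whos('M');
disp(info.bytes)                      % one byte per element for int8
```

This only works when every result has the same size 6 x 8 x p, which is the condition stated above.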
  1 Comment
Daniel Gourion on 8 Dec 2022
Edited: Daniel Gourion on 8 Dec 2022
Thanks a lot for your answer.
"You say that you need to call this function many times, but do you need the M output from each of those calls to exist in memory simultaneously?"
No, I only need four of those M arrays (named A, B, C, D in my first post) in memory at once in order to compute a new M. Later in the process, that M will serve as A (or B, C, D) when computing another M.
For the moment, M is usually larger than A, B, C, D, but by the end of the process M will be smaller than A, B, C, D. I don't know exactly what maximal size M will reach, but I estimate it at between 1e8 and 1e9.
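The workflow described here (four inputs in memory, one new M out, which later becomes an input) could be sketched as follows. Everything specific in it is an assumption: computeM is a placeholder for the function structure(A,B,C,D), whose body is not shown in this thread, and the folder, file names and the quads index table are hypothetical.

```matlab
% Stand-in for the poster's function structure(A,B,C,D); the real computation
% is not shown in the thread, so this placeholder just concatenates along dim 3.
computeM = @(A, B, C, D) cat(3, A, B, C, D);

folder = tempname;                    % hypothetical storage folder
mkdir(folder);
fileOf = @(k) fullfile(folder, sprintf('M_%04d.mat', k));

% Seed four small starting arrays, one per file, each stored as variable M.
for k = 1:4
    M = k * ones(6, 8, k, 'int8');
    save(fileOf(k), 'M');
end

% Each row lists the file indices of the four inputs of one new result
% (made-up values; the real list would come from the actual problem).
quads = [1 2 3 4; 2 3 4 5];
for r = 1:size(quads, 1)
    in = cell(1, 4);
    for j = 1:4
        S = load(fileOf(quads(r, j)));   % only four inputs held in memory
        in{j} = S.M;
    end
    M = computeM(in{:});
    save(fileOf(4 + r), 'M');            % the result can be a later input
end
```

The cell array here only ever holds the four current inputs, so memory use is bounded by one quadruplet plus one result, not by the total number of arrays computed so far.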
