Sparse matrix memory understanding

5 views (last 30 days)
Lazar
Lazar on 30 Oct 2024
Edited: John D'Errico on 30 Oct 2024
I'm currently working with large matrices (significantly over 20000x20000) with most zero elements. Usually, around 5% of the elements are non-zero, so it would make sense to use sparse programming. Before sparsifying the computations, I wanted to test some basic concepts. I generated a random 20000x20000 matrix and checked its memory. Afterwards, I made most of its elements zero so that less than 5% non-zero elements remain. After checking its memory, I found out that the sparse matrix takes up even more memory. A screenshot of the code can be found below. Can someone shed light on this?

Accepted Answer

Steven Lord
Steven Lord on 30 Oct 2024
Please don't post pictures of code. MATLAB Answers can't run pictures of code.
rng default
Yb = unifrnd(-1, 1, [20000 20000]);
fprintf('%g%% of Yb is non-zero.', 100*nnz(Yb)/numel(Yb))
100% of Yb is non-zero.
Yb2 = Yb;
Yb2(Yb2<0.9) = 0;
fprintf('%g%% of Yb2 is non-zero.', 100*nnz(Yb2)/numel(Yb2))
4.99689% of Yb2 is non-zero.
Ybs = sparse(Yb2);
fprintf('%g%% of Ybs is non-zero.', 100*nnz(Ybs)/numel(Ybs))
4.99689% of Ybs is non-zero.
Let's check and make sure that what you're saying is correct, that Ybs takes up more memory than Yb or Yb2.
whos Yb Yb2 Ybs
Name Size Bytes Class Attributes Yb 20000x20000 3200000000 double Yb2 20000x20000 3200000000 double Ybs 20000x20000 319961112 double sparse
In your separate whos calls the bytes column was not aligned. You can see that Ybs does in fact take up much less memory than either Yb or Yb2.

More Answers (2)

Walter Roberson
Walter Roberson on 30 Oct 2024
Sparse memory storage starts with a vector of pointers, with one element per column. The pointer content is 0 (NULL pointer) if the column has no non-zero entries.
For the columns that have non-zero entries, the above pointer points to a block of memory that is a paired list. The initial element of the pair is a row number. The second element of the pair is the value at the indicated row.
You are populating 20000 x 20000 randomly with 5% fill. The probability that at least one row in any given column will be selected is as close to 1 as to make no difference (about 1 minus 3e-446). So the marginal column vector pointer is going to be completely full, occupying 20000 * sizeof(pointer) = 20000*16 = 320000 bytes.
About 1 in 20 columns will be full, so approximately 1000 slots will need to be allocated per column. Each slot needs a 48 bit offset (row numbers can be up to 2^48-1) and an 8 byte value, 14 bytes total. That is provided that the offsets are stored as 48 bit instead of as 64 bit. So you should expect roughly 1000*14*20000 = 280000000 or 1000*16*20000 = 32000000 for the column values. Total 280320000 bytes or 32320000 bytes.
Comparing to the actual memory used, it appears that MATLAB uses the full 8 bytes for the offsets, and in practice was able to save a bit on the marginal vectors.

John D'Errico
John D'Errico on 30 Oct 2024
Edited: John D'Errico on 30 Oct 2024
I think you missed a zero.
By the way, 5% sparse is NOT at all what I would consider sparse, or for there to often be a gain. Linear algebra wants a matrix to be REALLY sparse, say 5 or 10 elements per row. Then you will see it fly!
And if all you are doing is using that matrix to store data, well, there may be better ways. Inserting new elements into a sparse matrix will be slow.
Now, as to your claim of size.
Afull = unifrnd(-1,1,[20000,20000]);
Afull(Afull < 0.9) = 0;
Asparse = sparse(Afull);
whos Afull Asparse
Name Size Bytes Class Attributes Afull 20000x20000 3200000000 double Asparse 20000x20000 320134552 double sparse
nnz(Asparse)/numel(Asparse)
ans = 0.0500
So Afull takes up 10 times as much space, but is only 5% full in either case.
The sparse matrix storage has an inefficiency, due to the need to store pointers to the non-zero elements. But it is still 10 times less memory. NOT the same amount of memory.

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!