Sparse Matrix build efficiency

5 views (last 30 days)
Hello everyone.
I want to build a nxn matrix A as sparse for efficiency( time of the inversion(A\)). i)I would like to ask is it ok if I define it with the command A=sparse([..],[..],[..],n,n) with one or a few non zero elements at the beginning and then add the rest of the elements with commands like the following: A(..:..,..:..)=..; or will it cause a problem with time efficiency like when we change the size of a regular matrix or vector inside a loop and we get the warning "consider preallocating for speed"?
I can't directly compare it in terms of time efficiency with the case of building the three necessary vectors [..] from the beginning and use only the command sparse, because I have to change many things in my code to do that.
ii) Is it nessesary or possible in MATLAB to store a matrix in skyline format along with the corresponding vector of the positions of the diagonal elements?
Thank you in advance

Accepted Answer

John D'Errico
John D'Errico on 7 Jun 2017
Edited: John D'Errico on 7 Jun 2017
In general, DON'T DO IT! That is, don't build a sparse matrix one element at a time.
Even if you use spalloc, the matrix will be inefficient to build.
For example, I'll build a sparse matrix one element at a time below.
N = 100000;
M = spalloc(1000,1000,N);
tic
k = N + 1;
for i = 1:N
k = k - 1;
M(k) = i;
end
toc
Elapsed time is 6.069033 seconds.
So I stuffed the matrix to its allocated capacity, at 10% non-zeros. I did it by inserting elements from the back end first, so every time I put a new element in, I believe the sparse insert forces the elements to be re-shuffled in memory.
Next, I'll stuff them in the same order they will actually be stored in memory. So no reshuffling is necessary.
N = 100000;
M = spalloc(1000,1000,N);
tic
k = 0;
for i = 1:N
k = k + 1;
M(k) = i;
end
toc
Elapsed time is 0.196214 seconds.
This time it was MUCH faster. The problem is, if you do the stuffing in random order, it will almost always be true that SOME amount of reshuffling must be done. So most of the time, the time will be like the first call.
N = 100000;
ind = randperm(1e6,N);
M = spalloc(1000,1000,N);
tic
for i = 1:N
k = ind(i);
M(k) = i;
end
toc
Elapsed time is 2.980270 seconds.
So it took 6 seconds for the worst case. and half that for the random sequence of elements. But if I built it in the best possible order, it took 0.2 seconds.
Again, the point is, DON'T DO IT!
Instead, build the array using sparse, in ONE single call. This will force you to create a list of non-zero elements upfront. Preallocate your matrices. Never grow them dynamically.
tic
[rind,cind] = ind2sub([1000,1000],ind);
M = sparse(rind,cind,1:N,1000,1000);
toc
Elapsed time is 0.036842 seconds.
You should notice that this call took significantly less time than the best possible case to allocate that sparse matrix before (using a loop).
If you are going to use sparse matrices, then learn to use the sparse matrix tools. These tools are of huge benefit, but if you use them poorly, you will still get poor performance.
Ok, yes, you THINK you cannot use sparse. In fact, you are simply wrong in that. You can build the elements in advance, storing them as a list of row and column indices, plus a list of values. If you know in advance how many elements you will have, just preallocate those vectors as FULL vectors, stuffing them one at a time as you compute the elements. When the last element is created, call sparse, ONCE. ONE TIME.
If you don't know how many elements you will have at the end, I provide code to build arrays incrementally here, found on the file exchange for download.
  1 Comment
acivil acivil
acivil acivil on 8 Jun 2017
Edited: acivil acivil on 8 Jun 2017
John,
Thank you very much for your answer. You provided a very good example of the problem that I was referring to. I think I will try to allocate the three necessary vectors from the beginning but the link you provided is very useful as well.
Thank you very much.

Sign in to comment.

More Answers (2)

Steven Lord
Steven Lord on 7 Jun 2017
Inverting a matrix is generally not a good idea, and it's especially not recommended with a sparse matrix.
>> rng default
>> A = sprand(100, 100, 0.1);
>> invA = inv(A);
>> nnz(A)
ans =
951
>> nnz(invA)
ans =
10000
While invA is stored in the sparse storage format, it's not actually sparse; all its elements are nonzero.
If you're trying to invert the matrix to solve a system of equations A*x = b use the backslash operator x = A\b or, if your matrix is REALLY large, try one of the functions in the "Iterative Methods and Preconditioners" section of this documentation page.
Getting back to your question about efficiently creating a sparse matrix, if you must create it with a few elements to start then add in additional elements iteratively I would use the spalloc function first.
  1 Comment
acivil acivil
acivil acivil on 8 Jun 2017
Edited: acivil acivil on 8 Jun 2017
Steven,
Thank you very much for your response. I came across the link that you provided when I was searching for possible ways to improve the efficiency of the system solution. However I haven't studied them yet because they are not the main field of my research, despite their importance. I think it is now time to find out how to use them in my code. Do you know where I can find correct examples of calling the over mentioned methods? To be more specific, for example if I use " Sparse reverse Cuthill-McKee ordering" then do I have to reorder the vector b as well, before solving A\b (and other details like this one).
I also wanted to comment that in my question I used the wrong term "inversion" to refer to the backslash operator. I always use A\b and never A^(-1).
Thanks again.

Sign in to comment.


Christine Tobler
Christine Tobler on 7 Jun 2017
i) You should expect constructing a sparse matrix in a loop by adding elements using indexing A(i, j) = s to be slower than constructing the three vectors i, j, s for the whole matrix, and then calling sparse once. But before making a large change to your code, you may want to profile your code, to check whether the construction of the matrix is the bottleneck.
ii) It's not possible or necessary in MATLAB to represent a matrix in the skyline format. If you are using backslash (x = A \ b), this will detect if a sparse matrix A has a banded structure.
Also, as Steve mentioned, you should definitely not invert a sparse matrix - the result will be nearly dense. To solve a linear system, use backslash instead.

Categories

Find more on Resizing and Reshaping Matrices in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!