# Without resorting to FEX/MEX, can the overhead of repeated calls to sparse() be avoided with operations on I, J, V directly?

3 views (last 30 days)

Show older comments

I have a time-marching problem bottlenecked by matrix multiplication , where the sparse matrix is the sum of many time-independent sparse matrices with time-dependent scalar coefficients:

I'm aware of FEX solutions that would enable a 3D to be defined once prior to time-marching, though I would prefer to rule out native MATLAB first. My current thinking is to supplement the I, J, V vectors with an index vector K, and do one of the following:

- Generate a cell array of sparse matrices along index K, with multiplied with bsxfun, similar to this. (Probably the slowest option)
- Sparse matrix construction with B=sparse(I, J, V.*beta(K), N, N)*X
- Avoid sparse matrix entirely with B=accumarray(I, V.*beta(K).*X(J), N, N)

I haven't tested option 1, but option 3 with accumarray runs about 50x slower than option 2 (with or without issparse option). This makes sense, as the multiplication with X in accumarray is done before accumulation, hence on a much larger array.

However, option 2 still appears to be wasting performance. For example, sparse(I, J, V, N, N)*X runs about 50x slower than basic A*X. My initial interpretation of this is that much of the time in sparse is spent on unavoidable summation of common terms in I,J,V. I cant profile sparse internally, but tested this by accumulating V prior to sparse(I, J, V, N, N)*X, which comes out to ~7x slower than A*X.

My conclusion is that there is still some significant & (potentially) avoidable overhead from calling sparse many times unrelated to summation of duplicate indices, and the larger piece of summation that is unavoidable due to the summation of time-dependent terms co-located in .

To boil this down, I have two questions I cannot answer:

- Considering the pattern of summation of duplicate indices is always the same (static I/J), can the summation overhead in sparse be improved?
- What is the source of the remaining overhead in sparse and can it be avoided?

##### 20 Comments

Bruno Luong
on 7 Aug 2022

@James Tursa yes you get it right. However

- In the range of density/size and N_k (m in your notation) your method is still 50% slower than the accumarray solution we come up with,
- The runtime scale with N_k and is not bounded by the assumption of limited "bandwidth" (density of A0) stated by @Joel Lynch. It is probably an important requirement for his application.

### Accepted Answer

Bruno Luong
on 7 Aug 2022

Edited: Bruno Luong
on 7 Aug 2022

EDIT: Here is the final code. Method used in B2 is the fatest;

M = 10000;

N = 10000;

N_k = 100;

A_k = cell(1,N_k);

for k=1:N_k

A_k{k} = sprand(M,N,0.0005);

end

N_k = length(A_k);

[M,N] = size(A_k{1});

beta = rand(N_k,1);

X = rand(N,1);

I = cell(1,N_k);

J = cell(1,N_k);

V = cell(1,N_k);

K = cell(1,N_k);

for k=1:N_k

[I{k},J{k},V{k}] = find(A_k{k});

K{k} = k + zeros(size(I{k}));

end

I = cat(1,I{:});

J = cat(1,J{:});

K = cat(1,K{:});

V = cat(1,V{:});

[IJ,~,ic] = unique([I,J],'rows');

I2 = IJ(:,1);

J2 = IJ(:,2);

Y = sparse(ic, K, V, size(IJ,1), N_k);

tic

B1=sparse(I, J, V.*beta(K), M, N)*X ;

toc

tic

B3 = sparse(I2,J2,Y*beta,M,N)*X;

toc

tic

B2 = accumarray(I2,(Y*beta).*X(J2),[M,1]);

toc

##### 5 Comments

Bruno Luong
on 7 Aug 2022

Edited: Bruno Luong
on 7 Aug 2022

### More Answers (0)

### See Also

### Categories

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!