Slow computation time of parfor loop

기옥 김 on 7 Sep 2022
Answered: Alvaro on 26 Jan 2023
Hello,
I need help optimizing the following parallel loop:
parfor k=1:N
    [Laux{k}, Uaux{k}, Paux{k}, Qaux{k}] = lu(Jtot{k});
end
The loop above takes:
Elapsed time is 3.569814 seconds.
Each cell of Jtot contains a sparse matrix of size ~40k x 40k.
I first tried the following simplified code:
Jtot2 = Jtot{1};
parfor k=1:N
    [Laux, Uaux, Paux, Qaux] = lu(Jtot2);
end
Elapsed time is 0.749602 seconds.
Then I also tried this one:
Jtot2 = Jtot{1};
parfor k=1:N
    [Laux{k}, Uaux{k}, Paux{k}, Qaux{k}] = lu(Jtot2);
end
Elapsed time is 2.593602 seconds.
It seems that the large size of Jtot and of the resulting LU factors causes the problem.
I also tried spmd, but it was still slow.
spmd(N)
    [Laux, Uaux, Paux, Qaux] = lu(Jtot{labindex});
end
A sequential matrix inversion (solve) step has to follow the parallel loop, so the decomposition of each cell of Jtot needs to be stored.
How can I reduce the computation time? I would like to bring it down to no more than ~1 sec.
Thank you in advance.
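The timings above differ mainly in whether the factors are stored per iteration, and with a process-based pool every stored output has to be sent from the worker back to the client. One way to check how large the factors are relative to the input is to factor a single matrix in serial and compare sizes. This is only a minimal sketch: the 2-D Laplacian built with delsq/numgrid is a hypothetical stand-in for one Jtot{k}, and the real matrices may fill in very differently.

```matlab
% Hypothetical stand-in for one Jtot{k}: a sparse 2-D Laplacian
% (NOT the poster's matrix, which is not available here).
n = 100;                          % interior grid points per side
A = delsq(numgrid('S', n + 2));   % n^2 x n^2 sparse matrix (10000 x 10000)

% Time a single serial factorization with the four-output sparse syntax.
tic;
[L, U, P, Q] = lu(A);
tFactor = toc;

% Compare the memory footprint of the factors with that of the input.
infoA = whos('A');
infoL = whos('L');
infoU = whos('U');
fillRatio = (nnz(L) + nnz(U)) / nnz(A);

fprintf('Serial lu time: %.3f s\n', tFactor);
fprintf('A: %.2f MB, L + U: %.2f MB, fill ratio: %.1f\n', ...
    infoA.bytes / 2^20, (infoL.bytes + infoU.bytes) / 2^20, fillRatio);
```

If L and U turn out to be many times larger than the input, the time spent moving them out of the parfor loop may well exceed the time spent factoring.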
기옥 김 on 13 Sep 2022
Hello. Unfortunately, I don't think I can provide the whole code to run on another machine, because it is too long.
for kiter=1....
    parfor k=1:Num_pool
        X_pp = X_local(:,k+1);
        [J, res{k}] = Jc.eval2(X_pp, mat_fun);
        Jtot = PT' * (J + Qconst{k}) * PT;
        res_tot{k,1} = PT' * (res{k} + Qconst{k}*X_pp + FL{k});
        [Laux{k}, Uaux{k}, Paux{k}, Qaux{k}] = lu(Jtot);
    end
    dX{1} = PT * (Qaux{1} * (Uaux{1} \ (Laux{1} \ (-Paux{1}*res_tot{1}))));
    x_op = 1; % temporary
    for k=2:Num_pool
        dX{k} = PT * (Qaux{k} * (Uaux{k} \ (Laux{k} \ (Paux{k}*(-res_tot{k} + x_op*PT'*Mtot*dX{k-1})))));
        X_local(:,k+1) = X_local(:,k+1) + x_op*dX{k};
    end
end
I've tried several different approaches, and the code above is one of them.
This is code for finite element analysis. I'm trying to parallelize the loop.
X_pp is the unknown vector to be solved for; its size is (N_pp,1).
Jc.eval2() is the function that evaluates the Jacobian matrix J.
J is the sparse Jacobian matrix of size (N_pp,N_pp). N_pp is around 40000.
The variables res and res_tot and the results of the LU decomposition are used after the parfor loop, so I need to store them in cells.
As can be seen, it solves X_local(:,[k Num_pool]) in parallel, where k denotes the time step. Without this loop, X_local is solved step by step with increasing k. In that case, mldivide can be used instead of the LU decomposition.
This parallel code is much slower than I expected.
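The sequential step above relies on the four-output sparse syntax of lu, which returns factors satisfying P*A*Q = L*U, so that A*x = b is solved by x = Q*(U\(L\(P*b))). A minimal sketch of that identity follows, assuming a small sparse Laplacian and a vector of ones as hypothetical stand-ins for Jtot and res_tot{k}:

```matlab
% Small sparse test system (hypothetical stand-in for Jtot and res_tot{k}).
A = delsq(numgrid('S', 32));      % 900 x 900 sparse matrix
b = ones(size(A, 1), 1);

% Four-output sparse LU: the factors satisfy P*A*Q = L*U.
[L, U, P, Q] = lu(A);

% Solve A*x = b by reusing the stored factors, as in the sequential loop.
x = Q * (U \ (L \ (P * b)));

% Relative residual of the computed solution.
relRes = norm(A*x - b) / norm(b);
fprintf('Relative residual: %.2e\n', relRes);
```

Once the factors are stored, each further right-hand side costs only two triangular solves and two permutations, which is why the factors are kept rather than calling mldivide again.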
Alvaro on 26 Jan 2023
How long does this take to run in serial? At the moment it is not clear why you need a faster computing time than 1 second per parfor loop.


Answers (1)

Alvaro on 26 Jan 2023
If you wish to parallelize, lu already has built-in support for running in thread-based environments.
Alternatively, you could consider slicing your matrix or working with distributed arrays.
Also consider the thresh parameter of lu, which might decrease computation time at the expense of accuracy.
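As a rough illustration of the thresh suggestion, the sketch below factors the same sparse matrix with the default pivoting thresholds and with a looser first threshold, then compares the number of nonzeros in the factors and the residual of the solve. The test matrix and the value [0.01 0.001] are assumptions chosen only for illustration; whether a looser threshold helps, and how much accuracy it costs, depends on the actual Jacobian.

```matlab
% Hypothetical test system standing in for Jtot{k} and its right-hand side.
A = delsq(numgrid('S', 62));      % 3600 x 3600 sparse matrix
b = ones(size(A, 1), 1);

% Factorization with the default pivoting thresholds.
[L1, U1, P1, Q1] = lu(A);
x1 = Q1 * (U1 \ (L1 \ (P1 * b)));

% Factorization with a looser first threshold
% (0.01 is an illustrative value, not a recommendation).
[L2, U2, P2, Q2] = lu(A, [0.01 0.001]);
x2 = Q2 * (U2 \ (L2 \ (P2 * b)));

% Compare sparsity of the factors and accuracy of the solutions.
relRes1 = norm(A*x1 - b) / norm(b);
relRes2 = norm(A*x2 - b) / norm(b);
fprintf('default: nnz(L)+nnz(U) = %d, residual = %.2e\n', ...
    nnz(L1) + nnz(U1), relRes1);
fprintf('thresh : nnz(L)+nnz(U) = %d, residual = %.2e\n', ...
    nnz(L2) + nnz(U2), relRes2);
```

Checking the residual for each threshold on a representative Jtot is a cheap way to see whether sparser factors are worth the loss in pivoting stability.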

Release

R2022a
