Distributed array to solve a system of linear equations on a cluster
I'm trying to solve a system of linear equations in parallel on a computing cluster using iterative methods and distributed arrays. Right now my code looks like:
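cores = 42;                                          % number of workers requested
cluster = parcluster;                                % cluster from the default profile
parpool(cluster,cores);                              % start a pool of 42 workers
K_solve_dist = distributed(K_solve);                 % spread the system matrix across the workers
force_vec_solv_dist = distributed(force_vec_solv);   % spread the right-hand side
[res_disp, flag_solv{ii,kk}(n,1)] = cgs(K_solve_dist,force_vec_solv_dist,tol_iter,max_iter);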
However, regardless of how many cores I use, the run time stays the same (and it is also the same as when I don't use distributed arrays at all). If I run it without the line "parpool(cluster,cores)", it runs almost 50% faster, but it only uses 12 cores, even though more cores are available. Is there a way to use more than 12 cores and speed up this calculation?
Answers (2)
Sam Marshalik
on 7 Dec 2021
Hey Melissa,
I would not think of distributed arrays as a way to speed up your computation. Distributed arrays are useful when something does not fit into one machine's memory, so you spread the contents of the matrix across multiple machines. This will not make the code run faster; in fact, as you saw, it will probably run slower, since you are adding communication overhead.
It is worth pointing out that using distributed arrays on a single machine will not give you any benefit, since you are still limited to that one computer's hardware. If you have access to MATLAB Parallel Server on your cluster, then distributed arrays can spread your data across multiple machines, and that is where they become helpful.
In short, you will see the real benefit of distributed arrays when you are working with data too large to fit on one machine. If you want to speed things up, take a look at parallel constructs such as parfor, parfeval, and gpuArray.
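If the slow part is really many independent solves (the flag_solv{ii,kk}(n,1) indexing in the question suggests a loop), a minimal sketch of the parfor route might look like the following. It assumes the systems are independent of each other; the delsq/numgrid test matrix, the number of cases, and the tolerances are made up for illustration. If no pool is open, parfor may start one automatically (depending on your parallel preferences) or run the loop serially.

```matlab
% Sketch: run independent Krylov solves concurrently with parfor.
% Hypothetical problem: one sparse SPD matrix and several right-hand sides.
A = delsq(numgrid('S', 52));        % 2500x2500 sparse 5-point Laplacian
nCases = 8;                         % hypothetical number of load cases
B = rand(size(A, 1), nCases);       % one right-hand side per column
tol = 1e-6;
maxit = 1000;

X = zeros(size(B));
flags = zeros(1, nCases);
parfor k = 1:nCases
    % Each iteration is an ordinary (non-distributed) cgs call on a worker
    [xk, fk] = cgs(A, B(:, k), tol, maxit);
    X(:, k) = xk;
    flags(k) = fk;
end
```

Here each worker does a whole solve on its own, so there is no communication inside cgs; the parallelism comes from running several solves at once.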
Joss Knight
on 10 Dec 2021
gpuArray supports all the iterative solvers, including cgs. However, they are mainly optimized for sparse matrices. If your matrix is dense, you'll be better off using a direct solver (i.e. mldivide, \), which gpuArray of course also supports.
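A minimal sketch of both GPU routes, assuming Parallel Computing Toolbox and a supported GPU are available (gpuDeviceCount itself comes from that toolbox). The test matrix is hypothetical, and the code falls back to the CPU when no GPU is found.

```matlab
% Sketch: solve on the GPU, iteratively for sparse and directly for dense.
A = delsq(numgrid('S', 52));          % hypothetical sparse SPD matrix
b = ones(size(A, 1), 1);

if gpuDeviceCount > 0
    Ag = gpuArray(A);                 % sparse gpuArray
    bg = gpuArray(b);
    % Iterative solver on a sparse gpuArray
    [xg, flag] = cgs(Ag, bg, 1e-6, 1000);
    x = gather(xg);                   % copy the result back to host memory

    % For a dense matrix, a direct solve (mldivide) is usually the better choice
    Adg = gpuArray(full(A));
    x_direct = gather(Adg \ bg);
else
    [x, flag] = cgs(A, b, 1e-6, 1000);   % CPU fallback
    x_direct = A \ b;
end
```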
Eric Machorro
on 11 Jan 2023
Piggy-backing on this question:
Setting aside the speed-up factor for the moment, how can I use CGS (or almost any Krylov-type solver, for that matter) with very big/long vectors that I can't hold in memory? There are four variants of the problem:
1. I have a sparse matrix.
2. I have a symmetric, sparse matrix (think Cholesky factor).
3. I have a function handle that serves as the matrix operator.
4. (Revisiting the speed-up issue) I'd like to use this in conjunction with non-GPU parallelization. Is this possible?
Does anyone have advice on any of these?
Respectfully,
Oli Tissot
on 12 Jan 2023
All the Krylov methods are supported for distributed arrays, so 1. and 3. are supported just as they are in MATLAB; you can also implement your own preconditioner through a function handle. However, 2. is not supported as is, because there is no built-in notion of a "symmetric matrix" in MATLAB: a Cholesky factor is a triangular matrix, not a symmetric one. Basically, there is an ambiguity between symmetric and triangular, and MATLAB treats those matrices as triangular, not symmetric. Of course, 3. is more generic than 2., so you can achieve 2. by using 3. and implementing the operator yourself, if that makes sense.
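One way to "achieve 2. using 3." is sketched below, assuming only the lower triangle T of a symmetric matrix is stored. The function handle rebuilds the product A*x from T, so the solver never sees a full symmetric matrix. The delsq test matrix is hypothetical, and A is formed here only to check the result. It is shown on local arrays; with distributed T, d, and b the same handle should work as long as every operation in it is supported for distributed arrays, which is an assumption not checked here.

```matlab
% Sketch: apply a symmetric sparse matrix stored only as its lower triangle.
A = delsq(numgrid('S', 52));            % sparse symmetric positive definite
T = tril(A);                            % stored lower triangle (incl. diagonal)
d = full(diag(T));                      % diagonal, counted twice below

% A*x = T*x + T.'*x - diag(A).*x
afun = @(x) T*x + T.'*x - d.*x;

b = ones(size(T, 1), 1);
% pcg (like cgs) accepts a function handle in place of the matrix
[x, flag, relres] = pcg(afun, b, 1e-8, 1000);
```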
Regarding 4., distributed arrays are multithreaded: each worker uses the number of threads given by NumThreads in your cluster profile. Note that the default is 1, so there is no multithreading by default.
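A minimal sketch of raising NumThreads programmatically instead of in the Cluster Profile Manager. It assumes the default profile's cluster type exposes NumThreads and that the machine has at least 8 cores for the hypothetical 2 workers x 4 threads used here. To keep the setting for later sessions, saveProfile(c) can store it back into the profile.

```matlab
% Sketch: give each worker more than one computational thread.
c = parcluster;                 % cluster from the default profile
c.NumThreads = 4;               % threads per worker (default is 1)
delete(gcp('nocreate'));        % close any existing pool
pool = parpool(c, 2);           % hypothetical: 2 workers, 4 threads each

spmd
    nThreads = maxNumCompThreads;   % threads available on this worker
end
```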
To use the distributed version, simply call the Krylov method you'd like to use, with A being a distributed array.
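For illustration, a minimal sketch of such a call, assuming Parallel Computing Toolbox and a pool from the default profile. The test matrix is hypothetical and small. It is built on the client and then distributed, which defeats the memory purpose but keeps the example short; for data that really does not fit on one machine, the matrix would have to be constructed on the workers (e.g. as codistributed arrays inside spmd), which is not shown here.

```matlab
% Sketch: call a Krylov solver with distributed inputs.
if isempty(gcp('nocreate'))
    parpool;                               % default profile and pool size
end

A = delsq(numgrid('S', 52));               % hypothetical sparse SPD test matrix
b = ones(size(A, 1), 1);

Ad = distributed(A);                       % partition A across the workers
bd = distributed(b);
[xd, flag, relres, iter] = pcg(Ad, bd, 1e-8, 1000);  % same call as for local arrays
x = gather(xd);                            % collect the solution on the client
```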
Finally, regarding speed-up and performance, the most time-consuming operations are usually the matrix-vector product and the preconditioner. If you know a clever way to apply your operator, use it; the same goes for a clever problem-specific preconditioner. There is usually a non-trivial balance to strike between an extremely good preconditioner that is very costly to apply (the extreme case here is \) and a poor preconditioner that leads to very slow convergence, or no convergence at all to the prescribed accuracy (the extreme case here is no preconditioner at all).
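A minimal sketch of that trade-off on local arrays. It compares plain pcg, pcg preconditioned with a zero-fill incomplete Cholesky factor (ichol), and the direct solve \ as the "perfect" preconditioner. The test problem is hypothetical and not from the thread, and iteration counts will differ for other matrices.

```matlab
% Sketch: no preconditioner vs. incomplete Cholesky vs. a direct solve.
A = delsq(numgrid('S', 52));       % hypothetical sparse SPD test matrix
b = ones(size(A, 1), 1);
tol = 1e-8;
maxit = 1000;

% No preconditioner: each iteration is cheap, but more iterations are needed
[x0, flag0, ~, iter0] = pcg(A, b, tol, maxit);

% Zero-fill incomplete Cholesky: A ~ L*L', applied via two triangular solves
L = ichol(A);
[x1, flag1, ~, iter1] = pcg(A, b, tol, maxit, L, L');

fprintf('no preconditioner: %d iterations\n', iter0);
fprintf('ichol(A):          %d iterations\n', iter1);

% The other extreme: a direct solve, i.e. a perfect but expensive preconditioner
x2 = A \ b;
```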