Can I speed up big matrix operations with Parallel Computing Toolbox?

I have code that solves a PDE in 3-D space.
It is almost fully vectorized, so it mainly performs algebraic operations on large 3-D arrays.
For instance, D = A + 2*B + 3*C; (A, B, C, and D are all arrays with 2 million or more elements)
Can I speed up this code?
I wrote some test code:
------------------------------------------
A = randi(10,300,300,300);
B = randi(10,300,300,300);
tic;C=A+B;toc;
parpool;
tic;D=A+B;toc;
delete(gcp);
------------------------------------------
On my desktop, which has 4 cores with 2 threads per core, the two additions take about 50 ms and 60 ms respectively (no speedup with parpool).
Can I reduce these times?
Ultimately, I'm also planning to increase the size of the arrays and the number of cores.
Is that possible?
Would the distributed computing toolbox help?
  1 Comment
James Tursa on 25 Nov 2015
Edited: James Tursa on 25 Nov 2015
The element-wise operations +, -, .*, ./, etc. are already multi-threaded in the background for arrays that are large enough, so you should not expect a speed increase from manually parallelizing them. I would be surprised if any test you could construct using only these element-wise operations were faster than the simple one-line MATLAB expression. You can double-check this on your machine by watching core usage during a large matrix addition.
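Besides watching core usage, one rough way to check this is to time the same addition with the computational thread limit left at its default and then forced to 1. This is only a sketch: it assumes base MATLAB's maxNumCompThreads and timeit, the variable names are illustrative, and the ratio you see will depend on your memory bandwidth as much as on core count.

```matlab
% Compare built-in multithreading against a single computational thread.
A = rand(300, 300, 300);
B = rand(300, 300, 300);

nDefault = maxNumCompThreads;        % current limit on computational threads
tMulti   = timeit(@() A + B);        % addition with the default thread count

maxNumCompThreads(1);                % restrict MATLAB to one computational thread
tSingle  = timeit(@() A + B);        % same addition, single-threaded

maxNumCompThreads(nDefault);         % restore the original setting

fprintf('threads = %d: %.4f s, threads = 1: %.4f s\n', nDefault, tMulti, tSingle);
```

If the two times are close, the addition is probably limited by memory traffic rather than arithmetic, which is another reason extra workers would not help.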


Answers (2)

Walter Roberson on 25 Nov 2015
Edited: Walter Roberson on 27 Nov 2015
There are some patterns that MATLAB is able to detect and call in to highly optimized parallel third-party libraries to execute.
These libraries take data in a different way than MATLAB uses internally so it takes time to convert the data, then the code has to be called, and then the result has to be converted back to MATLAB form. For this reason these libraries are not called unless the amount of data is "big enough" that MATLAB figures there will be an overall performance gain. The array sizes you mention should certainly be large enough to qualify.
When MATLAB calls in to those libraries, if MATLAB is not in any parallel pool (including the case where the user has no Parallel Computing Toolbox) then usually the libraries will create one thread per core on the local machine. They do not typically use hyperthreads because hyperthreads are often slower for CPU-bound tasks.
When MATLAB calls in to those libraries, if parpool has been called but the user is not in any explicitly parallel section (such as with parfor or distributed computing) then the libraries will be restricted to the number of workers in the parallel pool. If the user has not configured the pool then that would typically be the same as the number of cores, but people who configure pools often configure them for fewer cores than the maximum (for example, keeping one core free to handle random operating system tasks, I/O, virus checking, user interaction). Therefore opening a parpool without using a specific parallel command often results in slower progress than not using parpool at all, because fewer cores are available.
When MATLAB calls into those libraries, if parpool has been called and the user is within a worker, then inside the worker there is only a single core available, so the third-party libraries end up getting invoked single-threaded for each worker (but a different worker would have a different core that might be executing the third-party library single-threaded for a different set of data.)
"Marshaling" the data for the third-party libraries involves copying the data and copying the result. For an operation as simple as element-by-element addition of two entire arrays of the same shape, addition is probably going to be as fast as copying, so the libraries might not be called at all. However, depending on exactly what you are doing, MATLAB might be able to find a pattern such as "Multiply and Add": d = a.*b+c is one of the operations that can typically be done faster by the libraries (by using hardware instructions).
If you use the Distributed Computing Toolbox, what you gain is roughly the ability to distribute the data to remote nodes, using additional remote resources rather than just the local resources. It does not typically gain anything over Parallel Computing Toolbox if you are the only user on the node that MATLAB is running on (e.g., a desktop system.)
If you restrict yourself to your local node, then you can see that just opening a parpool is not necessarily of any gain to you. However, you might be able to gain by dividing up the work between multiple workers, including by techniques such as creating Distributed Arrays, or using a GPU. And where something like parfor can win is cases where you have a mix of instructions to execute on data -- especially cases where you can divide your task into a bunch of blocks that can be executed independently without needing much "stitching together" of the results.
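As a rough illustration of the kind of workload where parfor can pay off, here is a minimal sketch with independent blocks that each do a mix of heavier operations and return only a small result. The block count, matrix size, and the choice of svd are made up for illustration and are not taken from the question; without Parallel Computing Toolbox the loop simply runs serially.

```matlab
% Independent blocks: each iteration builds its own data, does a chunk of
% work, and returns a single scalar, so little "stitching" is needed.
nBlocks = 8;                         % hypothetical number of independent tasks
results = zeros(1, nBlocks);

parfor k = 1:nBlocks
    M = rand(500);                   % data owned by this iteration only
    M = M * M';                      % some matrix algebra
    s = svd(M);                      % a more expensive step
    results(k) = s(1);               % keep just the largest singular value
end

disp(results);
```

The point of the design is that almost no data crosses between client and workers; a plain A + B over one huge array has the opposite profile.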

Joss Knight on 27 Nov 2015
It looks like you've misunderstood how to use the Parallel Computing Toolbox's parallel pools. Once you've opened a pool, you need to use a command like parfor to use it. I suggest reading the Parallel Computing Toolbox documentation to get started.
As has been previously mentioned, you're not going to get any performance advantage doing standard matrix algebra using MATLAB workers. If your data fits into memory, the standard matrix operations are already optimised to take best advantage of your hardware. If your data does not fit into memory on a single machine, then you may find you need to use distributed arrays or a map-reduce approach.
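For the case where the data does not fit on one machine, a distributed-array version of the expression from the question might look roughly like the sketch below. It assumes Parallel Computing Toolbox and an open pool; the lowercase local arrays exist only so the result can be checked here, and on a single desktop this is expected to be slower than the plain expression because of the data transfer.

```matlab
% Sketch: evaluate D = A + 2*B + 3*C on distributed arrays.
if isempty(gcp('nocreate'))
    parpool;                         % open a pool with the default profile
end

a = randi(10, 300, 300, 300);        % local copies, kept only for checking
b = randi(10, 300, 300, 300);
c = randi(10, 300, 300, 300);

A = distributed(a);                  % partition each array across the workers
B = distributed(b);
C = distributed(c);

D = A + 2*B + 3*C;                   % evaluated piecewise on the workers
Dlocal = gather(D);                  % bring the result back to the client

disp(isequal(Dlocal, a + 2*b + 3*c));
```

In a real out-of-memory case you would build the arrays directly on the workers rather than distributing client-side copies.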
The only way to get better performance for basic algebra is to beef up your hardware. One approach is to use a GPU:
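% CPU baseline
A = randi(10,300,300,300);
B = randi(10,300,300,300);
tic;C=A+B;toc;
% Copy the operands to the GPU, then time the same addition there
A = gpuArray(A);
B = gpuArray(B);
tic;D=A+B;wait(gpuDevice);toc; % wait for the GPU to finish before stopping the timer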
For your example, I was able to get a 6x speedup on my NVIDIA Tesla K20.
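If you want a steadier comparison than a single tic/toc, a possible variant uses timeit and gputimeit, which repeat the measurement and, on the GPU side, handle synchronization. This is a sketch that assumes Parallel Computing Toolbox and a supported GPU; the names Ag, Bg, tCpu, and tGpu are illustrative, and the speedup will differ from card to card.

```matlab
% Sketch: repeated-measurement timing of the same addition on CPU and GPU.
A = randi(10, 300, 300, 300);
B = randi(10, 300, 300, 300);

tCpu = timeit(@() A + B);            % CPU time for the addition
fprintf('CPU: %.4f s\n', tCpu);

if gpuDeviceCount > 0                % only run the GPU part if a device exists
    Ag = gpuArray(A);                % transfer is done once, outside the timing
    Bg = gpuArray(B);
    tGpu = gputimeit(@() Ag + Bg);   % GPU time, synchronized internally
    fprintf('GPU: %.4f s (speedup %.1fx)\n', tGpu, tCpu / tGpu);
end
```

Note that this deliberately excludes the host-to-device copy; if your arrays have to move to the GPU for every operation, the transfer time can dominate.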
