Why does Matlab require a Parallel computing toolbox in order to use more CPU cores?

Is it true that if I have a multi-core CPU, MATLAB will only use one of the cores unless I leverage the Parallel Computing Toolbox and use "parfor" instead of "for"?
Shouldn't it be natural for MATLAB to automatically decide when to utilize more of my CPU cores, depending on the need?
Oddly enough, sometimes parfor actually slows down my processing.

Answers (1)

Not quite. MATLAB recognizes some patterns of operations and, for sufficiently large matrices, automatically calls high-performance multicore libraries that use knowledge of cache behaviour and vectorized hardware instructions to improve performance. These execute within one process.
Running larger sections independently requires starting multiple worker processes, transferring the data back and forth, and analyzing what really needs to be sent to each process. The Parallel Computing Toolbox does this work.
Because of the process overhead, the data-transfer overhead, and the loss of the implicit multithreading within each worker (workers run single-threaded by default), it is very common for parfor to come out slower.
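To make the "one process, multithreaded libraries" point concrete, here is a rough sketch (the matrix size is illustrative, and actual timings depend on your machine): large built-in matrix and element-wise operations are dispatched to multithreaded libraries automatically, while an explicit element-by-element loop runs on a single interpreter thread.

```matlab
% Illustrative sizes only; watch your CPU monitor while each section runs.
A = rand(2000);
B = rand(2000);

% BLAS matrix multiply: MATLAB uses multiple cores implicitly, in-process.
tic; C = A * B; toc

% An explicit scalar loop gets no automatic parallelism:
C2 = zeros(2000);
tic
for i = 1:2000
    for j = 1:2000
        C2(i,j) = A(i,j) + B(i,j);
    end
end
toc

% The vectorized equivalent is again multithreaded internally:
tic; C3 = A + B; toc
```

None of this requires the Parallel Computing Toolbox; it is the built-in multithreading the answer describes.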

6 Comments

Thanks for the reply!
So how should I determine when to use parfor and when to use for? After all, this is what perplexes me!
Why couldn't MATLAB set up an algorithm and let the program make the best decision by itself?
If you prototype using parfor on a small test case to ensure that your code is correct, intending to scale up to larger "real" problems once you're confident the code works, it would be counterproductive for MATLAB to second-guess you and say "You asked for parfor but I think I know better so I'm just going to give you for instead."
As for when to use parfor versus for, this documentation page discusses some of the considerations you may need to take into account when making that determination. Using parfor does introduce some overhead (that page calls out communication, coordination, and data transfer) and if you're trying to solve a "small enough" problem, the time saved by parallelization won't outweigh that overhead. The phrase "sufficiently large matrices" in Walter's response is very important.
As an analogy, if you have a shopping list with only three items on it, dividing the list up among you and your two kids may not be worth it, particularly if you'll need to track down the kids once they've retrieved their items. But if you have a shopping list with 30 or 60 items, the overhead involved in gathering the kids / "results of the computations" may be less than the benefit you get from only having to get 10 or 20 items yourself instead of all 30 or 60, and so parallelization saves time. One additional factor: if your kids are well behaved, maybe the threshold where you save time is closer to 3 items than to 30. It depends on the situation.
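A small sketch of that trade-off (sizes are arbitrary, and parpool/parfor require the Parallel Computing Toolbox): with trivial work per iteration the parfor overhead dominates, but once each iteration is expensive enough, the overhead can be amortized.

```matlab
pool = gcp;   % start or reuse a worker pool (a one-time startup cost)

% "Three items on the list": cheap per-iteration work, for usually wins.
n = 1e3;
x = rand(n,1); y = zeros(n,1);
tic; for k = 1:n, y(k) = sin(x(k)); end; toc      % negligible overhead
tic; parfor k = 1:n, y(k) = sin(x(k)); end; toc   % overhead dominates

% "30 or 60 items": expensive per-iteration work, parfor can pay off.
m = 200;
tic; for k = 1:m, e = eig(rand(300)); end; toc
tic; parfor k = 1:m, e = eig(rand(300)); end; toc
```

Run both pairs with tic/toc on your own machine; the crossover point is exactly the "how many items" threshold in the analogy.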
@Steven, that analogy works surprisingly well
Very good analogy indeed! It helps me understand why it is actually slower by using parfor.
Here is the problem. If it were about shopping with my kids, I could make that decision easily. For my MATLAB processing, sometimes I can make that decision easily as well. The problem is that a lot of the other times I'm in the dark when deciding between parfor and for. That's why I think some sort of algorithm would help.
As a quick test: monitor CPU usage while you run with a "for". If it averages 1 to 2 CPUs, and not much memory, then if you can find fairly independent subsections of the task, parfor might well be better. But if you see significant bursts where all of the physical CPUs are in use, then it becomes doubtful that parfor could improve anything.
If it involves 1 to 2 CPUs and a lot of memory, then whether parfor would help depends a lot on the memory access pattern. If the work can be broken down into local access (say a row or column at a time) then parfor might be fine. If an entire array is used, as in a large matrix-multiply kind of data pattern, then parfor will probably need to transfer too much data to benefit much.
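The access-pattern distinction above maps onto parfor's "sliced" versus "broadcast" variables. A hedged sketch (sizes are arbitrary): when an array is indexed by the loop variable, each worker receives only its slices; when the whole array is used inside the loop, it is copied to every worker.

```matlab
A = rand(2000);
out = zeros(1, 2000);
parfor j = 1:2000
    out(j) = norm(A(:, j));       % A is sliced: each worker gets only its columns
end

B = rand(2000);
out2 = zeros(1, 2000);
parfor j = 1:2000
    out2(j) = norm(A * B(:, j));  % A is used whole: broadcast to every worker
end
```

The second loop ships the entire 2000-by-2000 matrix to each worker before any computation starts, which is the "too much data to transfer" case.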
You know more about your problem than MATLAB does, so it would likely be more difficult for MATLAB to "know" if parfor is going to be beneficial (beneficial meaning the increase in time due to the parallel overhead would be less than the decrease in time from running multiple iterations simultaneously in parallel.)
Running your code on smaller subsets of your data set would not only allow you to validate that the code is working correctly, it would also allow you to investigate the performance profile of your problem. If you want to ramp up to, say, a million data points, start off with a hundred and compare parfor and for. Repeat for a couple of larger subsets (say a thousand points and ten thousand points.)
Extrapolating from a few measurements can be dangerous but you're not trying to determine "Running this code for a million points will take exactly X seconds", you're trying to determine a trend. At the very least you may get a rough sense of whether or not parfor will obviously be beneficial to runtime, will obviously be detrimental to runtime, or whether it's unclear.
To extend the shopping analogy a bit closer to its breaking point, you take your kids to the grocery store on a Thursday morning in April when the store isn't busy and ask them to help by getting a subset of your list when you have only a couple items. You do this a couple times, with increasingly long lists, to determine if it'll be worth asking them for help when you're going shopping the day before Thanksgiving when the store is wall-to-wall shopping carts and you're getting approximately a hundred pounds of food for a big family dinner.
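The scaling experiment described a couple of comments up might look like the following sketch, where someExpensiveStep is a hypothetical placeholder for your actual per-point computation (requires the Parallel Computing Toolbox for the parfor timings):

```matlab
% Time for vs parfor at increasing problem sizes and look at the trend.
sizes   = [1e2 1e3 1e4];
tFor    = zeros(size(sizes));
tParfor = zeros(size(sizes));

for s = 1:numel(sizes)
    n    = sizes(s);
    data = rand(n, 1);
    out  = zeros(n, 1);

    tic
    for k = 1:n
        out(k) = someExpensiveStep(data(k));   % placeholder: your kernel here
    end
    tFor(s) = toc;

    tic
    parfor k = 1:n
        out(k) = someExpensiveStep(data(k));   % same placeholder kernel
    end
    tParfor(s) = toc;
end

disp(table(sizes.', tFor.', tParfor.', ...
    'VariableNames', {'N', 'forSeconds', 'parforSeconds'}))
```

You are not after exact predictions, just the trend: whether the parfor column pulls ahead of the for column as N grows.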


Release: R2019a
Asked: 13 Jun 2019
Commented: 14 Jun 2019
