Script using parfor spawns way too many threads
Greetings,
I have a Matlab program whose main task is "embarrassingly parallel": the inner loop that does the bulk of the work consists of entirely independent iterations. Thus, it is a perfect candidate for using parfor. So, depending on which machine I'm running on, I start a parpool, give it the number of physical cores as an argument, and indeed, it starts running the loop in parallel.
The problem is, if you monitor the thread usage, it spawns far more threads than the number of workers in the parpool, to the point that on certain machines, the OS eventually cuts Matlab off.
My best working theory is that something inside the loop must be inherently multithreaded, so that each iteration is itself creating some explosion of threads in the course of executing the loop iteration. But, even supposing that were true:
- How would I track down which line or function call is spawning new threads?
- If I can identify it, how does one stop Matlab from multithreading inside the parfor loop?
Thanks in advance for your assistance.
EDIT: I tried paring it down to the bare essentials to narrow down the problem. Below is an example parfor loop from the parfor documentation. I've adjusted it to create 8 parallel workers. If I keep the Resource Monitor open and run it, each worker shows up as its own Matlab process, but each process is reported as using ~100 threads.
parpool(8)
parfor i=1:8, c(:,i) = eig(rand(1000)); end
4 Comments
Edric Ellis
on 30 Jun 2022
MATLAB processes make a distinction between "computational" threads and other threads. The setting NumThreads controls "computational" threads. This means that if you get a worker to perform a large matrix operation (for example), then it will do so using only a single thread. (Your desktop MATLAB will use multiple threads here). The other threads are a whole bunch of background threads, which mostly should not be performing significant amounts of work. These are present to orchestrate communication between client and workers, and a bunch of other stuff.
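One way to check the distinction described above is to ask each worker how many computational threads it is allowed to use, and compare that with the client. The following is only a sketch: it assumes Parallel Computing Toolbox is installed, that the default cluster profile is a local one exposing the NumThreads property, and that 8 workers fit on the machine (the pool size is taken from the question, not a recommendation).

```matlab
% Sketch: compare computational thread counts on the client and on the workers.
clu = parcluster;                       % default cluster profile (assumed local)
fprintf("Profile NumThreads: %d\n", clu.NumThreads);

pool = gcp("nocreate");                 % reuse an existing pool if there is one
if isempty(pool)
    pool = parpool(clu, 8);             % 8 workers, as in the question
end

clientThreads = maxNumCompThreads;      % computational threads on the client

% Run maxNumCompThreads once on every worker and collect the results.
f = parfevalOnAll(pool, @maxNumCompThreads, 1);
workerThreads = fetchOutputs(f);

fprintf("Client computational threads: %d\n", clientThreads);
fprintf("Worker computational threads: %s\n", mat2str(workerThreads(:).'));
```

If the workers report 1 here, the roughly 100 threads per process seen in Resource Monitor would be the background threads mentioned above rather than computational ones.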
How many workers do you need to start before problems start arising? I would not expect running 8 workers on a single machine to cause problems, but maybe you are running more?
MATLAB worker processes consume memory. If you can use parpool("threads"), then this will avoid some of the extra memory and thread overhead. Not all MATLAB code is supported there though.
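A minimal sketch of the thread-based pool suggestion, using the loop from the question. It assumes a release that supports parpool("threads") and that eig and rand are supported in the thread-based environment; the matrix size n is reduced here only to keep the example quick.

```matlab
% Sketch: run the question's loop on a thread-based pool instead of process workers.
delete(gcp("nocreate"));        % close any existing pool (only one can be open)
pool = parpool("threads");      % workers are threads inside the client process

n = 200;                        % smaller than the original 1000, for speed
c = zeros(n, 8);
parfor i = 1:8
    c(:, i) = eig(rand(n));     % independent iterations, as in the question
end

disp(size(c))
```

If something inside the real loop body is not supported on thread workers, MATLAB reports an error, and the process-based parpool(8) remains the fallback.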
Answers (1)
Bruno Luong
on 29 Jun 2022
It sounds counterproductive to turn off the low-level multithreading and move the parallelism into your parfor loop.
I would rather split the parfor into two nested loops: an inner parfor over a block of fewer iterations, and an outer standard for loop that repeats until all the indices are done.
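The nested-loop idea could look roughly like the sketch below. The names nTotal, blockSize and n are hypothetical placeholders, and eig(rand(n)) stands in for the real loop body; this is one possible reading of the suggestion, not a tested recipe for the original program.

```matlab
% Sketch: outer serial for loop over blocks, inner parfor within each block.
nTotal    = 20;                 % total number of independent iterations (placeholder)
blockSize = 8;                  % iterations handled by each inner parfor (placeholder)
n         = 50;                 % matrix size for the stand-in workload

c = zeros(n, nTotal);
for first = 1:blockSize:nTotal
    last   = min(first + blockSize - 1, nTotal);
    nBlock = last - first + 1;

    cBlock = zeros(n, nBlock);
    parfor k = 1:nBlock
        cBlock(:, k) = eig(rand(n));    % stand-in for the real iteration
    end

    c(:, first:last) = cBlock;          % copy the finished block into the result
end
```

Each inner parfor finishes before the next block starts, so the amount of work in flight at any moment is bounded by blockSize.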