Script using parfor spawns way too many threads

I have a Matlab program whose main task is "embarrassingly parallel": the inner loop that does the brunt of the work consists of entirely independent iterations. Thus, it is a perfect candidate for using parfor. So, depending on which machine I'm running on, I start a parpool, give it the number of physical cores as an argument, and indeed, it starts running the loop in parallel.
The problem is, if you monitor the thread usage, it spawns far more threads than the number of workers in the parpool, to the point that on certain machines, the OS eventually cuts Matlab off.
My best working theory is that something inside the loop must be inherently multithreaded, so that each iteration is itself creating some explosion of threads in the course of executing the loop iteration. But, even supposing that were true:
  • How would I track down which line or function call is spawning new threads?
  • If I can identify it, how does one stop Matlab from multithreading inside the parfor loop?
Thanks in advance for your assistance.
EDIT: I tried paring it down to the barest of essentials to narrow down the problem. Below is an example parfor loop from the parfor documentation. I've adjusted it to create 8 parallel workers. If I keep the Resource Monitor open and run it, I am seeing the workers each show up as their own Matlab process, but it is indicating each is using ~100 threads.
parfor i=1:8, c(:,i) = eig(rand(1000)); end
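One way to check the multithreading theory is to ask each worker how many computational threads it is allowed to use. The following is a minimal sketch, assuming Parallel Computing Toolbox is installed and a process-based pool can be started with the default profile. It only reports the computational thread limit (`maxNumCompThreads`), not the total OS thread count that Resource Monitor shows, so the two numbers need not match.

```matlab
% Sketch: report the computational thread limit seen by each pool worker.
pool = gcp("nocreate");            % reuse an existing pool if there is one
if isempty(pool)
    pool = parpool();              % assumption: default profile, default size
end

% Run maxNumCompThreads once on every worker and collect one value per worker.
f = parfevalOnAll(@maxNumCompThreads, 1);
nThreads = fetchOutputs(f);

for w = 1:numel(nThreads)
    fprintf("worker %d: maxNumCompThreads = %d\n", w, nThreads(w));
end
```

If every worker reports 1, the numerical work inside the loop is not what multiplies the thread count, and the extra threads are more likely per-process overhead.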
Matthew Lockner on 2 Jul 2022
To answer the question of how many workers I need to start before problems arise: it is system-dependent. I have run the code on 32-core nodes where the job is granted the full node; as I recall, I was told I would have to use "ulimit -Su 20480" to ensure a good submission. I'm not an expert, but I gather the command increases the number of threads the OS will grant before it refuses to grant any more. In contrast, I have a cluster with nodes of 96 total cores, and I find that I cannot reliably start more than 16 processes without this issue arising. On my personal 8-core PC, it seems to handle 8 processes just fine.
The fun part is that the code does scale with number of cores - not necessarily 1/N but significantly. I will try your suggestion of using parpool("threads") and see what effect that has. It seems I mostly just need to convince the OS to accept a flood of threads and it all will run fine.
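For reference, a thread-based pool can be tried with very little change to the loop. This is a minimal sketch, assuming MATLAB R2020a or later and that every function called inside the loop is supported on thread-based workers. The matrix size is reduced and the matrix is symmetrized (so the eigenvalues are real) purely for illustration.

```matlab
% Sketch: run the documentation-style loop on a thread-based pool.
delete(gcp("nocreate"));           % close any existing pool first
pool = parpool("threads");         % workers are threads inside one MATLAB process

n = 8;                             % number of independent iterations
m = 200;                           % illustrative matrix size
c = zeros(m, n);
parfor i = 1:n
    A = rand(m);
    c(:, i) = eig(A + A');         % symmetric input gives real eigenvalues
end
```

Because the workers share one process, no separate MATLAB process is started per worker.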


Answers (1)

Bruno Luong on 29 Jun 2022
It sounds counterproductive to turn off the low-level multithreading and move the parallelism into your parfor loop.
I would rather split the parfor into two nested loops: an inner parfor over a block of fewer iterations, and an outer standard for loop that repeats until all indices are done.
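The blocked structure described here might look like the sketch below. The names `nTotal`, `blockSize`, and `cBlock` are hypothetical, the sizes are illustrative, and the loop body is a stand-in based on the `eig(rand(...))` example from the question, symmetrized so the results are real.

```matlab
% Sketch: outer for loop over blocks, inner parfor within each block.
nTotal    = 64;                    % total number of independent iterations
blockSize = 8;                     % iterations handed to parfor at a time
m         = 100;                   % illustrative matrix size
c = zeros(m, nTotal);

for first = 1:blockSize:nTotal
    last     = min(first + blockSize - 1, nTotal);   % last index of this block
    nInBlock = last - first + 1;
    cBlock   = zeros(m, nInBlock);                   % results for this block only

    parfor k = 1:nInBlock
        A = rand(m);
        cBlock(:, k) = eig(A + A');                  % stand-in for the real work
    end

    c(:, first:last) = cBlock;                       % copy block results back
end
```

The block size then caps how many iterations are in flight at once, independently of the total iteration count.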

