
How does parfor distribute resources to each iteration when the number of cores is larger than the number of iterations?

Here is the situation: I want to perform a few iterations using parfor on a cluster where I can get access to more cores than the number of iterations (though still a finite number). Each iteration consumes a great deal of time and memory, say the total memory of 10 cores, but the average number of cores allocated to each iteration is less than 10. How should I configure parfor to avoid an out-of-memory error? Does parfor distribute resources to each iteration evenly?

Accepted Answer

Damian Pietrus on 17 Jan 2024
Since you mentioned working on a cluster, the first thing I'd like to address is making sure that you are requesting enough resources from the scheduler. If you are using the default values or requesting too little memory, this could be the cause of the out-of-memory errors you are seeing, rather than the way parfor distributes iterations.
For example, if you are using the Slurm scheduler, you can use the --mem-per-cpu flag to request a certain amount of memory per core for your job. If you request 10 GB per CPU and use 10 workers in your parpool, you will request a total of 100 GB of memory on the compute node (10 GB x 10 CPUs = 100 GB total). Increase the memory request as needed until your job has the memory it needs.
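As a concrete sketch of that request (the job name, script name, and time limit are placeholders I'm adding for illustration, not values from this thread), a Slurm batch script along these lines asks for 10 GB on each of 10 CPUs:

```shell
#!/bin/bash
#SBATCH --job-name=parfor-job      # placeholder job name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10         # one CPU per parpool worker
#SBATCH --mem-per-cpu=10G          # 10 GB x 10 CPUs = 100 GB total
#SBATCH --time=04:00:00            # placeholder wall-clock limit

# Launch MATLAB non-interactively; myscript.m is a placeholder name
matlab -batch "myscript"
```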
You may need to take a look at the documentation for your cluster to see what types of nodes are available and how much memory each node has. Often clusters have "high mem" nodes in addition to standard nodes.
Give that a shot first and let me know how it goes.
  2 Comments
XYC on 18 Jan 2024
Thanks very much for your reply. Before I posted this question, I did some tests using the default mem-per-cpu. Let's say I have 5 iterations to perform using parfor (each iteration needs 21 GB of memory), and I request 16 cores (each with 4 GB). It is fine if I request 6 cores (24 GB total > 21 GB) and perform each iteration sequentially, but performing all 5 iterations with parfor across all 16 cores triggers an out-of-memory error, because parfor runs all of them simultaneously (64 GB requested < 105 GB needed in total).
I wonder if there is a way to tell parfor to perform the first three of the 5 iterations simultaneously and then the remaining two. Or can I somehow specify the number of cores allocated to each iteration and thus prevent parfor from running them all simultaneously?
All in all, I will first try increasing the memory per CPU.
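One way to cap how many iterations run at once (a sketch I'm adding for illustration, not part of the original thread) is to open a pool with fewer workers than there are iterations, since parfor runs at most one iteration per worker at a time:

```matlab
% Open a pool of only 3 workers, so at most 3 of the 5 iterations
% run concurrently; the remaining 2 start as workers free up.
pool = parpool(3);        % 3 is a placeholder, sized to fit memory
results = zeros(1, 5);
parfor i = 1:5
    results(i) = heavyComputation(i);  % heavyComputation is hypothetical
end
delete(pool);
```

Note that the memory available to each worker is still whatever the scheduler granted the job divided among the workers; parfor itself does not partition memory.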
XYC on 18 Jan 2024
I tried to change the --mem-per-cpu flag but got the following error:
sbatch: error: Batch job submission failed: Job submission failed because too much memory was requested relative to the number of CPUs requested. The requested memory:CPU should be kept no more than DefMemPerCPU.
It seems I have no choice but to request more cores.
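If the cluster caps the memory:CPU ratio at DefMemPerCPU, the usual workaround is indeed to request more CPUs so the job's total memory grows while the per-CPU ratio stays within the cap. A sketch using the numbers from this thread (the CPU counts are placeholders and assume DefMemPerCPU is 4 GB):

```shell
# To give each of 5 concurrent iterations ~21 GB while staying at
# 4 GB per CPU, request ~6 CPUs per iteration:
# 5 iterations x 6 CPUs x 4 GB = 120 GB total.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=30        # 5 iterations x 6 CPUs each (placeholder)
#SBATCH --mem-per-cpu=4G          # stays within the DefMemPerCPU limit
```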


More Answers (2)

Edric Ellis on 18 Jan 2024
When a parfor loop executes on a parallel pool, PCT will divide up the entire loop range 1:N into a series of batches known as "sub-ranges". This division depends on the total number of iterates in the full range, and the number of workers available. By default, PCT tries to send batches that are big enough that communication costs are minimised, but also small enough that in the case where different iterations take different amounts of time, the workers are kept as busy as possible.
You can override this default division by using parforOptions. For example, to force parfor to send iterates individually, you can do this:
pfo = parforOptions(gcp(), RangePartitionMethod="fixed", SubrangeSize=1);
parfor (i = 1:10, pfo)
    out(i) = feature('getpid');
end
out'
  3 Comments
Edric Ellis on 19 Jan 2024
Yes, with that scheme, if there are more workers than iterations, you will not keep them all busy. (Sorry, somehow I overlooked that part of your question - I was mostly answering to clarify exactly how the iteration batching works).
I think the solution here is going to involve working with your cluster configuration to ensure the workers you get each have enough memory to run an iteration, rather than anything related to parfor specifically.
One way to head towards what I think you need is to use multi-threaded workers. From the MATLAB client side, you specify the NumThreads property in your parallel.Cluster profile. This needs support from your cluster integration scripts (I'm not an expert here I'm afraid). The result though is that you can end up with a parallel pool where each worker process is multi-threaded. This should have the desired side-effect that each worker process has more memory available. Also, if your time-consuming function happens to take advantage of MATLAB's intrinsic multithreading (e.g. large matrix operations), then that will also be a benefit.
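As a sketch of that NumThreads approach (the profile name and the numbers are placeholders; the exact setup depends on your cluster profile and integration scripts):

```matlab
% Configure a cluster profile so each worker process is multi-threaded.
c = parcluster("myHPCCluster");   % "myHPCCluster" is a placeholder profile
c.NumThreads = 4;                 % each worker spans 4 CPUs' worth of threads
saveProfile(c);

% A pool of 4 such workers then occupies 16 CPUs, so each worker
% process typically has access to correspondingly more memory.
pool = parpool(c, 4);
```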



Matt J on 18 Jan 2024
Edited: Matt J on 18 Jan 2024
You have to consider the number of workers, M, not just the number of cores, C.
If M is the number of parpool workers and N is the number of loop iterations, then each worker will be assigned a consecutive subsequence of roughly N/M iterations, which it will run serially. So the amount of memory each worker will try to consume is the amount of memory used by N/M of your loop iterations, whatever that is.
The amount of memory available to each worker is RAMTotal/M.
Obviously, this math becomes slightly more complicated if N/M or C/M are not integers, and perhaps also if your cores are tied up by other apps besides MATLAB.
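Plugging in the numbers from the question above (5 iterations at 21 GB each, 16 workers with 4 GB per core) as a sanity check; this arithmetic is my illustration, not part of the original answer:

```matlab
N = 5;                    % loop iterations
memPerIter = 21;          % GB needed by one iteration
M = 16;                   % parpool workers
memPerCore = 4;           % GB granted per core by the scheduler

concurrent = min(N, M);               % iterations running at once: 5
needed  = concurrent * memPerIter;    % 5 * 21 = 105 GB
granted = M * memPerCore;             % 16 * 4 = 64 GB
fprintf("Need %d GB, have %d GB\n", needed, granted);  % 105 > 64 -> OOM
```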
  8 Comments


Release

R2022a
