How does parfor distribute resources to each iteration when there are more cores than iterations?
Here is the situation: I want to run a few iterations with parfor on a cluster where I have access to more cores than there are iterations (though still a finite number). Each iteration is very expensive in both time and memory: say it needs the total memory of 10 cores, while the average number of cores allocated to each iteration is less than 10. How should I configure parfor to avoid out-of-memory errors? Does parfor distribute resources evenly across iterations?
Accepted Answer
Damian Pietrus
on 17 Jan 2024
Since you mentioned working on a cluster, the first thing I'd like to address is making sure that you are requesting enough resources from the scheduler. If you are using the default values or requesting too little memory, this could be the cause of the out of memory errors you are seeing rather than the way parfor distributes iterations.
For example, if you are using the Slurm scheduler, you can use the --mem-per-cpu flag to request a certain amount of memory per core for your job. If you request 10 GB per CPU and use 10 workers in your parpool, then you will request a total of 100 GB of memory on the compute node (10 GB x 10 CPUs = 100 GB total). Increase the memory request as needed until your job has the memory it needs.
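The arithmetic above can be sketched in MATLAB as follows. This is only an illustration: the numbers are the ones from the example, the profile name 'myCluster' is hypothetical, and the commented-out lines assume a cluster integration that honors an AdditionalSubmitArgs field in AdditionalProperties, which depends on how your site set up its scheduler scripts.

```matlab
% Example figures from the text above; adjust to your own job.
numWorkers  = 10;   % workers in the pool, one core each
memPerCpuGB = 10;   % memory requested per core, in GB

% Total memory the scheduler is asked for on the compute node.
totalMemGB = numWorkers * memPerCpuGB;

% Slurm flag that expresses the per-core request.
submitFlag = sprintf('--mem-per-cpu=%dG', memPerCpuGB);
fprintf('Requesting %d GB in total via "%s"\n', totalMemGB, submitFlag);

% ASSUMPTION: 'myCluster' is a hypothetical profile name, and the site's
% integration scripts read AdditionalProperties.AdditionalSubmitArgs.
% Check your cluster's documentation before relying on these lines.
% c = parcluster('myCluster');
% c.AdditionalProperties.AdditionalSubmitArgs = submitFlag;
% pool = parpool(c, numWorkers);
```

If the pool still runs out of memory, raising memPerCpuGB is the knob to turn first; the worker count can stay the same.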
You may need to take a look at the documentation for your cluster to see what types of nodes are available and how much memory each node has. Often clusters have "high mem" nodes in addition to standard nodes.
Give that a shot first and let me know how it goes.
More Answers (2)
Edric Ellis
on 18 Jan 2024
When a parfor loop executes on a parallel pool, PCT will divide up the entire loop range 1:N into a series of batches known as "sub-ranges". This division depends on the total number of iterates in the full range, and the number of workers available. By default, PCT tries to send batches that are big enough that communication costs are minimised, but also small enough that in the case where different iterations take different amounts of time, the workers are kept as busy as possible.
You can override this default division by using parforOptions. For example, to force parfor to send iterates individually, you can do this:
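pool = gcp();  % start or reuse the current parallel pool
% "fixed" partitioning with SubrangeSize=1 sends each iterate to a worker on its own.
pfo = parforOptions(pool, RangePartitionMethod="fixed", SubrangeSize=1);
out = zeros(1, 10);  % preallocate the sliced output
parfor (i = 1:10, pfo)
    out(i) = feature('getpid');  % process ID of the worker that ran iterate i
end
out'  % repeated PIDs show which iterates ran on the same worker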
3 Comments
Edric Ellis
on 19 Jan 2024
Yes, with that scheme, if there are more workers than iterations, you will not keep them all busy. (Sorry, somehow I overlooked that part of your question - I was mostly answering to clarify exactly how the iteration batching works).
I think the solution here is going to involve working with your cluster configuration to ensure the workers you get each have enough memory to run an iteration, rather than anything related to parfor specifically.
One way to head towards what I think you need is to use multi-threaded workers. From the MATLAB client side, you specify the NumThreads property in your parallel.Cluster profile. This needs support from your cluster integration scripts (I'm not an expert here I'm afraid). The result though is that you can end up with a parallel pool where each worker process is multi-threaded. This should have the desired side-effect that each worker process has more memory available. Also, if your time-consuming function happens to take advantage of MATLAB's intrinsic multithreading (e.g. large matrix operations), then that will also be a benefit.
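A minimal sketch of the multi-threaded worker setup described above, assuming the default cluster profile and illustrative values for the thread and worker counts. On a real cluster, whether the scheduler actually reserves the extra cores (and their memory) for each worker depends on the integration scripts, as noted above.

```matlab
% ASSUMPTION: the default profile is used; pass a profile name to
% parcluster if your cluster has its own. Counts below are illustrative.
threadsPerWorker = 4;
numWorkers       = 3;   % e.g. one worker per loop iteration

c = parcluster();                  % cluster object for the default profile
c.NumThreads = threadsPerWorker;   % each worker process may use 4 threads
pool = parpool(c, numWorkers);

% Check what each worker reports as its computational thread limit.
nThreads = zeros(1, numWorkers);
parfor i = 1:numWorkers
    nThreads(i) = maxNumCompThreads();
end
disp(nThreads)

delete(pool);
```

With this layout the job occupies numWorkers x threadsPerWorker cores, so a per-core memory request is shared by fewer worker processes.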
Matt J
on 18 Jan 2024
Edited: Matt J
on 18 Jan 2024
You have to consider the number of workers, M, not just the number of cores, C.
If M is the number of parpool workers and N is the number of loop iterations, then each worker will be assigned a consecutive subsequence of N/M iterations, which it will run serially. So, the amount of memory each worker will try to consume is the amount of memory used by N/M of your loop iterations, whatever that is.
The amount of memory each worker has available to it is RAMTotal/M.
Obviously, this math becomes slightly more complicated if N/M or C/M are not integers, and perhaps also if your cores are tied up by applications other than MATLAB.
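As a rough illustration of the RAMTotal/M reasoning, the sketch below picks the largest number of workers for which each worker's share of memory still covers one iteration, then caps parfor at that number. All figures are hypothetical, and the loop body is a placeholder for the real computation; it also assumes that memory is split evenly across workers, which the scheduler may not guarantee.

```matlab
% Hypothetical figures; replace with values for your job.
ramTotalGB   = 256;  % memory granted to the whole job
memPerIterGB = 60;   % peak memory of a single iteration
N            = 6;    % number of loop iterations

% Largest M such that ramTotalGB / M >= memPerIterGB, and no more than N.
maxWorkers     = min(N, floor(ramTotalGB / memPerIterGB));
memPerWorkerGB = ramTotalGB / maxWorkers;

results = zeros(1, N);
parfor (i = 1:N, maxWorkers)   % second argument caps the workers used
    results(i) = i^2;          % placeholder for the expensive computation
end
```

Here only 4 of the 6 iterations run at once, which trades some parallelism for staying within the memory budget.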
8 Comments
Walter Roberson
on 18 Jan 2024
Sorry, I do not have any information about how resources are allocated for distributed computing.