Move from parfor to parfeval?
I have a large simulation that runs on an LSF cluster that supports Parallel Computing Toolbox. Right now, the meat of the effort is in a parfor loop:
% Loop over cells of storm
celllist = find(~gotResult);
stormtmp = storm(celllist); % storm is array of handle-type classes
parfor i = 1:length(celllist)
    icell = celllist(i);
    fprintf('Cell %d...\n', icell);
    matfileObj = pooldata.Value; %#ok<PFBNS>
    ofac = zeros(sizevec,'single');
    ofac = stormtmp(i).obsMatrix(grid,ofac,geom,rparms);
    matfileObj.gotResult(1,icell) = true;
    matfileObj.testOut(1,icell) = {ofac};
end
pooldata is a parallel.Pool.Constant holding a matfile object that is specific to each worker. At the end of the loop, I consolidate the results, clear the temporary files, and go to the next iteration of the model. This gives me robustness against cluster crashes, which can be distressingly common when the total job takes a month (hence the check on gotResult at the start: I may have a partial result from before a crash).

The primary annoyance with parfor is that the allocation of units to workers happens rigidly at the start. With 60+ workers on 5-10 computers in a shared facility, there is no guarantee that they all take a similar amount of time to finish. I find that the last 25% or more of the execution time of each iteration is spent waiting for a dwindling number of my workers to finish their assignments.

I've read the documentation for parfeval, and it seems to give me a way of managing each execution more carefully, but it's too convoluted for me to see how I get there. Any tips? It would seem that I would start with a find() to get the first N cells (N = number of workers) that need completing, then enter a while loop using afterEach() where I check for a valid result, get the next cell on the list, and assign it. Maybe I can do it with one main while loop and just remove entries from celllist as they complete? Head is spinning...
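One way the parfeval idea described above might look, as a minimal sketch rather than a drop-in replacement: queue one future per unfinished cell, then collect results with fetchNext in whatever order they complete. Because every future is queued up front and workers take the next one as they become free, no hand-written "first N cells" bookkeeping is needed. Here processCell is a hypothetical stand-in for the obsMatrix call, and the array sizes are made up for illustration.

```matlab
% Sketch: dynamic scheduling with parfeval + fetchNext.
% processCell is a HYPOTHETICAL placeholder for the real per-cell work
% (the obsMatrix call in the loop above).
processCell = @(icell) icell.^2;

nCells    = 8;                    % assumed size, for illustration only
gotResult = false(1, nCells);
gotResult([2 5]) = true;          % pretend an earlier, crashed run finished these
testOut   = cell(1, nCells);

celllist = find(~gotResult);
nTodo    = numel(celllist);

% Queue one future per unfinished cell. Queued futures start as workers
% become available, so a slow worker does not hold up a fixed block of cells.
futures(1:nTodo) = parallel.FevalFuture;
for k = 1:nTodo
    futures(k) = parfeval(processCell, 1, celllist(k));
end

% Collect results in completion order. fetchNext returns the index (into
% futures) of the next finished, not-yet-read future plus its output.
for k = 1:nTodo
    [idx, ofac] = fetchNext(futures);
    icell = celllist(idx);
    testOut{icell}   = ofac;
    gotResult(icell) = true;
    % A checkpoint write could go here so partial results survive a crash.
end
```

In this arrangement the client, not each worker, sees every result as it arrives, so the crash-recovery checkpoint could live in the collection loop instead of in per-worker matfile objects. Note that fetchNext throws if the future it reads ended in an error, so a try/catch around it may be worth adding for month-long runs.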
1 Comment
Jeff Miller
on 25 Jan 2022
So the problem is that lots of workers are sitting idle while waiting for the slow ones to finish their assignments in this parfor loop? Instead of waiting here, you'd like to have them start on their assignments for the next parfor loop for the next "iteration of the model".
If that's right, maybe a simpler approach is to change this parfor loop so that it runs through all the cell/model combinations, something like this:
parfor i = 1:length(celllist)*NofModelIterations
For each i you'd need a little logic to work out which model and which cell you wanted, plus invoke the right model, but that might not be hard.
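The "little logic" for recovering the cell and the model from a single loop index could be as small as one ind2sub call. The sketch below only demonstrates the index mapping; nCells and nModels are hypothetical sizes, and the real loop body would call the model code instead of recording the pair. It also assumes the model iterations are independent of one another, which may not hold if each iteration needs the consolidated results of the previous one.

```matlab
% Sketch: map a flattened parfor index to a (cell, model) pair.
nCells  = 4;   % HYPOTHETICAL number of cells per model iteration
nModels = 3;   % HYPOTHETICAL stand-in for NofModelIterations

pairs = zeros(nCells*nModels, 2);   % sliced output: one row per loop index
parfor i = 1:nCells*nModels
    % Cells vary fastest: i = 1..nCells belong to model 1, and so on.
    [c, m] = ind2sub([nCells, nModels], i);
    pairs(i, :) = [c, m];           % real code would process cell c of model m here
end
```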