What happens if I use parfor and the fmincon/fminunc option "UseParallel" together?

I want to minimize the same objective function from n different starting points, and I have k cores available.
Consider the following (toy) code:
opts = optimoptions('fminunc', 'UseParallel', true);
parfor ii = 1:n
    x_opt(ii) = fminunc(obj_fun, starting_point(ii), opts);
end
Is MATLAB smart enough to allocate cores to each optimization?
Michael Stollenwerk on 20 Nov 2020
Ok, thanks. That's unfortunate, since I suspect that in my case 'UseParallel' with cores would improve speed a lot: obj_fun has a high-dimensional input array, and thus many (parallelizable) function evaluations are needed to estimate the gradient at each step.
Mario Malic on 22 Nov 2020
I think it's worth trying a serial outer loop together with the UseParallel option, especially if a single evaluation of your function takes some time.
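A minimal sketch of that suggestion, reusing the placeholder names obj_fun, starting_point, and n from the question (untested, and assuming a parallel pool is already open):

```matlab
% Serial outer loop over starting points; at each fminunc iteration the
% finite-difference gradient evaluations are farmed out to the pool workers.
opts = optimoptions('fminunc', 'UseParallel', true);
x_opt = zeros(n, 1);
for ii = 1:n   % plain for, not parfor, so the pool stays free for fminunc
    x_opt(ii) = fminunc(obj_fun, starting_point(ii), opts);
end
```

This trade-off tends to favor the serial outer loop when each call to obj_fun is expensive and its input is high-dimensional, because the gradient estimate then requires many function evaluations per iteration.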


Answers (1)

Matt J on 18 Nov 2020
I don't know the answer to that directly, but it is easy to show by example that turning off 'UseParallel' can be beneficial:
objfun = @(x) sum(x.^2);
x0 = rand(2000,1);
opts = optimoptions('fminunc', 'Display', 'none', 'UseParallel', true);
tic;
parfor i = 1:4
    fminunc(objfun, x0, opts);
end
toc  % Elapsed time is 0.369885 seconds.
opts.UseParallel = false;
tic;
parfor i = 1:4
    fminunc(objfun, x0, opts);
end
toc  % Elapsed time is 0.274740 seconds.
Raymond Norris on 19 Nov 2020
I'm puzzled that Matt's example worked at all with UseParallel set to true. Inside a parfor loop, I would expect the optimization not to use the parallel pool, and hence to run serially; that might partly explain why the performance difference is so small.
Secondly, 2000x1 most likely just isn't big enough; try the following with 20000. Note that when I benchmark, I first set maxNumCompThreads to 1 for a baseline and then set it to 4, which tells me how much implicit multithreading MATLAB gives me on its own.
objfun = @(x) sum(x.^2);
x0 = rand(20000,1);
opts = optimoptions('fminunc', 'Display', 'none', 'UseParallel', false);
%% Baseline
maxNumCompThreads(1);
tic
for i = 1:4
    fminunc(objfun, x0, opts);
end
toc
% Elapsed time is 197.351202 seconds.
%% 4 threads, no additional parallelism
maxNumCompThreads(4);
tic
for i = 1:4
    fminunc(objfun, x0, opts);
end
toc
% Elapsed time is 75.309341 seconds.
%% 4 workers, but parallelism is minimal
opts = optimoptions('fminunc', 'Display', 'none', 'UseParallel', true);
tic
for i = 1:4
    fminunc(objfun, x0, opts);
end
toc
% Elapsed time is 64.789117 seconds.
Matt J on 19 Nov 2020
Edited: Matt J on 19 Nov 2020
I'm not sure what the manipulation of maxNumCompThreads is supposed to tell us about the interaction between parfor and UseParallel. I would think that reducing maxNumCompThreads adds bottlenecks to the computation that wouldn't otherwise be there, because fminunc can then no longer fully exploit the multi-core resources even for basic linear algebra steps. With more such bottlenecks, whether the gradient calculation is parallelized (which is what UseParallel controls) has a diminished impact on the total time.
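One way to probe this interaction directly (a diagnostic sketch, assuming a process-based parallel pool is open) is to compare the computational thread count on the client with the count reported by the workers, since pool workers default to a single computational thread:

```matlab
clientThreads = maxNumCompThreads   % on the client: typically the core count
workerThreads = zeros(1, 4);
parfor i = 1:4
    workerThreads(i) = maxNumCompThreads;   % on workers: typically 1
end
workerThreads
```

If the workers report a single thread each, then code running inside parfor loses the implicit multithreading the client enjoys, which would be consistent with the timings above.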

