Why does parallel.pool.Constant create a copy of the variable in memory for each worker sequentially instead of in parallel?

When creating a parallel.pool.Constant on 9 workers prior to using parfor, I noticed that the memory usage ramps up in 9 successive steps instead of all at once. The attached image shows these steps in memory usage prior to entering the parfor, as seen in 'Resource Monitor' on Windows 7. This suggests that the copies for each worker are created sequentially rather than in parallel, which takes a lot of time. Why are these copies not created in parallel for faster execution? I am running R2017a.

Accepted Answer

Edric Ellis on 30 Jun 2017
I suspect you're creating the parallel.pool.Constant using data created on the client. It's much more efficient to have the workers create the data, if possible. Consider two cases:
% Case 1: data created on the client
parallel.pool.Constant(ones(1e4));
% Case 2: use the Constant constructor with a function handle to create
% the contents directly on the worker
parallel.pool.Constant(@() ones(1e4));
This results in the following memory usage pattern. In the screenshot, case 1 is indicated with a red arrow, and case 2 with a green arrow.
As you can see, case 2 happens in parallel, and avoids the data transfer from the client to the workers (it's the data transfer that really causes the lack of parallelism).
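For reference, here is a minimal sketch of how a Constant built the case 2 way might then be used inside parfor. The names C, N, and total are illustrative only and do not come from the original question, and the sketch assumes a parallel pool is already open or can be started automatically.

```matlab
% Build the Constant on the workers, so the data itself is never
% transferred from the client
C = parallel.pool.Constant(@() ones(1e3));

N = 8;                      % illustrative iteration count
total = zeros(1, N);
parfor i = 1:N
    % C.Value is the worker-local copy of the data
    total(i) = sum(C.Value(:, i));
end
```

Inside the loop, only the small Constant handle is sent to the workers; each worker reads its own copy through C.Value.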
If you really cannot create the data on the workers, you can use the parallel.pool.Constant constructor that accepts a Composite, like this:
% Build an empty Composite
c = Composite();
% Transfer the data from client only to worker 1
c{1} = ones(1e4);
c(2:end) = {[]};
spmd
    % Use labBroadcast to copy data to all workers (labBroadcast
    % is more efficient than the client/worker communication)
    c = labBroadcast(1, c);
end
% Build the Constant from the Composite
c = parallel.pool.Constant(c);
% Flush memory on the workers by executing an empty SPMD block
spmd, end
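If the data already lives in a file that every worker can read, the function-handle form of the constructor can also be pointed at that file, so that each worker loads its own copy. The following is only a sketch under assumptions: the file name constant_demo.mat is hypothetical, magic(4) stands in for the real data, and it assumes a local pool or a shared file system so that the path is valid on the workers. Whether this is actually faster than the broadcast approach depends on the disk and file system.

```matlab
% Hypothetical file name; the file must be readable from every worker
dataFile = fullfile(tempdir, 'constant_demo.mat');
data = magic(4);            % stand-in for the real-world data
save(dataFile, 'data');

% Each worker runs load itself; load returns a struct with a field 'data'
C = parallel.pool.Constant(@() load(dataFile));

spmd
    % Check on each worker that the loaded copy has the expected size
    sz = size(C.Value.data);
end
```

Here sz comes back to the client as a Composite, so sz{1} holds the size seen by worker 1.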
  1 Comment
Joseph Hall on 30 Jun 2017
Thank you. Unfortunately, I am using real-world data and cannot have the data be created inside the workers, but are there other modes of data transfer that could be done in parallel such as accessing files on disk? I don't quite understand why data transfer cannot be done in parallel.
