How to efficiently pass large constant matrix in parfor

Hi all,
I am trying to perform a task using parfor. However, it is inefficient because I have to load a large constant structure from a mat file every time I call the function. Is there a way to load this structure into shared memory, use it as a global variable, or something else?
Here's a short description of a simplified form of my problem.
  1. Let's say we have a structure "S". The size of this structure is ~15GB.
  2. This structure "S" is a constant every time we call the function "fun".
  3. All workers need "S".
Our aim is as follows:
parfor j = 1:1000
    output = fun(S, j)
end
I see four options for doing this:
  1. Declare "S" as a global variable. As far as I understand, this is not possible in parfor.
  2. Keep "S" in shared memory so that all workers can access it, since "S" is constant for all of them. But how do we do that?
  3. Make multiple copies of "S" and have each worker access a separate copy. This may not be feasible, since "S" is very large and we are limited by available memory.
  4. Have "fun" load "S" from the mat file every time it is called and then work on it. This is very inefficient.
Any thoughts?
Thanks in advance.

Answers (6)

Sean de Wolski on 7 Mar 2014
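One possibility: the Worker Object Wrapper, http://www.mathworks.com/matlabcentral/fileexchange/31972-worker-object-wrapper
This will, however, require S to live on each worker concurrently (i.e. case 3 above). Do all workers need all of S, or is there a way that you could break it into chunks and have each worker work on a chunk?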
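If the work can be split so that iteration k only needs part k of the data, parfor can send each worker just the slices it uses. A minimal sketch, assuming the data is a numeric matrix processed column by column (the names data and colMax are made up for illustration, and a real struct S would first have to be rearranged into something the loop variable can index):

```matlab
% Hypothetical stand-in for the large data set: one column per task.
data = rand(1000, 8);
nChunks = size(data, 2);

colMax = zeros(1, nChunks);
parfor k = 1:nChunks
    % data is indexed only as data(:, k), so it is a sliced variable:
    % each worker receives the columns for its own iterations only.
    colMax(k) = max(data(:, k));
end
```

Whether this helps depends on the answer to the question above: if every call really needs all of S, slicing does not apply.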

Walter Roberson on 9 Jun 2018
Sean mentioned http://www.mathworks.com/matlabcentral/fileexchange/31972-worker-object-wrapper which was the appropriate supported way at the time he posted.
In R2015b, parallel.pool.Constant was added to permit a constant to be distributed to all of the workers once and then reused there. (MATLAB is case-sensitive, so note the capital C.)
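A minimal sketch of that approach, assuming the structure is stored in a MAT-file under the variable name S. The file name S_demo.mat, the field name table, and the small magic(4) payload are placeholders so the example can run on its own; passing a function handle makes each worker build its own copy directly from the file instead of receiving it from the client:

```matlab
% Create a small stand-in MAT-file (placeholder for the real 15 GB one).
S = struct('table', magic(4));
matFile = fullfile(tempdir, 'S_demo.mat');
save(matFile, 'S');
clear S

% The function handle is evaluated once on each worker.
C = parallel.pool.Constant(@() load(matFile, 'S'));

output = zeros(1, 4);
parfor j = 1:4
    data = C.Value;                       % struct returned by load, with field S
    output(j) = sum(data.S.table(:, j));  % stand-in for fun(S, j)
end
```

Each worker still holds its own copy of the data, so this removes the repeated loading but not the per-worker memory cost (option 3 in the question).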
The supported mechanism for global variables is to create the pool and then use parfevalOnAll to run code that initializes the global variable on each worker, using data copied in somehow. (This does not use parallel.pool.Constant, so the data might be copied to each worker individually.)
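A minimal sketch of the parfevalOnAll route, assuming a process-based pool. The helper names setGlobalS and readGlobalS are made up for illustration, and the value 10 stands in for the real data; for a large structure the setter could load it from a file on the worker instead of taking it as an argument:

```matlab
pool = gcp();                                   % start or reuse a pool

% Run setGlobalS(10) once on every worker and wait until all have finished.
wait(parfevalOnAll(pool, @setGlobalS, 0, 10));

result = zeros(1, 3);
parfor j = 1:3
    result(j) = readGlobalS() + j;   % reads the worker's own global S
end
disp(result)

function setGlobalS(value)
    % Executes on a worker: assigns that worker's copy of the global.
    global S
    S = value;
end

function s = readGlobalS()
    % Executes on a worker: returns that worker's copy of the global.
    global S
    s = S;
end
```

The global on the client is untouched by this; every worker has its own independent S.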
Also supported is to use memmapfile() on each worker; however, this can effectively require that each worker read in the contents of the file.
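A minimal sketch of the memmapfile route. memmapfile maps a flat binary file, not a MAT-file, so this assumes the numeric data has been written out with fwrite beforehand; the file name S_data.bin, the field name A, and the tiny 4-by-6 matrix are placeholders for illustration. Wrapping the map in parallel.pool.Constant with a function handle is one way to create it once per worker rather than once per iteration:

```matlab
% Write a small stand-in matrix to a flat binary file (column-major doubles).
rows = 4; cols = 6;
A = reshape(1:rows*cols, rows, cols);
fname = fullfile(tempdir, 'S_data.bin');
fid = fopen(fname, 'w');
fwrite(fid, A, 'double');
fclose(fid);

% Each worker creates its own read-only map of the file.
C = parallel.pool.Constant(@() memmapfile(fname, ...
    'Format', {'double', [rows cols], 'A'}, 'Writable', false));

colSum = zeros(1, cols);
parfor j = 1:cols
    m = C.Value;                      % memmapfile object on this worker
    colSum(j) = sum(m.Data.A(:, j));  % only the accessed part is read
end
```

The file must be reachable under the same path from every worker, which holds for a local pool but is an assumption on a cluster.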
Not supported but efficient is to use FEX:sharedmatrix which uses shared memory segments. Note that this will only work within any one node or at best across nodes that support unified memory such as NUMA; if you are working with clusters that have independent address spaces per node then you might need to distribute the memory to each node once, to be shared among the workers there.
  2 Comments
Pavel Sinha on 11 Jun 2018
Hi,
Thanks for the answer.
Now, if I have a MATLAB worker running some functions on a GPU, can those functions access global variables? Since the GPU function is no longer executed on the MATLAB worker but on the GPU, I was hoping the global variables would be accessible from the GPU.
Walter Roberson on 12 Jun 2018
Unfortunately there are conflicts between the newest MATLAB and the newest CUDA on my particular operating system, so I cannot test this out.
I think a gpuArray object at the MATLAB level is a handle object that refers into the GPU, so I guess that storing one as a global variable and changing it within the same worker would be no different than passing the gpuArray object between different routines that might modify it.
But, to be clear, no matter what you do, the global variables from one worker will not be the same as the global variables from another worker. You cannot pass references to variables stored in the GPU between workers and expect to be able to access the same GPU memory from any of those workers.



Marta Salas on 8 Mar 2014
Edited: Marta Salas on 8 Mar 2014
You can use global variables in parfor, but you cannot declare global or persistent variables within the body of the parfor loop itself. In your case, S is a constant, so you can define it outside the parfor:
global S;
parfor j = 1:1000
    output = fun(j)
end
function output = fun(j)
    global S
    % your code, which reads S and assigns output
end

Pavel Sinha on 9 Jun 2018
Edited: Walter Roberson on 9 Jun 2018
Hi Marta,
In my code, the global variable S from the example above, which is set before the parfor loop, is read as empty by the worker.
So, if I do the following:
function fun(j)
    global S
    S
    %your code
end
I get,
S =
[ ]
There must be a way for the workers to read and write global variables. I fully understand that the variables will not be in sync across workers. My problem right now is that I am simply not able to write or read global variables from within a function that is called inside a parfor loop.
Thanks,
Pavel
  1 Comment
Walter Roberson on 9 Jun 2018
The contents of global variables are never sent from the client to the workers, and are never copied back.
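Because nothing is copied back automatically, a value that workers write into a global has to be fetched explicitly. A minimal sketch, assuming a process-based pool; the global name hitCount and the helpers bumpCounter and getCounter are made up for illustration:

```matlab
pool = gcp();                         % start or reuse a pool

parfor j = 1:20
    bumpCounter();                    % each worker updates its own global
end

% Ask every worker for its own value; outputs are stacked one row per worker.
f = parfevalOnAll(pool, @getCounter, 1);
perWorker = fetchOutputs(f);
total = sum(perWorker);

function bumpCounter()
    global hitCount
    if isempty(hitCount)
        hitCount = 0;                 % globals start out empty on each worker
    end
    hitCount = hitCount + 1;
end

function c = getCounter()
    global hitCount
    if isempty(hitCount)
        hitCount = 0;                 % worker that ran no iterations
    end
    c = hitCount;
end
```

The per-worker counts depend on how parfor happened to distribute the iterations; only their sum is predictable. The counts also persist on the workers for as long as the pool lives, so running the script twice on the same pool keeps accumulating.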



Pavel Sinha on 9 Jun 2018
Edited: Walter Roberson on 9 Jun 2018
clc;
close all;
clear all;
NO_PAR_POOLS = 3;
if isempty(gcp('nocreate'))
    parpool(NO_PAR_POOLS);
end
global S;
S = 10;
parfor j = 1:3
    if j==1; fun1(); end
    if j==2; fun2(); end
    if j==3; fun3(); end
end
function fun1()
    global S
    S
end
function fun2()
    global S
    l = S+10;
    l
end
function fun3()
    global S
    k = S*10;
    k
end

Pavel Sinha on 9 Jun 2018
This is what I get:
S =
[]
l =
[]
k =
[]
