Best read-only data strategy for parfor
I am using parfor on a grid with 60 workers.
I have some data which will be used read-only within the parfor loop.
I see that there are two options: load it on the machine I am submitting from, so that it is serialized and sent across the network (dedicated GigE for the cluster), or load it from disk within the loop.
Can anyone comment on which of these might be the best strategy for different data sizes? The data compresses very well, so it is about 20 MB on disk but more than 1 GB in memory when loaded. How does the speed of loading and uncompressing compare to serialization?
If I have it loaded on the submission machine, is MATLAB clever enough to serialize and send the data once to each worker, or will it repeat this on every iteration? Obviously, loading from a file would be done on every iteration.
Any advice appreciated.
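For what it's worth, if the load-from-disk route is used, the per-iteration cost can be avoided by caching the data in a persistent variable inside a helper function, so each worker process pays the load-and-uncompress cost only once. A minimal sketch of that idiom (the function name getCachedData and the MAT-file argument are hypothetical; the file must be on a path visible to all workers):

```matlab
function data = getCachedData( matFile )
%GETCACHEDDATA Load a MAT-file once per worker and cache the result.
% Hypothetical helper: matFile must be readable from every worker.
persistent cache
if isempty( cache )
    cache = load( matFile );  % runs only on the first call per worker
end
data = cache;
end
```

Inside the parfor loop you would then call getCachedData rather than load, and only the first iteration executed on each worker actually touches the disk.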
Edric Ellis on 18 Oct 2012
I would recommend trying my Worker Object Wrapper. It's designed for just this sort of situation. In your case, you should put the files in a location available to the workers, and have them load the data using something like this:
w = WorkerObjectWrapper( @loadHugeData );
The object 'w' is then effectively a handle to the data. When you pass this into a PARFOR loop, the workers can then access the underlying data, like so:
parfor ii = 1:N
    doSomethingWith( w.Value );
end
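Putting the two pieces together, a sketch of the full workflow might look like the following (loadHugeData and doSomethingWith are placeholders for your own loader and computation; as I understand the wrapper, the loader runs once on each worker, and only the lightweight handle w is serialized into the loop):

```matlab
% loadHugeData should read the compressed file from a location the
% workers can see and return the expanded (~1 GB) data.
w = WorkerObjectWrapper( @loadHugeData );

results = zeros( 1, N );
parfor ii = 1:N
    % w.Value is the worker-local copy of the data; no per-iteration
    % transfer of the 1 GB payload takes place.
    results(ii) = doSomethingWith( w.Value, ii );
end
```

This keeps the network traffic to one load per worker rather than one per iteration.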