Why is batch() so slow?

I'm trying to use batch() to load some data from a slow disk in the background, but it is extremely slow. See code example with timings below. I think it is slower than what can be explained by the overhead of communicating with the worker (consider that I am not even transferring the loaded data from the worker to the client in the example).
>> a = rand(512, 512, 1000);
>> save('a');
>> tic; load('a'); toc
Elapsed time is 5.574926 seconds.
>> tic; b = batch(@load, 1, {'a'}); toc; tic; wait(b); toc;
Elapsed time is 0.444297 seconds.
Elapsed time is 41.229590 seconds.
You can see that the time until the batch job is done is more than 35 s longer than the same operation on the client. This is not because a new Matlab worker has to be started -- in my example, a worker was already running (if no worker were running, the batch(...) command itself would take longer, not the wait(b)).
Where does this overhead come from? How can I avoid it? (I also tried parfeval, but parfeval is plagued by a memory leak that makes it unusable -- confirmed as a known bug by MathWorks).
Thanks, Matthias

2 Comments

Matthias
Matthias on 16 Dec 2014
Edited: Matthias on 16 Dec 2014
Even more bizarrely, if I right-click on the finished job in the Job Monitor and select Show Details, the displayed report indicates that the running duration of the job is 6 seconds. That's the same as the time it took on the client session. What happens in those 35 remaining seconds?
(I got this result on two different machines. Both running 2014b, however.)
Some more data:
>> disp(datestr(now, 'HH:MM:SS:FFF')); ...
b = batch(@batchTest, 1); ...
disp(datestr(now, 'HH:MM:SS:FFF')); ...
wait(b); ...
disp(datestr(now, 'HH:MM:SS:FFF'));
21:18:35:124
21:18:35:934
21:19:17:319
>> diary(b)
--- Start Diary ---
21:18:40:762
21:18:46:237
--- End Diary ---
Function batchTest:
function a = batchTest
disp(datestr(now, 'HH:MM:SS:FFF'));
load('a');
disp(datestr(now, 'HH:MM:SS:FFF'));
This shows that after executing the batch(...) command, ~5 s pass before the worker starts executing batchTest(). The worker finishes batchTest() after another ~6 s, so it executes the function just as fast as the client. Then another >30 s pass before wait(...) returns.
What happens in this time? Maybe the initial 5 s have to do with setting up the environment on the worker. But the 30 s after the job is done?


Accepted Answer

Edric Ellis
Edric Ellis on 16 Dec 2014

3 votes

Firstly, if you're using the local cluster type, then the batch command absolutely does need to launch the worker MATLAB process - it is not already running - you can verify this using Task Manager or similar. (Clusters of type MJS keep the workers running). The time for the batch command is simply the time needed to create the parallel.Job and parallel.Task objects needed for running the batch job, and saving those to disk.
Roughly speaking, the time taken to submit the job and wait for the results can be broken down like this:
  1. Time taken to create and submit the batch job to the scheduler
  2. Time taken to launch the worker process (unless you're using MJS)
  3. Time taken for the worker to load the job and task information
  4. Time for the worker to actually run the task
  5. Time for the worker to save the task results to disk (or database for MJS)
I suspect that the "missing" time is probably largely related to item 5 in the list above - as you've written it, the 512x512x1000 array is returned by your task function @load, and this result gets saved to disk.
How long does your save('a') command take? I suspect item 5 would take at least that long.
Note that there are several additional properties on the job object that can help you work out what's going on - see the reference page. In particular, note CreateTime, SubmitTime, StartTime, and FinishTime. The underlying task object has the same properties (except SubmitTime).
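As a minimal sketch of that diagnosis (using the documented timestamp properties of the job and task objects), you could compare the timestamps pairwise to see which of the five phases above is eating the time:

```matlab
b = batch(@load, 1, {'a'});
wait(b);
% Successive gaps between these timestamps correspond to the phases above:
disp(b.CreateTime);            % job object created on the client
disp(b.SubmitTime);            % job submitted to the scheduler
disp(b.StartTime);             % worker began running the job
disp(b.FinishTime);            % job finished, results saved to disk
% The task object has the same properties (except SubmitTime):
disp(b.Tasks(1).StartTime);
disp(b.Tasks(1).FinishTime);
```

A large gap between the task's FinishTime and wait() returning would point at item 5, saving the large output array.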

10 Comments

Matthias wrote (in an answer - moved to a comment):
Thanks for your detailed answer. Unfortunately, what you write does not fully explain my results.
First of all, the batch command was recommended to me by MathWorks support for this exact use case: load a large file from disk in the background while performing computations on other workers. So I'm surprised to hear that batch only uses the disk for inter-instance communication. Can't it transfer data in memory like parfor/parfeval/spmd etc.?
Also, there is still significant unexplained overhead. Yes, batch needs to create a new worker instance if there isn't already one, but it does not do that if a free worker is around -- at least that's what I deduce from the fact that the batch command itself executes slowly (several seconds) the first time and then very quickly the subsequent times. So if we add up the time it takes to write, read, and again write my example variable, that should be maybe 15-20 s (see timings posted above). But the job takes >40 s to return.
Matthias
Matthias on 16 Dec 2014
save('a') indeed takes ~25 s, so we're getting closer. That still leaves the question why the batch job would write the data to disk, rather than keeping it in memory on the worker until fetched with fetchOutputs.
Edric Ellis
The function running within a batch job can invoke parfeval, parfor etc., provided you start your batch job with an appropriate 'Pool' argument. parfeval, parfor etc. never use the disk for communication.
I suspect there is some confusion here between having an open parallel pool of workers available, and running a batch job. When you open a parallel pool (either manually, or using parpool), then workers are launched and remain idle until you issue parfeval, parfor etc. When you launch a batch job, new workers are launched - these will always be new MATLAB worker processes (unless you're using MJS, in which case the worker processes might be recycled). The lead worker running a batch job has a pool available to it if you specify the 'Pool' argument.
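To illustrate (a sketch; myTaskUsingParfor is a hypothetical function standing in for whatever your task does):

```matlab
% 'Pool',3 requests 3 workers in addition to the lead worker running the
% task function, so this needs 4 workers available on the local cluster.
% Inside myTaskUsingParfor, parfor/parfeval will then use that pool.
b = batch(@myTaskUsingParfor, 1, {}, 'Pool', 3);
wait(b);
r = fetchOutputs(b);
```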
To be perfectly honest, using batch with the local cluster type is of limited benefit since the workers are only able to run while the desktop MATLAB is running. You'd almost certainly be better off using parfeval. Here's some timings using that:
>> tic, f = parfeval(@load, 1, 'a'); toc
Elapsed time is 0.005802 seconds.
>> tic, wait(f); toc
Elapsed time is 10.865625 seconds.
>> tic, a = f.OutputArguments{1}; toc
Elapsed time is 1.872169 seconds.
(Note that on my machine, loading a.mat takes about 10 seconds). Note that it still takes ~2 seconds to read the outputs - that's because the result of the load command is held in memory in a transferable form (it has to be transferred back from the worker), so that 2 seconds is overhead that you cannot avoid.
It would be much better if you could load 'a.mat' on a worker and operate on it there too. Here's a slightly contrived example using getfield so that I can write everything in one expression:
>> tic; f= parfeval(@() mean(getfield(load('a'), 'a')), 1); toc
Elapsed time is 0.005312 seconds.
>> tic; wait(f); toc
Elapsed time is 10.863416 seconds.
>> tic; f.OutputArguments; toc
Elapsed time is 0.006360 seconds.
Matthias
Matthias on 16 Dec 2014
Edited: Matthias on 16 Dec 2014
EDIT: Just saw that you replied to my initial post about parfeval -- I'll look into it. Thanks!
I initially used parfeval, but parfeval suffers from a memory leak: Even if I delete the parfeval "future" object, the memory on the worker does not get cleared. So every time I call parfeval, more and more leaked memory accumulates on the workers and I can only clear it by re-starting my parallel pool, which is not a viable solution for unrelated reasons. I contacted MathWorks about this and they acknowledged that this memory leak is known to them. The advice I received was to look into batch instead.
If you know a workaround for the parfeval problem, I'd be excited to hear about it.
I might look into using batch to perform the entire processing, not just the loading.
Edric Ellis
There's a fix for the parfeval memory leak described here. (The same problem shows up during mapreduce.)
Matthias
Matthias on 16 Dec 2014
Not sure if I'm doing something wrong, but the attachment of the bug fix (attachment_1144305_14b_2014-12-01.zip) you referenced only contains one file: license.txt. There's nothing else in the zip file.
Edric Ellis
Hm, that's not what I see from here - for me that file is around 3 MB, and has a handful of files in it. Please could you try again? If that still doesn't work, let me know...
Matthias
Matthias on 16 Dec 2014
It's strange: I can only see one folder ("bugreport") and one file inside ("license.txt") when opening the file with the built-in Windows 7 zip viewer. When I open it with 7-zip, I can see that there is a second folder on the top level, which does not seem to have a name. The files in there look like they might be the bug fix. I'll try to install it soon. Thanks for your help so far.
Matthias
Matthias on 16 Dec 2014
Edited: Matthias on 16 Dec 2014
The bugfix removes the memory leak! Thanks a lot!
However, loading in the background with parfeval still doesn't work as intended: Parfeval may not block the client Matlab instance, but it apparently does block other parallel functions. See this example:
fprintf('Start: %s\n', datestr(now, 'HH:MM:SS:FFF'));
f = parfeval(@pause, 0, 10);
fprintf('Outside parfor: %s\n', datestr(now, 'HH:MM:SS:FFF'));
parfor i = 1:10
fprintf('Inside parfor: %s\n', datestr(now, 'HH:MM:SS:FFF'));
end
wait(f);
fprintf('End: %s\n', datestr(now, 'HH:MM:SS:FFF'));
Output:
Start: 14:50:45:204
Outside parfor: 14:50:45:219
Inside parfor: 14:50:55:297
Inside parfor: 14:50:55:297
Inside parfor: 14:50:55:297
Inside parfor: 14:50:55:297
Inside parfor: 14:50:55:297
Inside parfor: 14:50:55:297
Inside parfor: 14:50:55:312
Inside parfor: 14:50:55:312
Inside parfor: 14:50:55:312
Inside parfor: 14:50:55:312
End: 14:50:55:328
The timings suggest that the execution works like this: 1. Parfeval sends jobs to one worker. 2. Parfor waits until all workers are available. 3. Parfor executes.
I had hoped that it would be more like this: 1. Parfeval sends job to one worker; then continues execution in main Matlab instance. 2. Parfor runs on whichever workers are available; parfeval continues to run on one worker until done.
Is the behavior I'm observing intended? Maybe I just didn't properly understand the way the parallel toolbox worked...right now, it seems frustratingly inflexible.
Edric Ellis
Unfortunately, as you observe, PARFOR cannot proceed while there are outstanding PARFEVAL requests (the same applies to SPMD). Your best option in this case is to recast your PARFOR loop as a series of PARFEVAL requests.
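A minimal sketch of that recasting, assuming a placeholder loop body doWork (not from the thread): each iteration becomes its own parfeval request, and fetchNext collects results as futures finish, so the loop's work can interleave with the long-running background task.

```matlab
% Long-running background task (the pause(10) from the example above):
f = parfeval(@pause, 0, 10);
% Instead of: parfor i = 1:10, out{i} = doWork(i); end
% submit each iteration as a separate parfeval request:
for i = 1:10
    g(i) = parfeval(@doWork, 1, i);   % doWork is a hypothetical loop body
end
% Collect the loop results as each future completes; these can run on
% whichever workers are free while the background task is still going.
out = cell(1, 10);
for k = 1:10
    [idx, result] = fetchNext(g);
    out{idx} = result;
end
wait(f);   % finally wait for the background task
```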


More Answers (0)

Asked: 16 Dec 2014
Commented: 17 Dec 2014
