Unable to submit task result (MATLAB Parallel Server)

Hi,
I am running some tests on a cluster. I create a job and submit several tasks, but I get the following error:
Error: Cannot rerun task because there are no rerun attempts left (The task has no rerun attempts left.).
Original cancel message:
java.lang.Exception: Unable to submit task result - MATLAB will now exit and restart.
Where should I start looking? What does this error mean in practice? Is the problem on the client side or on the cluster side?

Answers (1)

Raymond Norris on 2 Dec 2021
Hi Maria,
A few questions first:
  • Which platform is MATLAB Parallel Server running on, Linux or Windows?
  • Which scheduler are you using (MJS, PBS, etc.)?
  • What size pool are you running?
  • How many cores per node?
  • How much RAM per node?
If you're not using MJS, try the following. I'll show both the batch and parpool cases.
setenv('MDCE_DEBUG','true')
cluster = parcluster;
% If you're using batch
job = cluster.batch();
job.wait
cluster.getDebugLog(job)
% If you're using parpool
pctconfig('preservejobs',true);
pool = cluster.parpool();
cluster.getDebugLog(cluster.Jobs(end))
If you're using MJS:
mjs = parcluster;
mjs.ClusterLogLevel = 4;
% Call either batch or parpool
mjs.getClusterLogs()
Perhaps the log file will show something more. If I had to guess, I'd bet you're running out of memory.
