Parallel Computing Toolbox: createParallelJob, createTask, submit --- nothing happens

I'm trying to create a parallel job on a Rocks Linux cluster using PBS. This is how I try to do it:
sched = findResource('scheduler', 'configuration', 'MyConfig.import3');
set(sched, 'UserData', '/share....');
pj = createParallelJob(sched);
createTask(pj, @LEO_MainIII_6000, 1, {});
set(pj, 'MaximumNumberOfWorkers', 40);
set(pj, 'MinimumNumberOfWorkers', 40);
submit(pj);
waitForState(pj);
out = getAllOutputArguments(pj);
celldisp(out);
destroy(pj);
'LEO_MainIII_6000.m' is the actual script with the code to be executed. This is the output I get:
< M A T L A B (R) >
Copyright 1984-2010 The MathWorks, Inc.
Version 7.12.0.635 (R2011a) 64-bit (glnxa64)
March 18, 2011
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
{Warning: The PBSPro scheduler type is being used with a possibly incompatible
version of PBS.
The use of job arrays has been disabled.
The version of PBS we detected was:
"2.4.16"}
> In distcomp.pbsproscheduler.pbsproscheduler>iCheckPBSProVersion at 67
In distcomp.pbsproscheduler.pbsproscheduler at 12
In distcomp.createObjectsFromProxies at 46
In findResource>iCreateFileBasedScheduler at 381
In findResource>iFindScheduler at 333
In findResource at 172
In Master3 at 3
Configuration: 'MyConfig.import3'
Type: 'PBSPro'
DataLocation: '/share/scratch/z0000000/nMOR4000'
HasSharedFilesystem: 1
Jobs: [1x1 distcomp.simpleparalleljob]
ClusterMatlabRoot: '/share/apps/matlab/2011a'
ClusterOsType: 'unix'
UserData: '/share/scratch/z0000000/nMOR4000'
ClusterSize: 48
ServerName: 'leonardi.eng.unsw.edu.au'
SubmitArguments: '-l walltime=12:00:00'
ResourceTemplate: '-l nodes=1:ppn=^N^:vmem=90gb'
RcpCommand: 'scp'
RshCommand: 'ssh'
ParallelSubmissionWrapperScript: [1x76 char]
There is one warning and no errors. However, MATLAB did not run 'LEO_MainIII_6000.m' at all. The whole process takes only a couple of seconds, whereas the actual computation should take hours. Any ideas what could be happening here?

Answers (2)

Edric Ellis on 21 Nov 2011
Firstly, I think it's probable that you have TORQUE rather than PBSPro installed on your system. You should use the TORQUE scheduler type.
What happens when you attempt to validate the configuration through the configuration manager?
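If validation through the configuration manager is not convenient, the same questions can be asked from the command line. The following is a minimal sketch, assuming the R2011a Parallel Computing Toolbox API used in the question: it looks up a scheduler of type 'torque' instead of loading the PBSPro configuration, resubmits the job, and then prints the job state, the task error messages, and the scheduler's debug log. The property values are copied from the output in the question; whether they are right for a TORQUE setup on this cluster is an assumption, and the worker count of 4 is arbitrary.

```matlab
% Sketch only: assumes the R2011a API and that the cluster runs TORQUE.
sched = findResource('scheduler', 'type', 'torque');
set(sched, 'DataLocation', '/share/scratch/z0000000/nMOR4000');
set(sched, 'ClusterMatlabRoot', '/share/apps/matlab/2011a');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'SubmitArguments', '-l walltime=12:00:00');

pj = createParallelJob(sched);
createTask(pj, @LEO_MainIII_6000, 1, {});
set(pj, 'MaximumNumberOfWorkers', 4);   % arbitrary small size for a first test
set(pj, 'MinimumNumberOfWorkers', 4);
submit(pj);
waitForState(pj, 'finished');

% A job that ends within seconds usually failed; look for the reason.
disp(get(pj, 'State'));
tasks = get(pj, 'Tasks');
for k = 1:numel(tasks)
    fprintf('Task %d: %s | %s\n', k, ...
        get(tasks(k), 'State'), get(tasks(k), 'ErrorMessage'));
end
disp(getDebugLog(sched, pj));           % scheduler-side output for the job
```

One thing worth checking in the error messages: createTask expects a function, and the call above requests one output argument. If LEO_MainIII_6000.m really is a script rather than a function, the task would be expected to fail immediately, which would be consistent with a job that finishes in a couple of seconds.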

Thomas on 21 Nov 2011
This looks like an incompatibility between the version of the PBS scheduler and MATLAB.
I do not have a lot of experience with PBS; however, I am attaching an SGE script that submits MATLAB jobs to our cluster.
%always set these variables
%matlab_ver = 'R2011b'; % (MATLAB release supported by your license) R2009a R2009b R2010a
email = 'YOUREMAILID'; % your email address
email_opt = 'a'; % qsub email options
h_rt = '1:07:00'; % hard wall time
vf = '2G'; % Amount of memory need per task
%queue = 'sipsey.q' % specify queue
%min_cpu_slots = 4; % Min number of cpu slots needed for the job
max_cpu_slots = 7; % Max number of cpu slots needed for the job
disp('Please wait.. Sending job data to the Cluster.... ')
% Configure the scheduler - Do NOT modify these
%sge_options = ['-l vf=', vf, ',h_rt=', h_rt,' -m ', email_opt, ' -M ', email, ' -q ' , queue ];
sge_options = ['-l vf=', vf, ',h_rt=', h_rt,' -m ', email_opt, ' -M ', email ];
SGEClusterInfo.setExtraParameter(sge_options);
sched = findResource();
% End of scheduler configuration
get(sched)
job2 = createJob(sched);
tic
% start of user specific commands
job2= batch('USERmFile', 'matlabpool', max_cpu_slots, 'FileDependencies', {'USERFUNCTIONS'});
disp('Job submitted..')
datestr(clock)
% The following commands can be run once the job is submitted to view the results
disp('Job sent to the cluster')
disp('USE >> waitForState(job2) to wait for job to be finished')
disp('USE >> job2.State to see job state')
disp('USE >> load(job2,variable) to load variables back in the workspace OR')
disp('USE >> results = getAllOutputArguments(job2) to load variables back in the workspace AND')
disp('USE >> results{:} to see the results')
I don't know if this will help you, but it shows how our job script is set up.
To test whether MATLAB is correctly installed with the DCS, start an interactive session of MATLAB (ssh -X) on the head node or, preferably, on a compute node, then set up the parallel configuration and run validation.
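Once an interactive session is running, a small parallel job makes a useful smoke test alongside the built-in validation. This is a minimal sketch, assuming the R2011a API and that a default parallel configuration has already been set; the worker count of 2 is arbitrary. It asks each worker to report its lab index, so it needs no user code on the cluster.

```matlab
% Sketch only: assumes R2011a and an existing default parallel configuration.
sched = findResource('scheduler', 'configuration', defaultParallelConfig);

pj = createParallelJob(sched);
createTask(pj, @labindex, 1, {});       % every worker returns its own index
set(pj, 'MaximumNumberOfWorkers', 2);
set(pj, 'MinimumNumberOfWorkers', 2);
submit(pj);
waitForState(pj, 'finished');

out = getAllOutputArguments(pj);        % one row per worker if the job ran
celldisp(out);
destroy(pj);                            % remove the job data afterwards
```

If this job returns one value per worker, the scheduler integration is working and the problem is more likely in the original job; if it comes back empty, the configuration itself is the place to look.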
  2 Comments
Herwig Peters on 23 Nov 2011
Thanks for the script, I'll give that a shot once I've figured out how to run an interactive session... I'm not much of a cluster person.
I've enabled X11 forwarding which seems to be required for interactive sessions. I submit the job via 'qsub -I <script name>' which then brings me to the next free cluster node. But there's no GUI coming up and that's probably because of the missing DISPLAY variable and other tricks I've never heard of? No idea really, still trying to make sense of this cluster stuff...
Thomas on 23 Nov 2011
After you log into the cluster with X forwarding, you are on the head node in interactive mode. If you submit a job using 'qsub', it will take you to the next available cluster node, but it will not continue the X session; it will be a batch execution session.
To continue your X session on a compute node, you need to use qlogin, qrsh, or qsh, depending on which one supports X (I'm not really sure; I think I use qlogin). Then you can start MATLAB interactively.
