Distributed job validation passes but parallel job validation fails for Parallel Computing Toolbox.
Hi,
I am trying to use MATLAB Parallel Computing Toolbox on a cluster. When I validate my scheduler configuration, the distributed job stage passes but the parallel job stage fails with the following error:
Stage: Parallel Job
Status: Failed
Description: The given stage reached the default or user-specified timeout.
Command Line Output:
2346069.pbs001.palmetto.clemson.edu
Additionally, I find the following error in the log file on the cluster:
Node file: /var/spool/torque/aux//2346072.pbs001.palmetto.clemson.edu
Starting SMPD on node0218 node0219 node0275 node0276 ...
ssh node0218 "/opt/matlab-R2010a/bin/mw_smpd" -s -phrase MATLAB -port 26072
Warning: Permanently added 'node0218,10.125.1.218' (RSA) to the list of known hosts.
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
Launching smpd failed for node: node0218
Stopping SMPD on ...
Exiting with code: 0
The settings which I have used for the scheduler are:
set(sched, 'ClusterMatlabRoot', '/opt/matlab-new');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'SubmitFcn',{@pbsNonSharedSimpleSubmitFcn,clusterHost, remoteDataLocation});
set(sched, 'ParallelSubmitFcn',{@pbsNonSharedParallelSubmitFcn, clusterHost, remoteDataLocation});
I have also set up a passwordless SSH connection using an RSA key. Could anyone tell me what is wrong with my configuration?
Thanks in advance.
1 Comment
Sarah Wait Zaranek
on 14 Mar 2011
Did you set up passwordless ssh between all nodes of the cluster?
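The log above suggests why this matters: the parallel job script runs ssh from one compute node to another (here to node0218) to start mw_smpd, and that login is refused with "Permission denied". A key that only covers the connection from the client machine to the cluster head node would not help with that hop. One way to check this is to run something like the sketch below from inside a PBS job. It is only a rough illustration, not part of the toolbox: the function name check_nodes and the SSH_CMD override are made up for this example, and it assumes OpenSSH on the nodes.

```shell
#!/bin/sh
# Hypothetical helper (not part of MATLAB or PBS): report which nodes in a
# node file accept non-interactive SSH logins from the current host.
# SSH_CMD is an illustrative override for dry runs; it defaults to ssh.
check_nodes() {
    nodefile=$1
    failed=0
    # PBS node files list one host per line, repeated once per allocated
    # core, so duplicates are removed before testing.
    for node in $(sort -u "$nodefile"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # which mimics what the job script sees when it launches smpd.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
               "$node" true </dev/null >/dev/null 2>&1; then
            echo "OK     $node"
        else
            echo "FAILED $node"
            failed=1
        fi
    done
    return $failed
}

# Inside a PBS job, PBS_NODEFILE points at the node file for that job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    check_nodes "$PBS_NODEFILE"
fi
```

Any node reported as FAILED is one that the job script will probably not be able to start SMPD on. If every node fails, the public key is likely missing from ~/.ssh/authorized_keys as seen by the compute nodes, or the permissions on ~/.ssh are too open for sshd to accept it.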