Problem with parallel computation on cluster
I'm running a parallelized minimization routine in MATLAB on a cluster and keep encountering a problem that I don't understand: seemingly at random, MATLAB fails to engage all 8 of the cores that I request.
In the successful cases, MATLAB reports at the beginning of execution: "Starting parallel pool (parpool) using the 'local' profile ...connected to 8 workers."
In the unsuccessful cases, MATLAB reports: "Starting parallel pool (parpool) using the 'local' profile ..." but never reports that 8 workers are connected. In these cases, the resource usage reports and the runtime suggest that MATLAB is using only a single core.
Sometimes the problem goes away if I run the following command before submitting the job: rm -rf ~/.matlab/local_cluster_jobs
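For concreteness, that cleanup step could be wrapped in a small shell function like the sketch below. The function name clear_matlab_local_jobs and its optional path argument are illustrative assumptions, not part of any MATLAB tooling; the default path is the one from the command above.

```shell
# Hypothetical helper (name and argument are assumptions for illustration).
# Removes the local-profile job data directory, if it exists, so that the
# next MATLAB session does not see job data left over from earlier runs.
clear_matlab_local_jobs() {
    # Default to the directory named in the post; allow an override.
    job_dir="${1:-$HOME/.matlab/local_cluster_jobs}"
    if [ -d "$job_dir" ]; then
        rm -rf -- "$job_dir"
        echo "removed $job_dir"
    else
        echo "nothing to remove at $job_dir"
    fi
}

# Example use in a job script, before MATLAB is started:
#   clear_matlab_local_jobs
```

Checking that the directory exists first just makes the job log show whether anything was actually deleted on a given run, which may help correlate the failures with leftover job data.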
Recently, however, that workaround has stopped working, and I am perplexed. Any help understanding the problem would be great. Thanks!