Running parfor on multiple nodes (20 nodes = 960 cores) using Slurm (supercomputer)

What I would like to do is run a MATLAB script (written to run in parallel on multiple cores using the "parfor" construct) on a supercomputer. The issue is not running the script on just one node (a node has 48 cores) but running it on multiple nodes (20 nodes = 960 cores).
Please consider that I am a normal user of the supercomputer.
Every time I try to run my code, it either gives an error that the connection is lost or it cannot use all of the resources that are available. How is it possible to run the code in parallel on 20 nodes, which together have 960 cores?
The submission script that I am using follows:
#!/bin/bash
#SBATCH -J monte_carlo
#SBATCH -o monte_carlo.o%j
#SBATCH -t 250:00:00
#SBATCH -N 20 -n 960
#SBATCH --mem-per-cpu=2GB
module load matlab/r2023a
matlab -r InversProblem
and the code that I want to run is:
clear, close all
clc
LogData = readtable("all_logs.xlsx","Sheet","Sheet1");
% Depth RHOP DTP DTS LLD PHIE SWE So Sg Vol_Cal Vol_Shale
warning off
SGg = 0.6;
rh0 = 1.0;
depth = LogData.DEPTH;
rhob = LogData.RHOB;
dtp = LogData.DT;
res = LogData.RT;
phie = LogData.PHIE;
swe = LogData.Swe;
so = LogData.So;
sg = LogData.Sg;
vol_cal = LogData.VolCal;
vol_shale = LogData.VolShale;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
C_minerals(:,:,1) = [46.4117225794362,12.6621444491628,12.6621444491628,0,0,0;...
12.6621444491628,46.4117225794362,12.6621444491628,0,0,0;...
12.6621444491628,12.6621444491628,46.4117225794362,0,0,0;...
0,0,0,12.7844541038098,0,0;...
0,0,0,0,12.7844541038098,0;...
0,0,0,0,0,12.7844541038098];
C_minerals(:,:,2) = convert_ku_c(62,26);
fr=0;
am1=[1, 1];
bm2=[1, 1];
cm3=[1, 1];
orientation_type_mineral={'chaotic','chaotic'};
theta_mineral=[];
phi_mineral=[];
psi_mineral=[];
mu_theta_mineral=[];
sigma_theta_mineral=[];
mu_phi_mineral=[];
sigma_phi_mineral=[];
alpha_theta_mineral=[];
beta_theta_mineral=[];
alpha_phi_mineral=[];
beta_phi_mineral=[];
aM_theta_mineral=[];
aM_phi_mineral=[];
%$
theta_pore=[0 0];
phi_pore=[0 0];
psi_pore=[0 0];
mu_theta_pore=[];
sigma_theta_pore=[];
mu_phi_pore=[];
sigma_phi_pore=[];
alpha_theta_pore=[];
beta_theta_pore=[];
alpha_phi_pore=[];
beta_phi_pore=[];
aM_theta_pore=[];
aM_phi_pore=[];
orientation_type_pore={'chaotic','chaotic'};
%$
phi=0;
theta=0;
R0=10^5*eye(3,3);
pH=7;
Z=[1 -1 1 -1]';
Cf=160000;
T=25;
beta= [5.2 7.9 36.3 20.5]'*10^(-8);
eta=10^(-3);
iso_aniso=1;
Km=zeros(3,3,2);
K0 = 9.8692326671601e-16 * eye(3,3);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
reza = numel(depth);
parfor i = 1:reza
C_matrix=c2dto4d(vol_cal(i).*C_minerals(:,:,1) + vol_shale(i).*C_minerals(:,:,2));
Vfrm=[vol_cal(i), vol_shale(i)];
pt=phie(i);
rho= rhob(i);
VpLog = 304.8/dtp(i);
ResLog= res(i);
p = 2.71*9.81*depth(i)*0.001;
T = 25/1000*depth(i);
[~, ~, Ko] = PVT_oil2(rh0,p,T);
[~, ~, ~, ~, ~, Kg, ~, ~] = PVT_gas(SGg,p,T);
K_fluid = [2.2, Ko, Kg];
SAT = [swe(i), so(i), sg(i)];
lb = [0, 0, 1e-12, 1e-12, 1e-12, 1e-6, 1e-6, 1e-6, 3, 3];
ub = [pt/2.5, pt/2.5, 1e-6, 1e-6, 1e-6, 1e-3, 1e-3, 1e-3, 100, 100];
x0 = (lb+ub)/2;
x0 (9) = 5;
x0(10) = 5;
Yexp = [VpLog, ResLog];
options = optimoptions("fmincon","Display","none","UseParallel",true);
[xSol, err(i)] = fmincon(@(x) PosteriorDensityRockPhysics_DeltTResPerm(x,...
K_fluid,...
SAT,C_matrix,fr,am1,bm2,cm3,...
C_minerals,orientation_type_mineral,theta_mineral,phi_mineral,...
psi_mineral,mu_theta_mineral,sigma_theta_mineral,mu_phi_mineral,...
sigma_phi_mineral,alpha_theta_mineral,beta_theta_mineral,alpha_phi_mineral,...
beta_phi_mineral,aM_theta_mineral,aM_phi_mineral,Vfrm,pt,theta_pore,phi_pore,...
psi_pore,mu_theta_pore,sigma_theta_pore,mu_phi_pore,sigma_phi_pore,...
alpha_theta_pore,beta_theta_pore,alpha_phi_pore,beta_phi_pore,...
aM_theta_pore,aM_phi_pore,orientation_type_pore,rho,phi,theta,VpLog, ...
ResLog,R0,pH,Cf,T,beta,eta,Z,Km,K0),x0,[],[],[],[],lb,ub,[],options);
RFP(i,:) = xSol;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[Vpe, ~, Reff, ~] = RockPhysicsModel_DeltTResPerm(xSol(1:8),K_fluid,...
SAT,C_matrix,fr,am1,bm2,cm3,...
C_minerals,orientation_type_mineral,theta_mineral,phi_mineral,...
psi_mineral,mu_theta_mineral,sigma_theta_mineral,mu_phi_mineral,...
sigma_phi_mineral,alpha_theta_mineral,beta_theta_mineral,alpha_phi_mineral,...
beta_phi_mineral,aM_theta_mineral,aM_phi_mineral,Vfrm,pt,theta_pore,phi_pore,...
psi_pore,mu_theta_pore,sigma_theta_pore,mu_phi_pore,sigma_phi_pore,...
alpha_theta_pore,beta_theta_pore,alpha_phi_pore,beta_phi_pore,...
aM_theta_pore,aM_phi_pore,orientation_type_pore,rho,phi,theta,VpLog, ...
ResLog,R0,pH,Cf,T,beta,eta,Z,Km,K0);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
lambda_prior = [4;1/5;4;1/5;0.5;0.5*(1-pt/2)/(pt/2);0.5;...
0.5*(1-pt/2)/(pt/2);0.5;0.5*(1-1e-9)/(1e-9);0.5;...
0.5*(1-1e-9)/(1e-9);0.5;0.5*(1-1e-9)/(1e-9);0.5;...
0.5*(1-1e-4)/(1e-4);0.5;0.5*(1-1e-4)/(1e-4);0.5;...
0.5*(1-1e-4)/(1e-4)];
lambdaLB = 0.5;
lambdaUB = 1.5;
ShapeFactor = 1 - 2*xSol(1:8);
ScaleFactor = (2*xSol(1:8).^2 - 3*xSol(1:8) + 1)./xSol(1:8);
ShapeFactorLB = 1 - 2*lambdaUB*xSol(1:8);
ScaleFactorLB = (2*lambdaLB*xSol(1:8).^2 - 3*lambdaLB*xSol(1:8) + 1)./(lambdaUB*xSol(1:8));
ShapeFactorUB = 1 - 2*lambdaLB*xSol(1:8);
ScaleFactorUB = (2*lambdaUB*xSol(1:8).^2 - 3*lambdaUB*xSol(1:8) + 1)./(lambdaLB*xSol(1:8));
%
lambda = [3;2*1/xSol(9);3;2*1/xSol(10);ShapeFactor(1);ScaleFactor(1);...
ShapeFactor(2);ScaleFactor(2);...
ShapeFactor(3);ScaleFactor(3);...
ShapeFactor(4);ScaleFactor(4);...
ShapeFactor(5);ScaleFactor(5);...
ShapeFactor(6);ScaleFactor(6);...
ShapeFactor(7);ScaleFactor(7);...
ShapeFactor(8);ScaleFactor(8)];
lambdaLB = lambda/2;
lambdaUB = 2*lambda;
lambda = lambda + randn(20,1).*lambda/5;
%
[lambda_final(i,:),~,~, ~] = runfmincon_DeltTResPerm(lambda,lambda_prior,...
lambdaLB,lambdaUB,K_fluid,SAT,C_matrix,fr,am1,bm2,cm3,...
C_minerals,orientation_type_mineral,theta_mineral,phi_mineral,...
psi_mineral,mu_theta_mineral,sigma_theta_mineral,mu_phi_mineral,...
sigma_phi_mineral,alpha_theta_mineral,beta_theta_mineral,alpha_phi_mineral,...
beta_phi_mineral,aM_theta_mineral,aM_phi_mineral,Vfrm,pt,theta_pore,phi_pore,...
psi_pore,mu_theta_pore,sigma_theta_pore,mu_phi_pore,sigma_phi_pore,...
alpha_theta_pore,beta_theta_pore,alpha_phi_pore,beta_phi_pore,...
aM_theta_pore,aM_phi_pore,orientation_type_pore,rho,phi,theta,VpLog, ...
ResLog,R0,pH,Cf,T,beta,eta,Z,Km,K0);
end
save all

Answers (1)

Edric Ellis on 11 Dec 2023
To run across multiple nodes of a cluster, you need to set up MATLAB Parallel Server on the cluster. You might well need help from your system administrator to get this working. It is also worth contacting MathWorks support; there are people there who can help get things set up and working.
Once you have MATLAB Parallel Server set up, you will have a "Profile" that you can use with either parpool (for interactive use) or batch (for non-interactive use).
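For example, a minimal sketch assuming a profile named 'SlurmCluster' (the profile name and worker counts here are placeholders; use whatever your administrator configures):
% Interactive use: open a pool of workers on the cluster, then run parfor as usual.
clus = parcluster('SlurmCluster');
pool = parpool(clus, 960);      % request 960 workers across the nodes
% ... parfor code runs on this pool ...
delete(pool);
% Non-interactive use: submit the script as a batch job with its own pool.
job = batch(clus, 'InversProblem', 'Pool', 959, 'AttachedFiles', {'all_logs.xlsx'});
wait(job);     % block until the job finishes
load(job);     % copy the job's workspace variables (e.g. RFP, lambda_final) to the client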
One of the advantages of using MATLAB Parallel Server is that it can automatically handle getting your code in place on the cluster for the workers to use.
  2 Comments
Abdolrazzagh on 12 Dec 2023
This is the University of Houston's server, so all the required things are already installed. I want to know what command I should use to run the code correctly.
Edric Ellis on 12 Dec 2023
If you want to run across multiple nodes of the cluster, you need to use MATLAB Parallel Server. When you do this, you do not write a SLURM shell script directly; rather, you run something more like this:
clus = parcluster('SlurmCluster')
job = batch(clus, @myFcn, numOutputs, {in1, in2}, Pool=959)
Here, you will probably need to contact the cluster administrator for help getting the cluster profile set up. If this hasn't been done before, MathWorks support can help get things up and running.
Once you've got the cluster profile configured, the batch function wraps up your code into a form to run on the cluster. Here, the function myFcn can use parfor, and the Pool=959 parameter means that myFcn will have a pool of 959 workers available. myFcn runs on a separate worker, for a total of 960.
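For instance, a rough sketch of how the script above could be wrapped (the function name runInversion and the preallocation sizes are my assumptions, inferred from the script; this is not tested on your cluster):
function [RFP, err, lambda_final] = runInversion()
% Wraps the body of InversProblem so batch() can ship it to the cluster.
LogData = readtable("all_logs.xlsx", "Sheet", "Sheet1");
depth = LogData.DEPTH;
% ... read the remaining log columns and set up constants as in the script ...
reza = numel(depth);
RFP          = zeros(reza, 10);   % preallocate the sliced outputs
err          = zeros(reza, 1);
lambda_final = zeros(reza, 20);
parfor i = 1:reza
    % ... the existing per-depth fmincon / runfmincon_DeltTResPerm code,
    %     assigning RFP(i,:), err(i) and lambda_final(i,:) as in the script ...
end
end
Then, submitted from the MATLAB client:
clus = parcluster('SlurmCluster');
job  = batch(clus, @runInversion, 3, {}, 'Pool', 959, 'AttachedFiles', {'all_logs.xlsx'});
wait(job);
results = fetchOutputs(job);   % {RFP, err, lambda_final}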
