HPC MATLAB parpool and speed

Question

RUAN YY on 25 Sep 2020

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/599929-hpc-matlab-parpool-and-speed

Answered: RUAN YY on 25 Sep 2020

Hey guys! I am new to the HPCC, and I am now running my MATLAB program on it. I am using parallel computing, i.e. parpool.

Here is the code for my "submit.sh"

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
/opt/hpc/MATLAB/R2019b/bin/matlab -nojvm -nodesktop -r "main_MultiEA;exit;"

The first thing is that I found the speed is similar to my local computer. Should I specify something in the .sh file to change this? And how can I know whether I reach the limit of the resource or not?

The second thing is that I found that the only available parpool is "local", using the "allNames = parallel.clusterProfiles()" command. Should it be different on the HPCC?

The third thing is that when I use "parpool(16)", "parpool('local',16)", "parpool("myPool",16)", etc. to try to improve the speed, the program seems to crash. Here is my test.m to test the parpool. I guess the program crashes because no a.mat appears in the directory.

parpool("local",16);
a=0;
parfor i = 1:10
        a = a+1;
end
save a.mat;
exit;

Would you tell me why's that? And how can I improve the speed? Thanks a lot!!

Answer 1

Raymond Norris on 25 Sep 2020

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/599929-hpc-matlab-parpool-and-speed#answer_500569

Hi Ruan,

There are two ways to speed up your code: implicitly and explicitly. You don't have much control over the implicit route; MATLAB will find the best ways to use your multiple cores. Explicitly, you can vectorize, pre-allocate, write MEX-files, etc. You can also use parallel pools.

Looking at your Slurm job script, make the following change:

/opt/hpc/MATLAB/R2019b/bin/matlab -nojvm -nodesktop -r "main_MultiEA;exit;"

to

/opt/hpc/MATLAB/R2019b/bin/matlab -batch main_MultiEA

-batch replaces -nodesktop, -r, and the trailing "exit". Also drop -nojvm: you'll need the JVM if you use Parallel Computing Toolbox (PCT).

I'd also consider using module if you have it (your module name -- matlab -- might be slightly different)

module load matlab
matlab -batch main_MultiEA

Next, you're requesting 2 nodes from Slurm, with 2 tasks per node (a total of 4 cores). But MATLAB only runs on a single node, so the 2nd node is of no use. That means when you start the pool of 16 workers, you're running it on 2 cores (or you should be -- it may depend on whether your cluster enforces cgroups). This is probably why MATLAB is crashing -- you're running out of memory. To write this more flexibly, try

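% getenv returns text, so convert it to a number before calling parpool.
% SLURM_CPUS_PER_TASK is only set when the job requests --cpus-per-task;
% if it is empty, str2double returns NaN and parpool will error.
sz = str2double(getenv('SLURM_CPUS_PER_TASK'));
parpool("local",sz);
a=0;
parfor i = 1:10
    a = a+1;
end
% Save only a, so no pool or cluster objects are written to the MAT-file.
save('a.mat','a')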

This way, regardless of the cores per node you request, you'll get the right size.

With that said, there are two things to think about

Obviously, you'll see no speedup in your example. There has to be a reasonable amount of work to do.
Using the "local" profile, the parallel pool will only run "local" to wherever MATLAB is running (on the HPC compute node). If you want to run a larger pool, across nodes, then you'll need to create a Slurm profile with MATLAB Parallel Server.

Raymond

3 Comments

Raymond Norris on 25 Sep 2020

test.m is calling save at the end, so when you call test, either via CLI or Slurm, you're going to generate a.mat. Do you not want the MAT-file to be generated? If not, simply comment out the line at the bottom of the file.

If this doesn't work

sz = str2double(getenv('SLURM_CPUS_PER_TASK'));

then you might try

sz = str2double(getenv('SLURM_JOB_CPUS_PER_NODE'));
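
The same fallback can also be resolved in the job script before MATLAB starts. The following is only a sketch under assumptions: resolve_workers is a hypothetical helper name, and it assumes SLURM_JOB_CPUS_PER_NODE may carry a suffix such as "16(x2)" or a comma-separated list such as "8,4", in which case only the leading count is kept (taken here to describe the node the script runs on).

```shell
#!/bin/bash
# Hypothetical helper: pick a pool size from the Slurm environment.
resolve_workers() {
    # Preferred source: set only when the job requests --cpus-per-task.
    local n="${SLURM_CPUS_PER_TASK:-}"
    if [ -z "$n" ]; then
        # Fallback: may look like "16(x2)" or "8,4"; keep the leading digits.
        n="${SLURM_JOB_CPUS_PER_NODE:-}"
        n="${n%%[!0-9]*}"
    fi
    # Outside Slurm, or if nothing usable is set, fall back to one worker.
    echo "${n:-1}"
}

echo "pool size: $(resolve_workers)"
```

One possible use is to export the result under a name of your choosing (POOL_SIZE, say, which is hypothetical) and read it in MATLAB with str2double(getenv('POOL_SIZE')).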

What Slurm output/error file is being generated? If your Slurm job script only specifies the name of the job (Group3), it's possible that

You're not requesting enough cores (16 or 17). Add #SBATCH -n 16
You're not requesting enough memory. Add #SBATCH --mem-per-cpu=2048

For instance:

#SBATCH -J Group3
#SBATCH -n 16                 # Request 16 cores
#SBATCH --mem-per-cpu=2048    # Request 2 GB/core
/opt/hpc/MATLAB/R2019b/bin/matlab -batch test

Otherwise, please paste in the crash.
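
Because a "local" pool stays on one node, it may help to pin the request to a single node as well, since -n 16 on its own may let Slurm spread the tasks over several nodes. A minimal sketch, assuming the cluster accepts --cpus-per-task; write_job_script and submit_single_node.sh are hypothetical names, while the MATLAB path and the test script are the ones used above. The job script is generated from one core count so that the request stays in a single place.

```shell
#!/bin/bash
# Hypothetical helper: write a single-node Slurm job script for a local pool.
write_job_script() {
    local cores="$1" out="$2"
    # Unquoted EOF so that ${cores} is expanded while the file is written.
    cat > "$out" <<EOF
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=${cores}
#SBATCH --mem-per-cpu=2048
/opt/hpc/MATLAB/R2019b/bin/matlab -batch test
EOF
}

write_job_script 16 submit_single_node.sh
cat submit_single_node.sh
```

The generated file would then be submitted with sbatch submit_single_node.sh. Requesting --cpus-per-task also means Slurm sets SLURM_CPUS_PER_TASK inside the job, which is the variable the MATLAB code reads.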

RUAN YY on 25 Sep 2020

Thank you very much! Let me try!

I want to use the a.mat file to see whether the program crashes or not. That's why I added that "dummy" statement.

Answer 2

RUAN YY on 25 Sep 2020

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/599929-hpc-matlab-parpool-and-speed#answer_500629

I know why there is no .mat file output now.

[Warning: Objects of class 'parallel.cluster.Local' cannot be saved to MATfiles.] 

I should check the slurm-JobID.out file, e.g. slurm-21127.out

Any print, warning, or other message that would normally appear in the command window of the GUI is stored in the slurm-JobID.out file.
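
A quick way to look for such messages is to search the Slurm output files. This is a sketch only: scan_slurm_log is a hypothetical helper name, and it assumes the job uses Slurm's default output naming (slurm-<jobid>.out in the submit directory) rather than a custom --output path.

```shell
#!/bin/bash
# Hypothetical helper: list warnings and errors found in Slurm output files.
scan_slurm_log() {
    # -H prints the file name, -n the line number, -i ignores case.
    grep -H -n -i -E 'warning|error' "$@"
}

for f in slurm-*.out; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    scan_slurm_log "$f" || echo "no warnings or errors in $f"
done
```

Each hit is printed as file:line:text, so the warning about objects that cannot be saved would show up with its line number.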
