Parfor seems slower than for loop

Hi all,
I am trying to parallelize an operation in MATLAB. The MWE goes like this:
clear
A=rand(5000,3000);
B=int8(rand(5000,3000)>0.5);
lambda_tol=0.00000000000000004;
N=10;
%A='/Users/federiconutarelli/Desktop/Paper_Samuel/cd_2004.csv';
G=15;
lambda_tol_vector = zeros(numel(-G:0.1:G),1); % one slot per exponent h (301 values, not G)
conto = 1;
for h=-G:0.1:G
lambda_tol_vector(conto)=2^(h);
conto = conto+1;
end
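M=1; % maximum number of workers parfor may use (compare M=1 with M>1)
tic % if no pool is open, parfor may start one, and that startup time is included
tol = 1e-9;
parfor (k = 1:size(lambda_tol_vector,1),M)
lambda_tol = lambda_tol_vector(k);
fprintf('Completion using nuclear norm regularization... \n');
% A.*double(B) and double(B) are recomputed in every iteration
[CompletedMat,objective,flag] = matrix_completion_nuclear_GG_alt(A.*double(B),double(B),N,lambda_tol,tol);
if flag==1
CompletedMat=zeros(size(A));
end
% CompletedMat is a parfor temporary variable: it is not kept after the loop
end
toc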
Now, what I would expect is that when M>1 the time taken is lower than when M=1. However, the reduction in time is small. Why is that?
Furthermore, I have noticed that when I have not run the parfor for a while, it takes some time, displaying the following message, before the loop starts:
Starting parallel pool (parpool) using the 'local' profile ...
Preserving jobs with IDs: 1 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile local. To create 'myCluster' use 'myCluster = parcluster('local')'.
Connected to the parallel pool (number of workers: 6).
Is there a way to prevent this message from appearing and go directly to the parfor loop?
Thank you
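A sketch of one way to handle both parts of the message, assuming Parallel Computing Toolbox and the 'local' profile from the question. The "Preserving jobs..." lines come from old jobs stored for that profile, and the pool restart happens because an idle pool shuts itself down after its IdleTimeout (30 minutes by default, configurable in the Parallel Preferences), after which the next parfor starts a new one. The 120-minute value below is an arbitrary choice, not a recommendation.

```matlab
% Reuse an existing pool if there is one; otherwise clean up and start one.
pool = gcp('nocreate');               % returns [] if no pool is running
if isempty(pool)
    myCluster = parcluster('local');
    % Removes ALL jobs of the 'local' profile (including the one kept for
    % its crash dump files); skip this line if you still need any of them.
    delete(myCluster.Jobs);
    pool = parpool(myCluster);        % pool startup cost is paid here, once
end

% Keep the pool alive between runs: minutes of inactivity before shutdown
% (120 is an arbitrary value chosen for this sketch).
pool.IdleTimeout = 120;
```

Running this once at the start of a session means later parfor loops find the pool already open instead of starting a new one.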

8 Comments

Hi Federico,
Regarding your first question, about M>1:
A linear relationship between performance and the number of working cores is practically never achieved. This is not specific to MATLAB; it applies to parallel computing in general. A 4-core computer is not 4 times faster than a 1-core computer, because distributing the work, communicating, and coordinating among the cores all add overhead. The speedup may also be limited by other shared resources, such as main memory, caches, or the network. The same holds for Parallel Computing Toolbox (PCT) in MATLAB.
In addition, some MATLAB functions, for example fft, are internally multithreaded, so they automatically use multiple cores when possible. Calling such functions inside PCT constructs may therefore not improve performance, and can even make it worse because of the added overhead. However, in most cases PCT will give better performance.
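To make the point about non-linear speedup concrete, here is Amdahl's law (not mentioned in the thread) as a one-line MATLAB function. The 90% parallel fraction is purely an illustrative assumption; the real fraction for matrix_completion_nuclear_GG_alt is unknown, and the formula ignores communication and startup overhead, so real speedups are lower still.

```matlab
% Amdahl's law: if a fraction p of the run time parallelizes perfectly over
% n workers and the rest stays serial, the best possible speedup is
%   S(n) = 1 / ((1 - p) + p/n)
amdahl = @(p, n) 1 ./ ((1 - p) + p ./ n);

n = 1:6;                       % up to 6 workers, as in the pool from the question
S = amdahl(0.9, n);            % assumption: 90% of the time is parallelizable
disp([n; S])
% With p = 0.9 and n = 6 the ceiling is about 4x, before any overhead.
```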
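A rough way to see this implicit multithreading on your own machine is to time a built-in with the default thread count and with maxNumCompThreads(1). This sketch uses a matrix product instead of fft (an arbitrary choice), and the timings depend entirely on the machine. If the single-thread time is much larger, the serial loop is already using several cores, which leaves less for parfor to gain (as far as I know, process-based workers run with one computational thread each by default).

```matlab
% Time the same multithreaded built-in with MATLAB's default thread count
% and with a single computational thread.
X = rand(2000);

tic; Y1 = X * X; tMulti = toc;          % default: may use several cores

oldN = maxNumCompThreads(1);            % restrict to one thread, remember old value
tic; Y2 = X * X; tSingle = toc;
maxNumCompThreads(oldN);                % restore the previous setting

fprintf('default threads: %.3f s, 1 thread: %.3f s\n', tMulti, tSingle);
```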
For the second issue, could you please share the code of the matrix_completion_nuclear_GG_alt function?
@Ayush thanks for the exhaustive reply. My actual aim is to reduce the computation time of the loop, as it takes forever with big matrices. Please find attached the function matrix_completion_nuclear_GG_alt. I have also tried making a small change inside the function, which resulted in matrix_completion_nuclear_GG_par (also attached).
Another solution that I found along the way is the following:
p_ev= parfeval(@matrix_completion_nuclear_GG_alt,2,AB.Value,double(B),N,lambda_tol_vector(k),tol);
[CompletedMat,flag] = fetchOutputs(p_ev);
that is, to use thread-based parallelization. The problem is that the output, CompletedMat, is completely different from what I get when I perform the same operation sequentially (i.e., with a normal for loop). Hence I was wondering whether this is because the workers finish at different times and the results are collected in the order in which they finish. If so, is there a way to avoid this?
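On the ordering question: fetchOutputs on a single future returns that future's own outputs, so finishing order alone should not change CompletedMat. Two things may be worth checking. First, parfeval(@matrix_completion_nuclear_GG_alt,2,...) requests only the first two outputs, which with the signature used in the question are CompletedMat and objective, so the variable named flag above actually receives objective (and the flag==1 check from the for loop is not applied). Second, when collecting many futures, fetchNext returns the index of each finished future, and results can be stored by that index. The sketch below uses a toy anonymous function in place of the real solver; note also that parfeval is only thread-based if the pool was started with parpool('threads').

```matlab
% Submit one future per lambda value and collect results in index order,
% whatever order the workers finish in.
lambdas = 2.^(-2:2);                    % toy stand-in for lambda_tol_vector
n = numel(lambdas);

F(1:n) = parallel.FevalFuture;          % preallocate the array of futures
for k = 1:n
    % Stand-in work; with the real function you would request all three
    % outputs, e.g. parfeval(@matrix_completion_nuclear_GG_alt, 3, ...)
    F(k) = parfeval(@(x) x^2, 1, lambdas(k));
end

results = zeros(1, n);
for i = 1:n
    [idx, value] = fetchNext(F);        % idx = position of the finished future in F
    results(idx) = value;               % store by index, not by finishing order
end
disp(results)
```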
Generally, if you want to make your code run faster, first try to vectorize it. For details on how to do this, refer to the following link:
Vectorized code often runs much faster than the corresponding code containing loops.
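As an illustration using the question's own code, the loop that builds lambda_tol_vector can be replaced by one element-wise power. This sketch keeps a preallocated loop version next to it only to compare the results; in practice only the vectorized line is needed.

```matlab
G = 15;

% Loop version from the question, with the output preallocated
h_values = -G:0.1:G;                       % 301 exponents
lambda_loop = zeros(numel(h_values), 1);
conto = 1;
for h = h_values
    lambda_loop(conto) = 2^h;
    conto = conto + 1;
end

% Vectorized version: one element-wise power over the whole exponent vector
lambda_vec = 2.^(h_values.');

% Largest relative difference between the two results (should be ~0)
maxRelDiff = max(abs(lambda_loop - lambda_vec) ./ lambda_vec)
```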
You may also find the following documentation page useful to decide whether “parfor” suits your application, and how best to apply it:
Starting and stopping a parallel pool of workers also adds to the overhead of using “parfor”. If you are planning on running parallel code multiple times in a row, you may wish to explicitly start and manage the parallel pool outside of the code you are running.
For more on how to improve performance, see also the following link:
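A minimal timing sketch along those lines, assuming Parallel Computing Toolbox: the pool is started (or reused) before tic, so only the loops are measured. The workload sum(svd(rand(300) + k)) is a hypothetical stand-in for the real solver; since svd is itself multithreaded, this also illustrates why parfor may gain less than the number of workers suggests.

```matlab
nIter = 24;
work = @(k) sum(svd(rand(300) + k));   % hypothetical per-iteration workload

pool = gcp();                          % start (or reuse) the pool BEFORE timing

tic
outFor = zeros(1, nIter);
for k = 1:nIter
    outFor(k) = work(k);
end
tFor = toc;

tic
outPar = zeros(1, nIter);              % sliced output variable
parfor k = 1:nIter
    outPar(k) = work(k);
end
tPar = toc;

fprintf('for: %.2f s, parfor on %d workers: %.2f s\n', tFor, pool.NumWorkers, tPar);
```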
There are two warnings in your function, and one of them is very likely slowing your code down: please try to eliminate the warning about the "objective" variable on line 87.
@Ayush Thank you a lot for your suggestions!
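The function is not shown here, so which warning this is remains an assumption. If it is the Code Analyzer message that objective "appears to change size on every loop iteration", the usual fix is to preallocate. A sketch with a hypothetical iteration cap maxIter; how large the gap is depends on the release and the array size.

```matlab
maxIter = 1e5;                         % hypothetical iteration cap

% Growing the array: MATLAB may reallocate and copy it as it grows
tic
objective_grow = [];
for it = 1:maxIter
    objective_grow(it) = 1 / it;       % triggers the "changes size" warning
end
tGrow = toc;

% Preallocated: memory is reserved once, then filled in place
tic
objective = zeros(maxIter, 1);
for it = 1:maxIter
    objective(it) = 1 / it;
end
tPre = toc;
% If the loop can stop early, trim afterwards, e.g. objective = objective(1:it);

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrow, tPre);
```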
Expanding on Ayush's advice to vectorize, you will find more advice on how to write faster MATLAB code here:
From a quick look at your code, a number of those recommendations apply to it.
As Walter Roberson already commented in your other thread on this topic, many MATLAB operations are inherently multi-threaded, so simply jumping on the parallel bandwagon is not the "universal speed-up" that many beginners imagine it to be. By contrast, understanding how arrays are stored, careful testing, and following MATLAB best practice are much more likely to have a positive impact on code performance.
@Stephen23 thank you a lot for expanding on the question. I will pay more attention to carefully applying MATLAB best-practice advice and will go deeper into the link about code improvement.


Answers (0)

Release

R2021b

Asked:

on 18 Nov 2022
