Nested for loop to parfor

Subramanian

on 10 Apr 2012
Hello,
I have nested for loops. After I convert the outer for loop to parfor (I get no compilation errors), the program doesn't even reach the first statement inside the parfor loop until about 2.5 hours have passed. Can someone please tell me why?
I tried various configurations in which the number of broadcast variables was small and none of them were large arrays, but the problem remains. The program structure is as follows:
Variable initializations (around 10)
Matrix initializations (all to zeros, some of size 15*15, some 225*225) % Even if these matrices are initialized as local matrices inside the parfor loop it makes no difference.
parfor m=1:200
    del(m)=some_value_depending on m
    m %doesn't even print this for 2.5 hours! :(
    for v=1:800
        H=[some 15*15 matrix with some elements depending on m and some others on v, but none of them depend on both simultaneously]
        for i=1:225
            for j=1:225
                M=[225*225 matrix computed from some operations on H]
            end
        end
        for k=1:15
            some one line operation on M giving 2 matrices W and S
        end
        B=Inverse(W)*S -- size is 15*1;
        A(m,:)=B;
        Few lines of code operating on A(m)
    end
end
plot(different values)
As you can see, there are no functions, but there are several large matrices within the loops. Will writing each loop as a function make it faster? If so, can someone please explain why?
Also, can someone tell me how to use the profile command for parallel computations, if that is possible at all?
Thank You
Subramanian

Richard Brown

on 11 Apr 2012
I take it that the nested i,j loop just creates one entry of M at a time, not a 225x225 matrix each iteration.
Also, why are you using the inverse to solve a linear system rather than W\S? And have you initialised del and A specifically?
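To illustrate the W\S suggestion, here is a minimal sketch. It assumes W is a square, non-singular 15-by-15 matrix and S is a 15-by-1 vector; the values are random placeholders, not the data from the question.

```matlab
% Placeholder data: W and S are random stand-ins, not the original matrices.
n = 15;
W = rand(n) + n*eye(n);   % the n*eye(n) term makes W strictly diagonally dominant
S = rand(n, 1);

B_inv   = inv(W) * S;     % forms the explicit inverse, then multiplies
B_slash = W \ S;          % solves W*B = S directly, without forming inv(W)

% Both results are 15-by-1; compare the residuals of the two solutions.
res_inv   = norm(W*B_inv   - S);
res_slash = norm(W*B_slash - S);
fprintf('residual inv: %g, residual backslash: %g\n', res_inv, res_slash);
```

For a well-conditioned W the two answers agree closely; the backslash form simply skips the work of building the inverse.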
Subramanian

on 13 Apr 2012
Unfortunately, the M matrix is calculated in each iteration. There may be smarter ways of doing it, but for now this is not the problem. The program doesn't even reach the first statement inside the parfor loop for nearly 2.5 hours, running on 32 cores on a cluster.
Yes, I use W\S. I have initialized del and A. And I am opening my configuration using matlabpool just before the parfor.

Ken Atwell

on 10 Apr 2012

How long does this loop take to run before converting the 'for' to 'parfor'? Are you running multiple local MATLAB workers on your computer, or connecting to a cluster?
Here are a few things that could be an explanation (assuming local workers -- if a cluster is involved, you may need to consult with its admin):
1. Make sure you run 'matlabpool open' before the parfor loop (sorry if this sounds obvious, but it is a common pitfall).
2. While the code is running, open the Task Manager (on Windows), Activity Monitor (Mac) or a similar tool and look at the CPU usage and memory usage. CPU usage will probably be pegged at 100% ("good"), but if memory is also maxed out, the computer may be thrashing (relying too heavily on virtual memory), which will almost certainly overwhelm any benefit from parfor.
3. Run the code without 'parfor' and consult the Task Manager again for CPU usage. It is possible that the built-in multithreading in MATLAB is already doing a reasonable job, so there is little more to gain by switching to 'parfor'.
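The for-versus-parfor comparison above can be sketched as follows. This is only an illustration: `body` is a hypothetical stand-in for one outer iteration, and only a few iterations are timed. It assumes a pool has already been opened ('matlabpool open' in the release discussed here); without one, the parfor loop simply runs on the client.

```matlab
% Hypothetical stand-in for one outer iteration; swap in the real loop body.
% The solution of ((m+1)*I) x = 1 is known, so the result is easy to check.
body = @(m) norm(((m + 1) * eye(225)) \ ones(225, 1));

nTest = 4;                          % time a few iterations, not all 200

outSerial = zeros(1, nTest);
tSerial = tic;
for m = 1:nTest
    outSerial(m) = body(m);
end
serialTime = toc(tSerial);

outPar = zeros(1, nTest);
tPar = tic;
parfor m = 1:nTest
    % feval avoids body(m) being parsed as indexing into a sliced variable
    outPar(m) = feval(body, m);
end
parTime = toc(tPar);

fprintf('for: %.3f s, parfor: %.3f s\n', serialTime, parTime);
```

If the parfor timing is far larger than the serial one even for a handful of iterations, the time is probably going into setup or data transfer rather than into the computation itself.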

Subramanian

on 10 Apr 2012
The code hardly takes any time to run before converting to parfor; it is almost instantaneous.
1. matlabpool open is right before parfor.
2. I am running it on a cluster using a configuration defined by me, with 32 cores.

on 11 Apr 2012

Are the matrices preallocated, or are they allocated dynamically at run time by the inner for loops? If matrices are not preallocated, try doing that.
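As a sketch of what preallocation looks like in this setting: the sizes (200 outer iterations, a 15-element result per iteration) are taken from the question, while the computation itself is a placeholder and the names `nOuter`, `nB` and `d` are made up for illustration.

```matlab
nOuter = 200;                 % outer (parfor) iterations, as in the question
nB     = 15;                  % length of the per-iteration result B

A   = zeros(nOuter, nB);      % sliced output: each iteration writes one row
del = zeros(1, nOuter);       % sliced output: one entry per iteration

parfor m = 1:nOuter
    d = 0.1 * m;              % placeholder for "some value depending on m"
    del(m) = d;

    % Temporaries are created inside the loop, so each worker builds its
    % own copy and nothing large is sent from the client.
    W = eye(nB) * (1 + d);    % placeholder 15-by-15 system matrix
    S = ones(nB, 1);          % placeholder right-hand side

    B = W \ S;                % 15-by-1 solution
    A(m, :) = B.';            % store as one row of the sliced output
end
```

Only `A` and `del` are indexed by the loop variable, so they are the only variables that need to exist, at full size, before the loop starts.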

Subramanian

on 11 Apr 2012
Yes, they are all pre-initialized to zeros before the parfor.