I have a rather complex finite difference scheme that I'm solving with an explicit method (i.e. all values needed to calculate the next timestep's value are known), and I wanted to parallelize it in order to understand how parfor works. When I use parfor, my code runs exceedingly slowly. I know it's not ideal to have parfor inside a for loop (as I now have the pool setup time in each iteration), but I'm not sure how to avoid this since I cannot parallelize the time loop.
When I run the code with Number_Procs = 0, it computes relatively quickly. However, when Number_Procs is anything greater, the code slows down greatly.
I guess I have two main questions:
- Am I using parfor correctly, or is there something I am doing wrong that is causing the slowdown?
- Is there a better way to parallelize the inner for loop to avoid the pool setup time in each iteration?
Notes on scale of the problem:
The Time and Domain vectors can really be any size. Currently, I have a Time vector of length 10,000 and a Domain vector of length 1,000, just for testing purposes. When I run with Number_Procs = 0, the code completes in ~6 seconds. However, when I run with Number_Procs = 4, the code completes in ~185 seconds.
for t = 1:length(Time_vector)
    % some setup stuff
    parfor (z = 1:length(Domain), Number_Procs)
        U_ipo = Explicit_Solver(U_i); % Solve for the next timestep using the current