FFT inside parfor loop on a multi-core computer does not accelerate

Question

matlabUser on 26 Jan 2012

https://au.mathworks.com/matlabcentral/answers/27178-fft-inside-parfor-loop-on-a-multi-core-computer-does-not-accelerate

Hello,

I am trying to accelerate code that involves many FFTs by putting it inside a parfor loop. All calculations in the loop are completely independent, and I don't see any error messages about variables that cannot be classified. When I use 12 workers on my 12-core machine to run this parfor loop, the first 15-20 runs go very fast: each run of the parfor loop takes about 0.8 s. After that, the calculation time begins to vary from run to run, from 0.8 s to 19 s, although the runs are completely equivalent in terms of computational load.

I am aware that FFT is multi-threaded and runs on many cores, so the communication overhead might interfere with parallelization in parfor. But then it is unclear to me why the first 20 runs are so fast. I am using sliced arrays for both input and output, and the output arrays are only changed inside the parfor loop, not from run to run, so no data accumulates that would have to be communicated between workers.

If I use a standard for-loop instead of parfor, the calculation time stays very stable at around 4.5 s, which is much longer than the initial 0.8 s for parfor. Task Manager shows that in this case all cores are busy, at about 97% total CPU load. When I use parfor, the load is only about 50-60%, and it is still faster.

Any hint is really appreciated! -Thanks
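The setup described in the question can be sketched roughly as follows, to make the run-to-run timing measurable. This is only a minimal sketch, not the poster's actual code: the function name timeParforFft and the parameters nRuns, nSlices and nPoints are hypothetical, and it assumes a worker pool is already open (matlabpool in releases of that era, parpool in later ones).

```matlab
function timings = timeParforFft(nRuns, nSlices, nPoints)
% Hypothetical harness: times repeated, identical parfor runs of
% independent FFTs. All names and sizes are illustrative assumptions.
data = rand(nPoints, nSlices);        % sliced input: one column per iteration
timings = zeros(nRuns, 1);
for r = 1:nRuns
    out = zeros(nPoints, nSlices);    % sliced output, rebuilt on every run
    t = tic;
    parfor k = 1:nSlices
        out(:, k) = abs(fft(data(:, k)));   % each iteration is independent
    end
    timings(r) = toc(t);              % wall-clock time of this run
end
end
```

Calling something like timings = timeParforFft(40, 12, 2^20) and plotting the result would show whether the slowdown after the first 15-20 runs is reproducible outside the original code.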

3 Comments

matlabUser on 30 Jan 2012

Hello,

If I force-close the workers and reopen them between runs, the problem disappears. I then get a calculation time of about 1.3 s (the same as in the first run of the loop) instead of 0.7 s. Still, this is much shorter than the 5-19 s it takes to run the loop if I do not close and reopen the workers between runs.

So it looks like some overhead is preventing the parfor loop from running faster, but I don't see what it is...

matlabUser on 31 Jan 2012

... And, unfortunately, this does not solve my problem, as opening/closing matlabpool is a time-consuming operation if you repeat it every run. So, it has to be investigated further...

Answer 1

Jason Ross on 26 Jan 2012

Is your core count really 12, or are those hyper-threaded cores? If they are hyper-threaded, try with the actual number of physical cores.

Also, take a look at your memory utilization. Are you swapping?
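For the core-count question, one way to see how many computational threads MATLAB is using is maxNumCompThreads, which on the client usually defaults to the number of physical cores. The snippet below is a sketch under assumptions: it expects a pool to be open already, and the variable names nClient and nWorker are made up for illustration. As far as I know, workers are limited to one computational thread each by default, which is worth confirming on the machine in question.

```matlab
% Assumes a pool is already open (matlabpool at the time of this thread,
% parpool in later releases). Variable names are illustrative only.
nClient = maxNumCompThreads;          % threads the client session may use
spmd
    nWorker = maxNumCompThreads;      % threads each worker may use
end
fprintf('Client: %d computational thread(s)\n', nClient);
for w = 1:numel(nWorker)              % nWorker is a Composite, one entry per worker
    fprintf('Worker %d: %d computational thread(s)\n', w, nWorker{w});
end
```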

2 Comments

matlabUser on 26 Jan 2012

Hi Jason,

I have 12 actual cores: 6 dual-core processors, not hyper-threaded, each Xeon X5690. My memory should be big enough, 24 GB, and this is 64-bit Windows 7 with 64-bit MATLAB. I don't think the system would be swapping with these resources.

Jason Ross on 26 Jan 2012

Great. I would concur that you shouldn't be swapping, but you should be able to verify that with Task Manager. The CPU load suggests that the CPUs are waiting on something. Disk access was the first thing that came to mind, especially as something may have accumulated that would consume memory.
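Besides Task Manager, memory use can also be checked from inside MATLAB. The sketch below assumes Windows (the memory function is only available there) and an already open pool; it prints per-worker figures so that any growth between runs would show up. It uses labindex, the worker index of that era (newer releases call it spmdIndex).

```matlab
% Windows-only sketch: report memory use on each worker of an open pool.
spmd
    [userMem, sysMem] = memory;       % per-process and system-wide memory info
    fprintf('Worker %d: MATLAB uses %.0f MB, %.0f MB physical RAM available\n', ...
        labindex, userMem.MemUsedMATLAB / 2^20, ...
        sysMem.PhysicalMemory.Available / 2^20);
end
```

Running this before and after a batch of parfor runs would indicate whether the workers are holding on to memory over time.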

Answer 2

Konrad Malkowski on 1 Feb 2012

Is your parfor code inside a function or a script? If it is inside a script, please convert it to a function call. In general, functions run faster than scripts.

Also, does the code called from within parfor include any clear statements?
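To illustrate both points, here is a minimal sketch of a parfor loop wrapped in a function, calling a helper that contains no clear statements. The names runFftBatch and columnSpectrum are hypothetical and do not come from the original code.

```matlab
function spectra = runFftBatch(signals)
% Hypothetical function wrapper around the parfor loop (illustrative names).
nSignals = size(signals, 2);
spectra = zeros(size(signals));       % sliced output, one column per iteration
parfor k = 1:nSignals
    spectra(:, k) = columnSpectrum(signals(:, k));
end
end

function s = columnSpectrum(x)
% Temporaries such as X go out of scope when the function returns,
% so no explicit clear statement is needed here.
X = fft(x);
s = abs(X);
end
```

A call such as spectra = runFftBatch(rand(1024, 12)) would exercise it on random data.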

1 Comment

matlabUser on 1 Feb 2012

Hi,

My parfor is inside a function, not a script. I am not sure what you mean by "clear". I am calling some external functions from within the parfor loop (these are what actually do the FFTs), but otherwise I am using sliced input and output arrays inside parfor...
