Sliced variables in parfor loop - any chance to optimize?

I'm trying to simulate image formation in lithographic systems and was lucky to find the book "Computational Lithography", which provides some m-files as a starting point. Now I'm trying to implement my own code and optimize it for CPU-parallel computing. After hours of (more or less) trial and error, I thought I would ask somebody more knowledgeable.
My code so far:
tic
index_x = repmat(1:N_mask^2,[N_mask^2 1]);
index_x = index_x(:);
index_y = repmat(1:N_mask^2, 1, N_mask^2)';
index_1=mod(index_x-1,N_mask)+1;
index_2=floor((index_x-1)/N_mask)+1;
index_3=mod(index_y-1,N_mask)+1;
index_4=floor((index_y-1)/N_mask)+1;
index_m = (mod(index_1-index_3,N_mask) + 1)';
index_n = (mod(index_2-index_4,N_mask) + 1)';
aerial=zeros(N_mask,N_mask);
aerial_fre=zeros(N_mask,N_mask);
pz_fre=(fftshift(fft2(pz)));
for idx=1:N_mask^4
    aerial_fre(index_m(idx),index_n(idx))=aerial_fre(index_m(idx),index_n(idx))+...
        TCC(index_x(idx),index_y(idx))*(pz_fre(index_1(idx),index_2(idx)))*...
        conj(pz_fre(index_3(idx),index_4(idx)));
end
toc
My idea was to turn the for-loop into a parfor, but it always fails with errors about "sliced variables", no matter how I set it up. I've also tried the approach described here: http://www.mathworks.com/matlabcentral/answers/76684-how-do-i-implement-parfor-loop-with-nested-for-loops
Any ideas? Thank you very much in advance

 Accepted Answer

PARFOR might be unnecessary here. I think the best path to optimization of this is straight-up application of ACCUMARRAY:
[index_x, index_y] = ndgrid(1:N_mask^2);
[index_1,index_2] = ind2sub([N_mask,N_mask], index_x);
[index_3,index_4] = ind2sub([N_mask,N_mask], index_y);
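index_m = mod(index_1-index_3,N_mask) + 1; % no transpose: must stay aligned element-wise with vals(i,j)
index_n = mod(index_2-index_4,N_mask) + 1;
vals=bsxfun(@times, TCC,pz_fre(:)); % scale row i by pz_fre(i)
vals=bsxfun(@times,vals,pz_fre(:)'); % scale column j by conj(pz_fre(j)); ' is the conjugate transpose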
idx=(vals~=0);
subs=[index_m(idx), index_n(idx)];
vals=vals(idx);
aerial_fre = accumarray(subs,vals,[N_mask,N_mask]);
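Before trusting the speed-up, it may be worth checking the vectorized result against a plain loop on a small problem. The following is only a sketch: the size N_mask = 4 and the random TCC and pz_fre are made-up test data, not values from the question, and the subscript arrays are kept in the same layout as vals (row index = TCC row, column index = TCC column).

```matlab
% Consistency check on small random data (all inputs here are assumptions)
N_mask = 4;
rng(0);
TCC    = randn(N_mask^2) + 1i*randn(N_mask^2);   % N_mask^2 x N_mask^2
pz_fre = randn(N_mask)   + 1i*randn(N_mask);     % N_mask x N_mask

% Reference: explicit loops over TCC rows (x) and columns (y)
ref = zeros(N_mask);
for x = 1:N_mask^2
    [i1, i2] = ind2sub([N_mask, N_mask], x);
    for y = 1:N_mask^2
        [i3, i4] = ind2sub([N_mask, N_mask], y);
        m = mod(i1 - i3, N_mask) + 1;
        n = mod(i2 - i4, N_mask) + 1;
        ref(m, n) = ref(m, n) + TCC(x, y) * pz_fre(i1, i2) * conj(pz_fre(i3, i4));
    end
end

% Vectorized: subscripts and values share the same N_mask^2 x N_mask^2 layout
[index_x, index_y] = ndgrid(1:N_mask^2);
[index_1, index_2] = ind2sub([N_mask, N_mask], index_x);
[index_3, index_4] = ind2sub([N_mask, N_mask], index_y);
index_m = mod(index_1 - index_3, N_mask) + 1;
index_n = mod(index_2 - index_4, N_mask) + 1;
vals = bsxfun(@times, TCC, pz_fre(:));           % row i scaled by pz_fre(i)
vals = bsxfun(@times, vals, pz_fre(:)');         % column j scaled by conj(pz_fre(j))
aerial_fre = accumarray([index_m(:), index_n(:)], vals(:), [N_mask, N_mask]);

err = max(abs(aerial_fre(:) - ref(:)));
fprintf('max abs difference: %g\n', err);
```

The difference should be at the level of floating-point round-off, since the two versions only differ in summation order.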

4 Comments

You could in fact break the accumarray step into parallel chunks and use parfor, but whether this produces an advantage would need to be tested. The following outlines how this might be done using MAT2TILES (a File Exchange submission),
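idx=(vals~=0);
subs=[index_m(idx), index_n(idx)];
vals=vals(idx);
p=gcp;
chunk=ceil(numel(vals)/p.NumWorkers); % rows per tile; MAT2TILES takes tile sizes, not tile counts
subsCell=mat2tiles(subs, chunk, 2);
valsCell=mat2tiles(vals, chunk, 1);
aerial_fre=zeros(N_mask,N_mask); % reduction variable has to start from zero
parfor i=1:numel(valsCell)
    aerial_fre = aerial_fre + ...
        accumarray(subsCell{i},valsCell{i},[N_mask,N_mask]);
end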
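If you would rather not depend on MAT2TILES, the same chunked reduction can be sketched with built-in functions only. Everything below is illustrative: the random subs and vals, the size N_mask = 4, and the chunk count nChunks are assumptions made up for the example. Without the Parallel Computing Toolbox the parfor should simply run as an ordinary loop.

```matlab
% Chunked accumarray as a parfor reduction (illustrative data, not from the thread)
N_mask = 4;
rng(2);
nVals = N_mask^4;
subs  = randi(N_mask, nVals, 2);                 % random (row, col) targets
vals  = randn(nVals, 1) + 1i*randn(nVals, 1);    % random complex contributions

nChunks = 4;                                     % hypothetical number of chunks
edges = round(linspace(0, nVals, nChunks + 1));
lo = edges(1:end-1) + 1;                         % first row of each chunk
hi = edges(2:end);                               % last row of each chunk

acc = zeros(N_mask);                             % reduction variable
parfor k = 1:nChunks
    rows = lo(k):hi(k);
    % each iteration adds a full N_mask x N_mask partial sum
    acc = acc + accumarray(subs(rows, :), vals(rows), [N_mask, N_mask]);
end

ref = accumarray(subs, vals, [N_mask, N_mask]);  % single-call reference
err = max(abs(acc(:) - ref(:)));
fprintf('max abs difference: %g\n', err);
```

The form acc = acc + ... is what lets parfor treat acc as a reduction variable. Writing into acc(m,n) with computed subscripts, as in the original loop, is neither a sliced access nor a reduction, which is presumably why parfor rejected it.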
Thank you very much for your reply! I forgot to mention that the TCC matrix is usually a 4D matrix which is brought to 2D by reshape. It represents an optical system (like a microscope with source and objective) and acts as an amplitude/phase transfer function. It's an integral over the source and the objective's pupil and conjugate pupil.
The TCC dimensions are
N_mask^2 x N_mask^2
and therefore it is possible to calculate it with the suggested bsxfun-approach, because pz_fre is an
N_mask x N_mask matrix.
I've tried to implement your suggestions, but without success so far. Anyway, it was possible to shrink the code from a four-fold nested for-loop to a single for-loop, which brings the computation time down to about 0.5 s for a 51x51 mask, compared with 3.5 s for the old version.
Thank you very much!
and therefore it is possible to calculate it with the suggested bsxfun-approach, because pz_fre is an N_mask x N_mask matrix
Did you really mean to say that it is impossible? The matrix dimensions that you've mentioned are what I originally understood them to be. They shouldn't conflict with the bsxfun operations that I've shown. Note that I use pz_fre(:), which is an N_mask^2 x 1 column vector.
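To make the dimension argument concrete, here is a small sketch with made-up data (N_mask = 3 and random TCC and pz_fre are assumptions for illustration). It shows that the two bsxfun calls give an N_mask^2 x N_mask^2 result equal to TCC multiplied element-wise by the outer product of pz_fre(:) with its conjugate.

```matlab
% Shape check for the bsxfun step (illustrative sizes and random data)
N_mask = 3;
rng(1);
TCC    = randn(N_mask^2) + 1i*randn(N_mask^2);   % N_mask^2 x N_mask^2
pz_fre = randn(N_mask)   + 1i*randn(N_mask);     % N_mask x N_mask

p = pz_fre(:);                                   % N_mask^2 x 1 column vector
vals = bsxfun(@times, TCC, p);                   % column expands across columns: row i times p(i)
vals = bsxfun(@times, vals, p');                 % row expands down rows: column j times conj(p(j))

% Same quantity written as an explicit outer product
vals_outer = TCC .* (p * p');
diff_max = max(abs(vals(:) - vals_outer(:)));

disp(size(vals));
fprintf('max abs difference: %g\n', diff_max);
```

In MATLAB R2016b and later, implicit expansion allows the same thing to be written without bsxfun as TCC .* p .* p'.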
Great! Thank you very much for your help! I had made a mistake with the dimensions earlier. There is an improvement of about 30% in computation time! Awesome! Thank you very much :)

Asked: 29 Dec 2015
Commented: 1 Jan 2016
