How can I optimize this recursive for loop?
Hi,
I have this recursive for loop and I would like to know if there is some way in Matlab to reduce its computation time:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tt = (1+t(i)-t(1:i))*ones(1,Nr);
e1 = ones(i,1);
ex = e1*x(i,:);
ey = e1*y(i,:);
exp1 = exp(-.25*((ex-x(1:i,:)).^2+(ey-y(1:i,:)).^2)./tt);
x(i+1,:) = x(i,:) + adt*sum((ex-x(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum((ey-y(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
Obviously I cannot use a parfor (because of the recursion in this loop), and I tried making all the vectors gpuArrays, but the computation time was ~20% longer (even without the gather step).
I was wondering whether there is another way to vectorize this, perhaps using the pdist function from the Statistics toolbox (or another one called dist, but I forgot which toolbox it was from and I don't know how it differs from pdist), or something on the GPU...?
Thank you for your help and ideas.
9 Comments
Sindar
on 1 May 2020
might save some time to predefine (ex-x(1:i,:)) and (ey-y(1:i,:)), since they appear in both of the bottleneck lines
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tt = (1+t(i)-t(1:i))*ones(1,Nr);
ex = (x(i,:)-x(1:i,:);
ey = (y(i,:)-y(1:i,:);
exp1 = exp(-.25*(ex).^2+(ey).^2)./tt);
x(i+1,:) = x(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
LeChat
on 3 May 2020
Rik
on 3 May 2020
If in the future you have an iterative problem of the shape x(n) = x(n-1) + f(n), you can compute the f(n) array and then use cumsum. In this case that is not possible because f depends on x as well: f(n, x).
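For reference, a minimal sketch of that cumsum pattern (with a made-up increment f(n) = n^2, just for illustration):

```matlab
% Iterative recurrence x(n) = x(n-1) + f(n), where f does not depend on x
N = 10;
f = ((1:N).^2)';          % example increment array, f(n) = n^2

% Loop version
x_loop = zeros(N,1);
x_loop(1) = f(1);
for k = 2:N
    x_loop(k) = x_loop(k-1) + f(k);
end

% Vectorized version: one cumsum call replaces the whole loop
x_vec = cumsum(f);
% isequal(x_loop, x_vec) is true
```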
Sindar
on 3 May 2020
Sorry about the typos. There's also an incorrect "ex" in the y update line.
With another look, it also looks like tt can be improved. First, you only use its inverse, so you could calculate that directly instead. Second, if I'm reading it correctly, tt has Nr identical columns as a setup for elementwise multiplication with ex/ey/etc. This isn't actually necessary, as Matlab understands that it should expand elementwise operations along length-1 dimensions (implicit expansion, available since R2016b). Try this:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tti = 1./(1+t(i)-t(1:i));
ex = x(i,:)-x(1:i,:);
ey = y(i,:)-y(1:i,:);
exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
x(i+1,:) = x(i,:) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,Nr);
end
toc
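As a side note, the implicit-expansion claim above can be checked in isolation: an elementwise operation between an i-by-Nr matrix and an i-by-1 column gives the same result as with the explicitly replicated matrix (requires MATLAB R2016b or newer; the sizes and values here are arbitrary):

```matlab
i = 5; Nr = 3;
A   = rand(i, Nr);
tti = rand(i, 1);

explicit = A .* (tti*ones(1,Nr));   % replication as in the original tt construction
implicit = A .* tti;                % implicit expansion along the singleton dimension
% max(abs(explicit(:) - implicit(:))) is 0
```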
A few general notes when working on optimization like this:
- keep a copy of the original, highly-readable code. This example hasn't gotten too opaque yet, but they often do
- check that things are actually giving the same results. Since you've got random numbers, you'll need to set the seed consistently (e.g. with rng) before each run
- occasionally check on a problem at the actual size you want. Sometimes you'll find that the optimization you've worked so hard on only matters in the small case, and you've only gained 30% of 5% of the runtime at the actual size
- consider memory constraints. My experience is that Matlab is really good at intelligently optimizing halfway decent code, so long as your RAM isn't too small. This can lead to behavior you might not expect. For example, your code appears completely separable in the columns. If you've got plenty of RAM, doing all the calculations simultaneously is almost certainly faster. If you have too little, it might be (dramatically) faster to cut the problem into pieces that easily fit into memory.
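One way to do the seed-setting mentioned above (a minimal sketch using rng):

```matlab
% Reset the generator to a fixed seed before running each version,
% so both draw the identical sequence from randn
rng(0);
r1 = randn(1,5);   % e.g. the original version's noise
rng(0);
r2 = randn(1,5);   % the optimized version's noise
% isequal(r1, r2) is true, so the two versions' outputs can be compared exactly
```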
LeChat
on 7 May 2020
Sindar
on 7 May 2020
It can sometimes get a little messy, but it may be fastest to generate the datasets in chunks (say, 10 at a time). I have not used parfor much, but I suspect that vectorized calculations may be faster than separate ones in parallel. Something like:
chunk_size = 10; % try to make this a divisor of Nr
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for ind=1:(Nr/chunk_size)
idx = (ind-1)*chunk_size + (1:chunk_size);
for i = 1:Nt-1
tti = 1./(1+t(i)-t(1:i));
ex = x(i,idx)-x(1:i,idx);
ey = y(i,idx)-y(1:i,idx);
exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
x(i+1,idx) = x(i,idx) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,chunk_size); % noise must match the chunk width, not Nr
y(i+1,idx) = y(i,idx) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,chunk_size);
end
end
toc
you can likely eyeball a good chunk_size by observing memory usage (Ctrl-Shift-Esc opens Task Manager on Windows) and trying to keep it close to ~90% of your RAM (e.g., 32 GB); I don't know how Task Manager calculates the percentage at the top, but the denominator isn't consistent
caveat: I've successfully used this framework in a much more convoluted problem where each loop iteration holds a large quantity of intermediate data (imagine exp1 growing as large as or larger than x). For your problem it's easy to implement, so it's worth checking, but it may be entirely useless
LeChat
on 2 Apr 2021
The results of the original version and the faster version from Sindar are different:
% Original:
exp1 = exp(-.25*((ex-x(1:i,:)).^2 + (ey-y(1:i,:)).^2) ./ tt);
% Faster:
exp1 = exp((-.25*(ex).^2 + (ey).^2).*tti)
In the 2nd version, the -0.25 multiplies the first term only. This reduces the runtime significantly, because the exponent grows quickly and EXP(Inf) is reached soon, which is much cheaper to compute than finite values.
You need:
exp1 = exp(-0.25 * (ex.^2 + ey.^2) .* tti);
The last row contributes 0 to the sum (ex and ey are zero there), so it does not matter. Omitting it does not reduce the runtime significantly, although the number of EXP calls is reduced. See my answer.
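The difference between the two parenthesizations is easy to see with scalar test values (picked arbitrarily for illustration):

```matlab
ex = 1; ey = 2; tti = 0.5;
wrong = exp((-.25*(ex).^2 + (ey).^2).*tti);   % -0.25 applies to ex.^2 only: exp(1.875)
right = exp(-0.25*(ex.^2 + ey.^2).*tti);      % -0.25 applies to the whole sum: exp(-0.625)
```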