How can I optimize this recursive for loop?
Hi,
I have this recursive for loop and I would like to know if there is some way in Matlab to reduce its computation time:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tt = (1+t(i)-t(1:i))*ones(1,Nr);
e1 = ones(i,1);
ex = e1*x(i,:);
ey = e1*y(i,:);
exp1 = exp(-.25*((ex-x(1:i,:)).^2+(ey-y(1:i,:)).^2)./tt);
x(i+1,:) = x(i,:) + adt*sum((ex-x(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum((ey-y(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
Obviously I cannot use a parfor (because of the recursion in this loop), and I tried making all the vectors gpuArrays, but the computation time was ~20% longer (even without the gather step).
I was wondering whether there is another way to vectorize this, perhaps using the pdist function from the Statistics toolbox (or another one called dist, but I forgot which toolbox it was from and I don't know how it differs from pdist), or something on the GPU...?
Thank you for your help and ideas.
9 Comments
Sindar
on 1 May 2020
might save some time to predefine (ex-x(1:i,:)) and (ey-y(1:i,:)), since they appear in both of the bottleneck lines
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tt = (1+t(i)-t(1:i))*ones(1,Nr);
ex = (x(i,:)-x(1:i,:);
ey = (y(i,:)-y(1:i,:);
exp1 = exp(-.25*(ex).^2+(ey).^2)./tt);
x(i+1,:) = x(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
LeChat
on 3 May 2020
Rik
on 3 May 2020
If in the future you have an iterative problem of the shape x(n) = x(n-1) + f(n), you can compute the f(n) array and then use cumsum. In this case that is not possible because f depends on x as well: f(n, x).
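For reference, a minimal sketch of that cumsum pattern (with a made-up increment f(n) = n^2, just for illustration):

```matlab
% Iterative recurrence x(n) = x(n-1) + f(n), where f does not depend on x
N = 10;
f = ((1:N).^2)';          % example increment array, f(n) = n^2

% Loop version
x_loop = zeros(N,1);
x_loop(1) = f(1);
for k = 2:N
    x_loop(k) = x_loop(k-1) + f(k);
end

% Vectorized version: one cumsum call replaces the whole loop
x_vec = cumsum(f);
% isequal(x_loop, x_vec) is true
```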
Sindar
on 3 May 2020
Sorry about the typos. There's also an incorrect "ex" in the y update line.
With another look, it also looks like tt can be improved. First, you only use its inverse, so you could calculate that directly instead. Second, if I'm reading it correctly, tt has Nr identical columns as a setup for elementwise multiplication with ex/ey/etc. This isn't actually necessary, as Matlab understands that it should expand elementwise operations along length-1 dimensions (implicit expansion, available since R2016b). Try this:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tti = 1./(1+t(i)-t(1:i));
ex = x(i,:)-x(1:i,:);
ey = y(i,:)-y(1:i,:);
exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
x(i+1,:) = x(i,:) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,Nr);
end
toc
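As a side note, the implicit-expansion claim above can be checked in isolation: an elementwise operation between an i-by-Nr matrix and an i-by-1 column gives the same result as with the explicitly replicated matrix (requires MATLAB R2016b or newer; the sizes and values here are arbitrary):

```matlab
i = 5; Nr = 3;
A   = rand(i, Nr);
tti = rand(i, 1);

explicit = A .* (tti*ones(1,Nr));   % replication as in the original tt construction
implicit = A .* tti;                % implicit expansion along the singleton dimension
% max(abs(explicit(:) - implicit(:))) is 0
```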
A few general notes when working on optimization like this:
- keep a copy of the original, highly-readable code. This example hasn't gotten too opaque yet, but they often do
- check that things are actually giving the same results. Since you've got random numbers, you'll need to set the seed consistently (e.g. with rng) before each run
- occasionally check on a problem at the actual size you want. Sometimes you'll find that the optimization you've worked so hard on only matters in the small case, and you've only gained 30% of 5% of the runtime at the actual size
- consider memory constraints. My experience is that Matlab is really good at intelligently optimizing halfway decent code, so long as your RAM isn't too small. This can lead to behavior you might not expect. For example, your code appears completely separable in the columns. If you've got plenty of RAM, doing all the calculations simultaneously is almost certainly faster. If you have too little, it might be (dramatically) faster to cut the problem into pieces that easily fit into memory.
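One way to do the seed-setting mentioned above (a minimal sketch using rng):

```matlab
% Reset the generator to a fixed seed before running each version,
% so both draw the identical sequence from randn
rng(0);
r1 = randn(1,5);   % e.g. the original version's noise
rng(0);
r2 = randn(1,5);   % the optimized version's noise
% isequal(r1, r2) is true, so the two versions' outputs can be compared exactly
```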
LeChat
on 7 May 2020
Sindar
on 7 May 2020
It can sometimes get a little messy, but it may be fastest to generate the datasets in chunks (say, 10 at a time). I have not used parfor much, but I suspect that vectorized calculations may be faster than separate ones in parallel. Something like:
chunk_size = 10; % try to make this a divisor of Nr
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for ind=1:(Nr/chunk_size)
idx = (ind-1)*chunk_size + (1:chunk_size);
for i = 1:Nt-1
tti = 1./(1+t(i)-t(1:i));
ex = x(i,idx)-x(1:i,idx);
ey = y(i,idx)-y(1:i,idx);
exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
x(i+1,idx) = x(i,idx) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,chunk_size); % noise must match the chunk width, not Nr
y(i+1,idx) = y(i,idx) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,chunk_size);
end
end
toc
you can likely eyeball a good chunk_size by observing memory usage (Ctrl-Shift-Esc opens Task Manager on Windows) and trying to keep it close to ~90% of your RAM (e.g., 32 GB); I don't know how Task Manager calculates the percentage at the top, but the denominator isn't consistent
caveat: I've successfully used this framework in a much more convoluted problem where each loop iteration holds a large quantity of intermediate data (imagine exp1 growing as large as or larger than x). For your problem it's easy to implement, so it's worth checking, but it may be entirely useless
LeChat
on 2 Apr 2021
The results of the original version and the faster version from Sindar are different:
% Original:
exp1 = exp(-.25*((ex-x(1:i,:)).^2 + (ey-y(1:i,:)).^2) ./ tt);
% Faster:
exp1 = exp((-.25*(ex).^2 + (ey).^2).*tti)
In the 2nd version, the -0.25 multiplies the first term only. This reduces the runtime significantly, because the exponent grows quickly and EXP(Inf) is reached soon, which is much cheaper to compute than finite values.
You need:
exp1 = exp(-0.25 * (ex.^2 + ey.^2) .* tti);
The last row contributes 0 to the sum (ex and ey are zero there), so it does not matter. Omitting it does not reduce the runtime significantly, although the number of EXP calls is reduced. See my answer.
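The difference between the two parenthesizations is easy to see with scalar test values (picked arbitrarily for illustration):

```matlab
ex = 1; ey = 2; tti = 0.5;
wrong = exp((-.25*(ex).^2 + (ey).^2).*tti);   % -0.25 applies to ex.^2 only: exp(1.875)
right = exp(-0.25*(ex.^2 + ey.^2).*tti);      % -0.25 applies to the whole sum: exp(-0.625)
```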