Summing for loop speed-up by multiplication with unity.

Hi
I was benchmarking the speed of for loops against proper reshaped-matrix multiplication in the context of tensor multiplication. Essentially, I wanted to calculate a third-order tensor fo_{k,l,m}, defined as fo_{k,l,m} = sum_{n=1}^{nmax} pt_{k,n} * ft_{n,l,m}, from two given objects pt and ft (in the code below, nmax = nk). As expected, I found that implementing this via reshaping and matrix multiplication is much faster than discrete for loops.
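Because the sum runs over the first index of ft, the whole contraction collapses into a single matrix product once ft is unfolded along that index. A short derivation, assuming MATLAB's column-major storage (so that reshape(ft, nk, nl*nm) places ft(n,l,m) in column j = l + (m-1)*nl) and writing n_k, n_l, n_m for nk, nl, nm:

```latex
\begin{aligned}
F_{n,j} &= ft_{n,l,m}, \qquad j = l + (m-1)\,n_l, \qquad F \in \mathbb{R}^{n_k \times n_l n_m},\\
fo_{k,l,m} &= \sum_{n=1}^{n_k} pt_{k,n}\, ft_{n,l,m}
            = \sum_{n=1}^{n_k} pt_{k,n}\, F_{n,j}
            = (pt\,F)_{k,j}.
\end{aligned}
```

Folding the n_k by n_l n_m product pt F back with the same column-major rule recovers fo, which is the identity the reshape-based versions rely on.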
However, I found some odd behaviour in the speed of the discrete for-loop implementation. On my machine, this code takes 18 s:
nk=115;
nl=116;
nm=117;
ft=rand([nk,nl,nm]);
fo = zeros(nk,nl,nm);
pt=rand(nk);
tic
for cntk = 1 : nk
    for cntl = 1:nl
        for cntm = 1: nm
            for cntsum = 1: nk
                fo(cntk,cntl,cntm)=fo(cntk,cntl,cntm)+pt(cntk,cntsum)*ft(cntsum,cntl,cntm);
            end
        end
    end
end
toc
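A quick way to confirm that the quadruple loop and the reshape/matrix-multiplication form compute the same tensor is to run both on small sizes and compare. The sizes here are arbitrary choices for a fast check, not the benchmark sizes, and the names fo_loop, fo_mat and max_err are hypothetical:

```matlab
% Small, arbitrary sizes so the check runs instantly
nk = 5; nl = 4; nm = 3;
ft = rand([nk, nl, nm]);
pt = rand(nk);

% Loop version, as in the question
fo_loop = zeros(nk, nl, nm);
for cntk = 1:nk
    for cntl = 1:nl
        for cntm = 1:nm
            for cntsum = 1:nk
                fo_loop(cntk, cntl, cntm) = fo_loop(cntk, cntl, cntm) + ...
                    pt(cntk, cntsum) * ft(cntsum, cntl, cntm);
            end
        end
    end
end

% Unfold ft to nk-by-(nl*nm), multiply once, fold back
fo_mat = reshape(pt * reshape(ft, nk, nl*nm), nk, nl, nm);

% The two results should agree up to floating-point rounding
max_err = max(abs(fo_loop(:) - fo_mat(:)));
fprintf('max abs difference: %g\n', max_err);
```

The differences come only from summation order, so a tolerance check is more appropriate than isequal.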
Adding a factor of one in the innermost loop body brings the time down to only 4 s:
nk=115;
nl=116;
nm=117;
ft=rand([nk,nl,nm]);
fo = zeros(nk,nl,nm);
pt=rand(nk);
tic
for cntk = 1 : nk
    for cntl = 1:nl
        for cntm = 1: nm
            for cntsum = 1: nk
                fo(cntk,cntl,cntm)=1*fo(cntk,cntl,cntm)+pt(cntk,cntsum)*ft(cntsum,cntl,cntm);
            end
        end
    end
end
toc
Out of curiosity, I also checked a single (non-nested) loop that accumulates elements of a vector into a scalar. Here the effect is reversed. This takes 0.45 s:
a = 0;
ntest = 100000000;
b = rand(1, ntest);
tic
for cnt = 1 : ntest
    a = a + b(ntest);   % note: always adds the last element; b(cnt) would sum the whole vector
end
toc
And this takes 0.75 s:
a = 0;
ntest = 100000000;
b = rand(1, ntest);
tic
for cnt = 1 : ntest
    a = 1*a + b(ntest);
end
toc
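A minimal sketch of the scalar accumulation indexed with the loop counter, so that it actually sums b, checked against the built-in sum. ntest is reduced to an arbitrary 1e5 here, and a_builtin and rel_err are hypothetical names:

```matlab
ntest = 1e5;             % arbitrary, much smaller than in the question
b = rand(1, ntest);

a = 0;
for cnt = 1:ntest
    a = a + b(cnt);      % index with the loop counter
end

% Built-in reduction; it may use a different summation order,
% so compare with a relative tolerance rather than for equality
a_builtin = sum(b);
rel_err = abs(a - a_builtin) / abs(a_builtin);
fprintf('loop: %.10g, sum: %.10g, rel. diff: %.2g\n', a, a_builtin, rel_err);
```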
I wonder what MATLAB (R2015a) does differently when executing the two versions. Any ideas?
Kind regards
Zaph

Answers (1)

Jan on 2 Dec 2017
Edited: Jan on 2 Dec 2017
My timings under R2015b/64/Win7:
Elapsed time is 16.105212 seconds. % No "1*"
Elapsed time is 16.074966 seconds. % With "1*"
Elapsed time is 0.894157 seconds. % Faster method below
My timings under R2016b/64/Win7:
Elapsed time is 4.882229 seconds. % No "1*"
Elapsed time is 4.767648 seconds. % With "1*"
Elapsed time is 1.163444 seconds. % Faster method below
Obviously, JIT acceleration has a strong effect on the run time. It seems that in your R2015a the JIT profits from the multiplication by 1, for unknown reasons. The JIT is not documented, so we can only speculate about what is going on.
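A single tic/toc measurement may include one-off costs, such as compilation on the first pass, so repeating each measurement and taking the median makes comparisons across versions less noisy. A sketch of such a harness, assuming an arbitrary repetition count nrep and smaller sizes; it times the reshape one-liner from this thread as a placeholder, and any other variant can be substituted for the anonymous function. Newer MATLAB releases also provide timeit for this purpose.

```matlab
% Hypothetical sizes, smaller than in the question so the run is quick
nk = 40; nl = 41; nm = 42;
ft = rand([nk, nl, nm]);
pt = rand(nk);
nrep = 5;    % number of repetitions (arbitrary choice)

% Variant under test, wrapped in an anonymous function so it can be swapped
variant = @() reshape(pt * reshape(ft, nk, nl*nm), nk, nl, nm);

t = zeros(1, nrep);
for r = 1:nrep
    t0 = tic;             % per-run timer id
    fo = variant();
    t(r) = toc(t0);       % elapsed seconds for this run
end
fprintf('median of %d runs: %.4g s\n', nrep, median(t));
```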
It is nice that the naive loop runs about 3 times faster in R2016b, but it is a pity that the faster version below becomes about 25% slower:
for cntk = 1 : nk
    ptv = pt(cntk, :);
    for cntl = 1:nl
        fo(cntk, cntl, :) = ptv * reshape(ft(:, cntl, :), nk, nm);
    end
end
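Jan's row-wise loop replaces the two innermost loops with one vector-matrix product per (cntk, cntl) pair. A self-contained sketch that runs it on small, arbitrary sizes and checks it against the fully reshaped product; fo_ref and max_err are hypothetical names:

```matlab
nk = 6; nl = 5; nm = 4;                        % arbitrary small sizes
ft = rand([nk, nl, nm]);
pt = rand(nk);

fo = zeros(nk, nl, nm);
for cntk = 1:nk
    ptv = pt(cntk, :);                         % 1-by-nk row of pt
    for cntl = 1:nl
        % ft(:, cntl, :) is nk-by-1-by-nm; reshaped to nk-by-nm, the
        % product ptv * (...) is 1-by-nm, i.e. all m for this (k, l)
        fo(cntk, cntl, :) = ptv * reshape(ft(:, cntl, :), nk, nm);
    end
end

% Reference: one matrix product over the unfolded tensor
fo_ref = reshape(pt * reshape(ft, nk, nl*nm), nk, nl, nm);
max_err = max(abs(fo(:) - fo_ref(:)));
```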
  2 Comments
Zaphod on 2 Dec 2017
Edited: Zaphod on 2 Dec 2017
Hi Jan,
Thanks for picking up on this one. For the best speed, I am currently using this line:
fo=reshape(pt*reshape(ft,[nk,nm*nl]),[nk,nl,nm]);
It does the job in a few milliseconds (R2015a, MacBook).
Your results with different MATLAB versions are very interesting and make me even more curious.
Thanks and have a great day,
Zaph
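The one-liner can be split into its three steps to make the shapes explicit. A sketch on small, arbitrary sizes, with a spot check of one entry against the defining sum; the intermediate names F and G and the chosen indices k, l, m are hypothetical:

```matlab
nk = 4; nl = 3; nm = 2;                 % arbitrary small sizes
ft = rand([nk, nl, nm]);
pt = rand(nk);

F  = reshape(ft, [nk, nl*nm]);          % unfold: column l + (m-1)*nl holds ft(:, l, m)
G  = pt * F;                            % a single matrix product, nk-by-(nl*nm)
fo = reshape(G, [nk, nl, nm]);          % fold back to nk-by-nl-by-nm

% Spot-check one entry: fo(k,l,m) = sum over n of pt(k,n) * ft(n,l,m)
k = 2; l = 3; m = 2;
direct = pt(k, :) * ft(:, l, m);        % 1-by-nk times nk-by-1
err = abs(fo(k, l, m) - direct);
```

This works without any permute only because the summed index is the first dimension of ft; a contraction over another dimension would need a permute before the reshape.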
Jan on 2 Dec 2017
fo=reshape(pt*reshape(ft,[nk,nm*nl]),[nk,nl,nm]);
This takes 0.1 sec on my machine.
