Improving the speed of a double for loop through vectorization or changing summation indices
2 views (last 30 days)
Show older comments
I'm writing a double for loop that computes a Pade summation subject to specific coefficients and parameters. I'm curious if there is a more efficient way to write my double for loop. I think that I should avoid writing both cos(theta)^(p-q) and sin(theta)^q as the power operation is expensive.
Here is my current implementation (and three other implementations that are slightly faster).
clc; clear all;
% Test data for function
epsilon = 0.3;
delta = 0.1;
N = 500;
M = 500;
% My code passes through complex coefficients of size (N+1,M+1)
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
% Form the ctilde
N_M_min = min(N,M);
coeff = zeros(N_M_min+1,1);
coeff2 = zeros(N_M_min+1,1);
coeff3 = zeros(N_M_min+1,1);
coeff4 = zeros(N_M_min+1,1);
ntests = 100; % number of test iterations to average exec time
% Method 1 - Original Implementation
tic
for N = 1:ntests
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;
end
end
end
toc/ntests
% Method 2
tic
for N = 1:ntests
for p=0:N_M_min
c2 = 1;
for q=0:p
if (q == 0)
c2 = 1;
else
c2 = c2*sin(theta);
end
coeff2(p+1) = coeff2(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*c2;
end
end
end
toc/ntests
% Method 3
tic
c1 = cos(theta).^(0:N_M_min);
c2 = sin(theta).^(0:N_M_min);
for N = 1:ntests
for p=0:N_M_min
for q=0:p
coeff3(p+1) = coeff3(p+1) + c(p-q+1,q+1)*c1(p-q+1)*c2(q+1);
end
end
end
toc/ntests
% Method 4 - Fastest but slightly less accurate than method 3
tic
for N = 1:ntests
vander = (cos(theta).^(0:N_M_min).') .* (sin(theta).^(0:N_M_min));
for p=0:N_M_min
for q=0:p
coeff4(p+1) = coeff4(p+1) + c(p-q+1,q+1)*vander(p-q+1,q+1);
end
end
end
toc/ntests
diff = norm(coeff-coeff2,inf); % is tiny, 3.3e-21
diff2 = norm(coeff-coeff3,inf); % is zero
diff3 = norm(coeff-coeff4,inf); % is tiny 3.5e-15 (not zero as in method 3)
My second implementation runs almost twice as fast locally. The third and fourth implementation increase the speed by around a factor of 4.
The average executive time for method 1 is 0.0589 seconds.
The average executive time for method 2 is 0.0392 seconds.
The average executive time for method 3 is 0.0186 seconds.
The average executive time for method 4 is 0.0116 seconds.
However, I don't know how to speed up the calculation for the coefficients cos(theta)^(p-q). Is there a more efficient way of writing this double for loop through avoiding the power operation or using vectorization (my Matlab code will call this subroutine more than one hundred thousand times)?
0 Comments
Accepted Answer
David Hill
on 15 Jul 2021
Edited: David Hill
on 15 Jul 2021
What is rep doing? Seems to just be multiplying.
for rep = 1:10000
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;%there is no rep in the equations
end
end
end
I believe all your nested loops can be consolidated as such:
epsilon = 0.3;
delta = 0.1;
N = 50;
M = 50;
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
N_M_min = min(N,M);
tic;
c=rot90(c);
coeff=10000*arrayfun(@(x)sum(diag(c,x-N_M_min)'.*cos(theta).^(x:-1:0).*sin(theta).^(0:x)),0:N_M_min);
toc;
This runs in about 0.003 s.
1 Comment
More Answers (0)
See Also
Categories
Find more on Loops and Conditional Statements in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!