Speed of the abs function
Show older comments
Hello everybody!
I had a question about how matlab computed the absolute value function.
Indeed, i was working on a school project and had to optimize a code that returned the smallest euclidian distance from the origin of a 3-D cloud of points. Simple enough, my first thought was to use the basic :
d = x.^2 + y.^2 + z.^2;
dmin = sqrt(min(d));
But then, thinking that squaring was a expensive operation, especially on a data set that can be bilions of points big, i tried to implement a "pre-check". My idea was the following: the 1-norm will be way cheaper to compute and so i will be able to compute a first "estimate" of the smallest distance very quickly. I will then use this to only compute the 2-norm of the points that are "close enough" to the origin on a way smaller data set.
so i did this:
dEstimate = abs(x) + abs(y) + abs(z);
dMinEstimate = min(dEstimate);
threshold = dMinEstimate*sqrt(3); %the closest point in the 2-norm can be at most sqrt(3) times further than the closest in the 1-norm.
potentialMin = find(dEstimate < threshold);
d = x(potentialMin).^2 + y(potentialMin).^2 + y(potentialMin).^2
dmin = sqrt(min(d));
But to my surprise, when using the profile viewer to compare the codes, this is what i got:

As you can see, the results are averaged over 100 calls in order to reduce processing noise. Keep in mind that the data is one milion points here.
How can the square operation time be close or even faster than the absolute value?
I would assume that the absolute value is computed by setting to 0 the sign bit in the IEEE 754 double convention, which is a very simple operation while the squaring is a much more complex process.
Is matlab this optimised? How can this be?
Thanks in advance for your answer !
Arthur
Accepted Answer
More Answers (2)
Chunru
on 5 Dec 2023
2 votes
" squaring was a expensive operation"
This is ture for VERY OLD cpu or some special purpose CPUs. In modern general purpose CPU, such as Intel one, the difference among square/multiplication/abs/addition is no longer significant (subjective here). For example, a CPU can perform 4 ADD/cycle or 4 MUL/cycle (floating point). The abs operation may not be much simpler than square.
If you consider the FP addition, abs, multiplier (of similar speed), with the overhead such memory access, loop control, the two approaches you used above may have ver similar performance. (It is also noted additional overhead for testing and comparison).
So in modern general purpose CPU, it may not worth to do the "optimization" mannually. (Some compilers may do optimisation better).
We have to watch out for measurement error due to Execution Engine processing
format long g
N = 10000;
x = randn(N, 1); y = randn(N,1); z = randn(N,1);
M = 10;
t1a = zeros(1,M);
t1b = zeros(1,M);
t2a = zeros(1,M);
t2b = zeros(1,M);
for K = 1 : M; start = tic; d = x.^2 + y.^2 + z.^2; t1a(K) = toc(start); end
for K = 1 : M; start = tic; d = abs(x) + abs(y) + abs(z); t2a(K) = toc(start); end
for K = 1 : M; start = tic; d = x.^2 + y.^2 + z.^2; t1b(K) = toc(start); end
for K = 1 : M; start = tic; d = abs(x) + abs(y) + abs(z); t2b(K) = toc(start); end
t1a
t1b
t2a
t2b
plot([t1a; t1b; t2a; t2b].')
legend({'squares#1', 'squares#2', 'abs#1', 'abs#2'})
plot([t1b; t2b].')
legend({'squares#2', 'abs#2'})
For reasons currently unknown, the abs() version consistently is slower at execution 3 or 4 compared to 2, but by the end of the run the timings for the two versions are virtually identical.
Categories
Find more on Parallel Computing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!
