Mean distance function upgrade question
Show older comments
Dear Team,
The below code calculating the mean distance. For a few thousand points (x,y,z) the code is working fine, but when i input values as group1 = 70000 points and group2 = 80000 points the progress is too slow. What should i add/change in the below code to have optimal results ?
data = table2array(readtable("test.xlsx"));
group1 = length(data(~isnan(data(:,1))));
group2 = length(data(~isnan(data(:,5))));
tic
for i=1:group1
display(i);
minval = inf;
for j=1:group2
point(i,j) = sqrt((data(j,5)-data(i,1))^2+(data(j,6)-data(i,2))^2+(data(j,7)-data(i,3))^2);
if point(i,j)<minval
minval = point(i,j);
end
end
values(i) = minval;
end
avg = mean(values);
toc
Thanks in advance
Accepted Answer
More Answers (2)
Don't know if you have enough RAM for this. Note that the distance matrix pdist2(group1,group2) will be 70000 x 80000 in your case.
group1 = [1 3 -5; 2 -1 4; 3 4 90];
group2 = [0 4 7; 3 3 -56];
m = mean(min(pdist2(group1,group2).'))
1 Comment
data = table2array(readtable("test.xlsx"));
% group1 = length(data(~isnan(data(:,1)))); Faster:
group1 = nnz(~isnan(data(:,1)));
group2 = nnz(~isnan(data(:,5)));
tic
values = zeros(group1, 1); % Pre-allocate
for i = 1:group1
% Wastes time: display(i);
% Do you reall need the huge point(i,j) array? If not, collect the data
% in a scalar:
minval = inf;
for j = 1:group2
% Avoid the expensive SQRT at searching for the minimum:
point = (data(j,5)-data(i,1))^2 + ...
(data(j,6)-data(i,2))^2 + ...
(data(j,7)-data(i,3))^2;
if point < minval
minval = point;
end
end
values(i) = sqrt(minval); % One SQRT is enough
end
avg = mean(values);
toc
Vectorizing the inner loop is most likely faster:
point = (data(1:group2,5) - data(i,1))^2 + ...
(data(1:group2,6) - data(i,2))^2 + ...
(data(1:group2,7) - data(i,3))^2;
values(i) = sqrt(min(point)); % One SQRT is enough
Now avoid creating the submatrices repeatedly:
values = zeros(n, 1); % Pre-allocate!
A = data(:, 5:7);
B = data(:, 1:3);
for i = 1:n
point = sum((A - B(i, :)).^2, 2);
values(i) = sqrt(min(point)); % One SQRT is enough
end
avg = mean(values);
Compare this with the nice and clean PDIST method suggested by Torsten.
3 Comments
n = 3e4;
data = rand(n, 7);
tic
values = zeros(n, 1); % Pre-allocate!
for i = 1:n
minval = inf;
for j = 1:n
point = (data(j,5)-data(i,1))^2 + ...
(data(j,6)-data(i,2))^2 + ...
(data(j,7)-data(i,3))^2;
if point < minval
minval = point;
end
end
values(i) = sqrt(minval); % One SQRT is enough
end
avg = mean(values);
toc
tic
values = zeros(n, 1); % Pre-allocate!
for i = 1:n
point = (data(:,5) - data(i,1)).^2 + ...
(data(:,6) - data(i,2)).^2 + ...
(data(:,7) - data(i,3)).^2;
values(i) = sqrt(min(point)); % One SQRT is enough
end
avg = mean(values);
toc
tic
values = zeros(n, 1); % Pre-allocate!
A = data(:, 5:7);
B = data(:, 1:3);
for i = 1:n
point = sum((A - B(i, :)).^2, 2);
values(i) = sqrt(min(point)); % One SQRT is enough
end
avg = mean(values);
toc
tic
m = mean(min(pdist2(data(:, 5:7), data(:, 1:3))));
toc
Please check the timings by your own. I see strange effects in the forum currently.
Jan
on 1 Nov 2022
Locally in my R2018b installation this is the fastest:
S = 0;
a5 = data(:, 5);
a6 = data(:, 6);
a7 = data(:, 7);
for i = 1:n % Faster with PARFOR!
p = (a5 - data(i, 1)).^2 + ...
(a6 - data(i, 2)).^2 + ...
(a7 - data(i, 3)).^2;
S = S + sqrt(min(p));
end
avg = S / n;
Categories
Find more on Matrix Indexing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!