Mean distance function upgrade question

Chm on 31 Oct 2022
Commented: Jan on 1 Nov 2022
Dear Team,
The code below calculates the mean distance from each point in group 1 to its nearest point in group 2. For a few thousand points (x,y,z) the code works fine, but with group1 = 70000 points and group2 = 80000 points it is far too slow. What should I add or change in the code below to speed it up?
data = table2array(readtable("test.xlsx"));
group1 = length(data(~isnan(data(:,1))));
group2 = length(data(~isnan(data(:,5))));
tic
for i=1:group1
    display(i);
    minval = inf;
    for j=1:group2
        point(i,j) = sqrt((data(j,5)-data(i,1))^2+(data(j,6)-data(i,2))^2+(data(j,7)-data(i,3))^2);
        if point(i,j)<minval
            minval = point(i,j);
        end
    end
    values(i) = minval;
end
avg = mean(values);
toc
Thanks in advance

Accepted Answer

Chm on 31 Oct 2022
Thanks a lot, team!
You are amazing!

More Answers (2)

Torsten on 31 Oct 2022
Edited: Torsten on 31 Oct 2022
I don't know if you have enough RAM for this. Note that the distance matrix pdist2(group1,group2) will be 70000 x 80000 in your case, which is about 44.8 GB in double precision (70000 * 80000 * 8 bytes).
group1 = [1 3 -5; 2 -1 4; 3 4 90];
group2 = [0 4 7; 3 3 -56];
m = mean(min(pdist2(group1,group2).'))
m = 33.7672
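If the full distance matrix does not fit in memory, one option is to call pdist2 on blocks of group1 and keep only the row minima of each block. The following is a minimal sketch under stated assumptions, not a tuned implementation: the sample data and the value of blockSize are hypothetical, and pdist2 requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical sample data standing in for the two point groups (N-by-3 each).
rng(0);
group1 = rand(700, 3);
group2 = rand(800, 3);

% Size of the full distance matrix from the question, 8 bytes per double.
fullBytes = 70000 * 80000 * 8;   % 4.48e10 bytes, about 44.8 GB

blockSize = 200;                 % assumed block size; tune to the available RAM
n1 = size(group1, 1);
minDist = zeros(n1, 1);          % nearest-neighbour distance per group1 point
for first = 1:blockSize:n1
    last = min(first + blockSize - 1, n1);
    % Distances from this block of group1 points to all group2 points.
    D = pdist2(group1(first:last, :), group2);
    % Keep only the row minima, then discard the block.
    minDist(first:last) = min(D, [], 2);
end
m = mean(minDist);
```

With this layout the largest temporary array is blockSize x size(group2,1), so memory use is controlled by blockSize. Passing the dimension explicitly to min(D, [], 2) also keeps the result correct when group2 happens to contain a single point.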
  1 Comment
Chm on 31 Oct 2022
Edited: Chm on 31 Oct 2022
Thanks a lot, Torsten, for your prompt reply. I will check it and let you know. I have 32 GB of RAM.



Jan on 31 Oct 2022
Edited: Jan on 1 Nov 2022
data = table2array(readtable("test.xlsx"));
% group1 = length(data(~isnan(data(:,1)))); Faster:
group1 = nnz(~isnan(data(:,1)));
group2 = nnz(~isnan(data(:,5)));
tic
values = zeros(group1, 1); % Pre-allocate
for i = 1:group1
    % Wastes time: display(i);
    % Do you really need the huge point(i,j) array? If not, collect the data
    % in a scalar:
    minval = inf;
    for j = 1:group2
        % Avoid the expensive SQRT when searching for the minimum:
        point = (data(j,5)-data(i,1))^2 + ...
                (data(j,6)-data(i,2))^2 + ...
                (data(j,7)-data(i,3))^2;
        if point < minval
            minval = point;
        end
    end
    values(i) = sqrt(minval); % One SQRT is enough
end
avg = mean(values);
toc
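Deferring the square root is safe because sqrt is monotonically increasing for non-negative inputs, so the smallest squared distance belongs to the smallest distance. A minimal check of this, using hypothetical random points (the names q, P and d2 are illustrative only):

```matlab
rng(1);
q  = rand(1, 3);              % one hypothetical query point
P  = rand(1000, 3);           % hypothetical candidate points
d2 = sum((P - q).^2, 2);      % squared distances (implicit expansion, R2016b or later)

viaSquared = sqrt(min(d2));   % one SQRT after the search
viaDirect  = min(sqrt(d2));   % 1000 SQRTs before the search
sameValue  = abs(viaSquared - viaDirect) < 1e-15;
```

Both routes give the same minimum distance, but the first one evaluates sqrt once per query point instead of once per pair.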
Vectorizing the inner loop is most likely faster:
% Inside the loop over i; note the elementwise .^ for vectors:
point = (data(1:group2,5) - data(i,1)).^2 + ...
        (data(1:group2,6) - data(i,2)).^2 + ...
        (data(1:group2,7) - data(i,3)).^2;
values(i) = sqrt(min(point)); % One SQRT is enough
Now avoid creating the submatrices repeatedly:
values = zeros(group1, 1); % Pre-allocate!
A = data(:, 5:7);          % Points of group 2 (MIN ignores NaN rows)
B = data(:, 1:3);          % Points of group 1
for i = 1:group1
    point = sum((A - B(i, :)).^2, 2);
    values(i) = sqrt(min(point)); % One SQRT is enough
end
avg = mean(values);
Compare this with the nice and clean pdist2 method suggested by Torsten.
  3 Comments
Torsten on 31 Oct 2022
"Compare this with the nice and clean PDIST method suggested by Torsten."
Too memory-intensive if the goal is only the row minima.
I think your second suggestion is a good compromise.
Jan on 1 Nov 2022
Locally in my R2018b installation this is the fastest:
S = 0;
a5 = data(:, 5);
a6 = data(:, 6);
a7 = data(:, 7);
for i = 1:group1 % Faster with PARFOR!
    p = (a5 - data(i, 1)).^2 + ...
        (a6 - data(i, 2)).^2 + ...
        (a7 - data(i, 3)).^2;
    S = S + sqrt(min(p));
end
avg = S / group1;
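The PARFOR remark above can be sketched as follows. This is only an illustration under stated assumptions: the sample data is hypothetical, it reuses the thread's column layout (columns 1:3 for group 1, columns 5:7 for group 2), and a real speed-up requires the Parallel Computing Toolbox. Without that toolbox, parfor runs as an ordinary serial loop.

```matlab
% Hypothetical sample data: columns 1:3 hold group 1, columns 5:7 hold group 2.
rng(0);
data = rand(500, 7);
n = size(data, 1);

b  = data(:, 1:3);   % query points (group 1), sliced by the loop index
a5 = data(:, 5);     % group 2 coordinates, broadcast to all workers
a6 = data(:, 6);
a7 = data(:, 7);

S = 0;
parfor i = 1:n
    bi = b(i, :);                 % i-th query point
    p = (a5 - bi(1)).^2 + ...
        (a6 - bi(2)).^2 + ...
        (a7 - bi(3)).^2;          % squared distances to all group 2 points
    S = S + sqrt(min(p));         % S is a parfor reduction variable
end
avg = S / n;
```

Copying the query points into b first lets parfor treat them as a sliced variable, so each worker receives only its own rows rather than the whole data matrix. The summation order differs between workers, so avg may differ from the serial result in the last few digits.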
