Find the desired row in the matrix

Chenglin Li on 24 Oct 2022
Commented: Jan on 25 Oct 2022
Hello! I have a matrix in which the first three columns are the x, y, and z coordinates of the points, the fourth column is the sum of the first three columns, and the fifth column is the product of the first three columns. I want to extract the indices of the rows that occur only once in the matrix. For example, the sum in the first row is 90 and the product is 26040; this combination is unique, so I extract that row. For rows 10 and 22, only the sum is the same but the product is different, so they are extracted separately. For rows 55 and 56, the sum and the product are both the same, so only one of the two rows needs to be extracted.
Can anyone help me with this? I'm completely new to MATLAB and would be grateful.
  3 Comments
Chenglin Li on 24 Oct 2022
Well, thank you for your answer. Thank you very much!
Jan on 24 Oct 2022
@Rik: I also thought of unique or histcounts, but did not find a solution. Please check my answer. I'd be glad to see a less fiddly solution.


Accepted Answer

Jan on 24 Oct 2022
Edited: Jan on 24 Oct 2022
While removing repeated rows is easy with unique(x, 'rows'), I did not find a built-in function to identify the rows that occur only once.
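A minimal sketch of what unique with 'rows' does, on a made-up matrix (the coordinates and the names xyz, keys and firstIdx are illustrative assumptions, not the asker's data): it keeps one representative of every (sum, product) pair, so a repeated pair still contributes one row.

```matlab
% Hypothetical integer coordinates; columns 4 and 5 hold sum and product.
xyz = [1 2 3; 3 2 1; 1 1 4; 2 2 2];
M = [xyz, sum(xyz, 2), prod(xyz, 2)];
keys = M(:, 4:5);   % [6 6; 6 6; 6 4; 6 8]: same sum everywhere, products differ

% One representative per distinct (sum, product) pair.
% 'stable' keeps the original order and returns the first occurrence.
[~, firstIdx] = unique(keys, 'rows', 'stable');
onePerPair = M(firstIdx, :);
```

Rows 1 and 2 share the pair (6, 6), so firstIdx contains 1, 3 and 4. Under the stricter reading used in the rest of this answer, row 1 would be dropped as well.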
If the data set is small (a few hundred rows), a nested loop is fine:
% Remove the rows of M whose columns 4:5 occur more than once:
% Assuming that M is your matrix:
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
% ^ just some stuff
off = false; % Slightly faster
A = M(:, 4:5); % Columns used for comparison
nA = size(A, 1); % Number of rows
T = true(nA, 1);
for iA = 1:nA
    if T(iA) % If not excluded already
        d = all(A(iA, :) == A, 2);
        if sum(d) > 1 % More than 1 occurrence found
            T(d) = off; % Mark all occurrences
        end
    end
end
Result = M(T, :) % Only the rows that occur exactly once
Result = 3×5
0.8932 0.9660 0.0837 2.0000 3.0000
0.0193 0.2833 0.5621 4.0000 2.0000
0.6671 0.1563 0.3340 1.0000 5.0000
A small acceleration (most likely; test this using tic/toc) is to compare the columns separately:
A4 = M(:, 4); % Column used for comparison
A5 = M(:, 5); % Column used for comparison
nA = size(A4, 1); % Number of rows
T = true(nA, 1);
for iA = 1:nA
    if T(iA) % If not excluded already
        d = (A4(iA) == A4 & A5(iA) == A5);
        if sum(d) > 1 % More than 1 occurrence found
            T(d) = off; % Mark all occurrences
        end
    end
end
The cost of these nested loops grows as O(n^2), so doubling the size of the input makes the processing take 4 times longer. This gets very slow for huge data sets, e.g. with millions of rows. For such cases, a sorting-based approach is better:
% Remove the rows of M whose columns 4:5 occur more than once:
[A, idx] = sortrows(M(:, 4:5));
nextEq = [true; any(diff(A, 1, 1), 2)]; % TRUE where a row differs from the previous one
ini = strfind(nextEq.', [true, false]); % 1st occurrence of each repeated row
nextEq(ini) = false; % Exclude the 1st occurrence in addition
T = false(size(A, 1), 1); % Pre-allocation, TRUE or FALSE doesn't matter
T(idx) = nextEq; % Original order
Result = M(T, :)
Result = 3×5
0.8932 0.9660 0.0837 2.0000 3.0000
0.0193 0.2833 0.5621 4.0000 2.0000
0.6671 0.1563 0.3340 1.0000 5.0000
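To see why the sorting approach works, it may help to trace it on the key columns of the 6-row example. The sketch below is not the code from the answer: it replaces the strfind step with a plain logical expression that should be equivalent (a row is the only member of its group if it starts a new group and the following row starts one too). The names K, isNew and isSingle are assumptions chosen for illustration.

```matlab
% Key columns of the 6-row example (columns 4:5 of M in the answer).
K = [1,2; 2,3; 1,2; 4,2; 1,5; 1,2];

[S, idx] = sortrows(K);                  % equal rows become neighbours
isNew = [true; any(diff(S, 1, 1), 2)];   % true where a new group of rows starts

% A group has exactly one member if the next row starts a new group as well
% (the last row has no successor, hence the trailing true).
isSingle = isNew & [isNew(2:end); true];

T = false(size(K, 1), 1);
T(idx) = isSingle;                       % back to the original row order
onceRows = K(T, :);
```

For this input the sorted keys are [1 2; 1 2; 1 2; 1 5; 2 3; 4 2], so only the last three sorted rows are marked, which corresponds to rows 2, 4 and 5 of the original matrix.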
  1 Comment
Chenglin Li on 25 Oct 2022
Thank you very much, this program has helped me a lot and gave me an idea for the next step!


More Answers (1)

Rik on 25 Oct 2022
Inspired by the answer and comment by Jan, I gave it a try as well. However, at least for this size, the answers from Jan are faster. Perhaps the functions I use would scale better, but I did not test that.
Perhaps accumarray would have a better performance than histcounts. If this is really a bottleneck in your code, you could try that.
% Assuming that M is your matrix:
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
% ^ just some stuff
% Confirm the results match:
Jan_v1(M) , Rik(M)
ans = 3×5
0.7516 0.4447 0.5350 2.0000 3.0000
0.0193 0.3578 0.2807 4.0000 2.0000
0.1917 0.4433 0.3679 1.0000 5.0000
ans = 3×5
0.7516 0.4447 0.5350 2.0000 3.0000
0.0193 0.3578 0.2807 4.0000 2.0000
0.1917 0.4433 0.3679 1.0000 5.0000
% do warmup rounds first (only needed online), then test the timing for
% each implementation
for n=1:3,timeit(@()Jan_v1(M));timeit(@()Jan_v2(M));timeit(@()Jan_v3(M));timeit(@()Rik(M));end
timeit(@()Jan_v1(M)),timeit(@()Jan_v2(M)),timeit(@()Jan_v3(M)),timeit(@()Rik(M))
ans = 1.5764e-05
ans = 1.1022e-05
ans = 1.9246e-05
ans = 6.7721e-05
function output=Rik(M)
    % Return the rows of the matrix for which the entries in the 4th and 5th column are unique.
    % First create a temporary matrix that only contains the relevant columns.
    A = M(:, 4:5);
    % indA contains indices to A to create the unique list
    % indB contains indices to the unique list to get back to A
    % We need to use 'stable' to avoid sorting.
    [~,indA,indB] = unique(A,'rows','stable');
    % Count how often every index occurs
    counts = histcounts(indB,0.5:(0.5+max(indB))); % bin edges 0.5, 1.5, ... (0.5 to 4.5 for the example)
    RowsWithOneOccurrence = indA(counts==1);
    output = M(RowsWithOneOccurrence,:);
end
function Result=Jan_v1(M)
    off = false; % Slightly faster
    A = M(:, 4:5); % Columns used for comparison
    nA = size(A, 1); % Number of rows
    T = true(nA, 1);
    for iA = 1:nA
        if T(iA) % If not excluded already
            d = all(A(iA, :) == A, 2);
            if sum(d) > 1 % More than 1 occurrence found
                T(d) = off; % Mark all occurrences
            end
        end
    end
    Result = M(T, :); % Only the rows that occur exactly once
end
function Result=Jan_v2(M)
    off = false; % Slightly faster
    A4 = M(:, 4); % Column used for comparison
    A5 = M(:, 5); % Column used for comparison
    nA = size(A4, 1); % Number of rows
    T = true(nA, 1);
    for iA = 1:nA
        if T(iA) % If not excluded already
            d = (A4(iA) == A4 & A5(iA) == A5);
            if sum(d) > 1 % More than 1 occurrence found
                T(d) = off; % Mark all occurrences
            end
        end
    end
    Result = M(T, :); % Only the rows that occur exactly once
end
function Result=Jan_v3(M)
    % Remove the rows of M whose columns 4:5 occur more than once:
    [A, idx] = sortrows(M(:, 4:5));
    nextEq = [true; any(diff(A, 1, 1), 2)];
    ini = strfind(nextEq.', [true, false]);
    nextEq(ini) = false; % Exclude the 1st occurrence in addition
    T = false(size(A, 1), 1); % Pre-allocation, TRUE or FALSE doesn't matter
    T(idx) = nextEq; % Original order
    Result = M(T, :);
end
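The accumarray variant mentioned at the top of this answer could look roughly like the sketch below. It was not timed here, so whether it actually beats histcounts is an open question. The variable names keys and onceRows are illustrative, and the key columns of the small test matrix are used directly.

```matlab
% Key columns of the 6-row test matrix (columns 4:5 of M).
keys = [1,2; 2,3; 1,2; 4,2; 1,5; 1,2];

% indA: first occurrence of each distinct row, indB: group number of every row.
[~, indA, indB] = unique(keys, 'rows', 'stable');

% accumarray adds a 1 for every row into the slot of its group,
% which gives the number of occurrences per distinct row.
counts = accumarray(indB, 1);

onceRows = indA(counts == 1);   % row indices that occur exactly once
```

With the full matrix, M(onceRows, :) would then return the same rows as the histcounts version.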
  2 Comments
Chenglin Li on 25 Oct 2022
Thank you. I'll try again. Thank you very much indeed
Jan on 25 Oct 2022
Thanks, @Rik, for this comparison. While my loop versions have some speed advantages for tiny inputs, they are far too slow for large data. With
n = 1e6;
M1 = [rand(n, 3), randi([0, 1000], n, 2)]; % Few repeated values
M2 = [rand(n, 3), randi([0, 10], n, 2)]; % Many repeated values
for n=1:1, timeit(@()Jan_v3(M1));timeit(@()Rik(M1));end
timeit(@()Jan_v3(M1))
timeit(@()Rik(M1))
timeit(@()Jan_v3(M2))
timeit(@()Rik(M2))
Sorry, I hesitate to post the timings online, because they vary from run to run by 25%! The difference between the 2 functions is smaller than this deviation between runs. My conclusion: Both have almost the same high speed.
function output=Rik(M)
    A = M(:, 4:5);
    [~,indA,indB] = unique(A,'rows','stable');
    counts = histcounts(indB,0.5:(0.5+max(indB))); % bin edges 0.5, 1.5, ..., max(indB)+0.5
    RowsWithOneOccurrence = indA(counts==1);
    output = M(RowsWithOneOccurrence,:);
end
function Result=Jan_v3(M)
    [A, idx] = sortrows(M(:, 4:5));
    nextEq = [true; diff(A(:, 1)) | diff(A(:, 2))];
    % nextEq = [true; any(diff(A, 1, 1), 2)];
    ini = strfind(nextEq.', [true, false]);
    nextEq(ini) = false; % Exclude the 1st occurrence in addition
    T = false(size(A, 1), 1); % Pre-allocation, TRUE or FALSE doesn't matter
    T(idx) = nextEq; % Original order
    Result = M(T, :);
end
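One way to make such noisy timings easier to compare is to repeat timeit a few times and look at the median and the spread. This is only a sketch under assumptions: the handle f is a stand-in for the function under test, and the data size and repetition count are arbitrary and smaller than in the comment above to keep the run short.

```matlab
% Illustrative data with few repeated key pairs.
n = 1e5;
M = [rand(n, 3), randi([0, 1000], n, 2)];

f = @() sortrows(M(:, 4:5));   % stand-in; replace by e.g. @() Jan_v3(M)

nRep = 5;                      % number of repeated measurements (arbitrary)
t = zeros(nRep, 1);
for k = 1:nRep
    t(k) = timeit(f);          % timeit itself already runs f several times
end

% Relative spread between the fastest and slowest measurement.
spreadPct = 100 * (max(t) - min(t)) / median(t);
fprintf('median %.4g s, spread %.1f %%\n', median(t), spreadPct);
```

If the spread is of the same order as the difference between two implementations, the measurements do not separate them, which matches the conclusion above.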

