Find the desired row in the matrix

Question

Chenglin Li on 24 Oct 2022

https://au.mathworks.com/matlabcentral/answers/1833643-find-the-desired-row-in-the-matrix

Commented: Jan on 25 Oct 2022

matrix.xlsx

Hello! I have a matrix in which the first three columns are the x, y, and z coordinates of the points, the fourth column is the sum of the first three columns, and the fifth column is their product. I want to extract the indices of the rows whose sum and product occur only once in the matrix. For example, the sum of the first row is 90 and the product is 26040; this pair is unique, so I extract that row. Rows 10 and 22 have the same sum but different products, so they are extracted separately. Rows 55 and 56 have the same sum and the same product, so only one of them needs to be extracted.

Can anyone help me with this? I'm completely new to MATLAB. I would be grateful.
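To make the layout described in the question concrete, here is a minimal sketch with made-up integer coordinates (the values and the names P, S, Q are assumptions, not taken from matrix.xlsx). It builds the sum and product columns and shows the two cases from the question: rows that share both values, and rows that share only one of them.

```matlab
% Hypothetical toy data: one point per row, columns are x, y, z.
P = [2 3 5;
     1 5 6;
     5 3 2;
     1 4 7];
S = sum(P, 2);     % Column 4: x + y + z
Q = prod(P, 2);    % Column 5: x * y * z
M = [P, S, Q];     % Same column layout as described in the question
disp(M)
% Rows 1 and 3 share sum (10) and product (30).
% Rows 2 and 4 share only the sum (12); their products are 30 and 28.
```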

3 Comments

Chenglin Li on 24 Oct 2022

Well, thank you for your answer. Thank you very much!

Jan on 24 Oct 2022

@Rik: I thought of unique or histcounts too, but did not find a solution. Please check my answer. I'd be glad to see a solution with less twiddling.


Answer 1

Jan on 24 Oct 2022

https://au.mathworks.com/matlabcentral/answers/1833643-find-the-desired-row-in-the-matrix#answer_1081953

Edited: Jan on 24 Oct 2022


While removing repeated rows is easy with unique(x, 'rows'), I did not find a built-in function to identify the rows that occur only once.
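For reference, a minimal sketch of the unique(x, 'rows') call mentioned above, using a hypothetical example matrix (the names iFirst and OnePerGroup are assumptions). It keeps one representative of every repeated pair, which may be what the last sentence of the question asks for. The code in the rest of this answer instead drops every row whose pair is repeated.

```matlab
% Hypothetical example data: columns 4:5 hold the sum and the product.
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
% 'stable' keeps the original order; iFirst is the first row of each group.
[~, iFirst] = unique(M(:, 4:5), 'rows', 'stable');
OnePerGroup = M(iFirst, :);   % Rows 1, 2, 4, 5: one row per distinct pair
```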

If the data set is small (a few hundred rows), a nested loop is fine:

% Keep only the rows of M whose columns 4:5 occur exactly once:
% Assuming that M is your matrix:
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
%    ^ just some stuff
off = false;         % Slightly faster
A   = M(:, 4:5);     % Columns used for comparison
nA  = size(A, 1);    % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = all(A(iA, :) == A, 2);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end
Result = M(T, :)     % Only rows, which occur once only
Result = 3×5
    0.8932    0.9660    0.0837    2.0000    3.0000
    0.0193    0.2833    0.5621    4.0000    2.0000
    0.6671    0.1563    0.3340    1.0000    5.0000

Testing the columns separately is most likely slightly faster (check this with tic/toc):

A4  = M(:, 4);       % Columns used for comparison
A5  = M(:, 5);       % Columns used for comparison
nA  = size(A4, 1);   % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = (A4(iA) == A4 & A5(iA) == A5);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end

The cost of these nested loops grows as O(n^2), so an input of double the size takes 4 times longer to process. This gets very slow for huge data sets, e.g. with millions of rows. In that case:

% Keep only the rows of M whose columns 4:5 occur exactly once:
[A, idx]    = sortrows(M(:, 4:5));
nextEq      = [true; any(diff(A, 1, 1), 2)];
ini         = strfind(nextEq.', [true, false]);
nextEq(ini) = false;                 % Mark 1st occurrence in addition
T           = false(size(A, 1), 1);  % Pre-allocation, TRUE or FALSE doesn't matter
T(idx)      = nextEq;                % Original order
Result      = M(T, :)
Result = 3×5
    0.8932    0.9660    0.0837    2.0000    3.0000
    0.0193    0.2833    0.5621    4.0000    2.0000
    0.6671    0.1563    0.3340    1.0000    5.0000
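To see why the sort-based version works, the following sketch traces the intermediate values on the six example pairs used above. The names keys and isFirst are assumptions, and the strfind call is replaced here by a find-based expression that should select the same positions.

```matlab
keys     = [1,2; 2,3; 1,2; 4,2; 1,5; 1,2];  % Columns 4:5 of the example M
[A, idx] = sortrows(keys);                  % Equal rows become neighbours
% A = [1,2; 1,2; 1,2; 1,5; 2,3; 4,2],  idx = [1; 3; 6; 5; 2; 4]
isFirst  = [true; any(diff(A, 1, 1), 2)];   % TRUE where a new group starts
% isFirst = [1; 0; 0; 1; 1; 1]
% A group start that is followed by FALSE has at least one repetition:
ini = find(isFirst(1:end-1) & ~isFirst(2:end));   % ini = 1
isFirst(ini) = false;                       % Now TRUE only for single rows
T        = false(size(A, 1), 1);
T(idx)   = isFirst;                         % Back to the original row order
% T = [0; 1; 0; 1; 1; 0], so rows 2, 4 and 5 are kept
```

Sorting costs O(n log n), which is why this version scales better than comparing every row with every other row.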

1 Comment

Chenglin Li on 25 Oct 2022

Thank you very much. This program has helped me a lot and has given me an idea for the next step!


Answer 2

Rik on 25 Oct 2022

https://au.mathworks.com/matlabcentral/answers/1833643-find-the-desired-row-in-the-matrix#answer_1082688


Inspired by the answer and comment by Jan, I gave it a try as well. However, at least for this size, the answers from Jan are faster. Perhaps the functions I use would scale better, but I did not test that.

Perhaps accumarray would have a better performance than histcounts. If this is really a bottleneck in your code, you could try that.

% Assuming that M is your matrix:
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
%    ^ just some stuff
% Confirm the results match:
Jan_v1(M) , Rik(M)
ans = 3×5
    0.7516    0.4447    0.5350    2.0000    3.0000
    0.0193    0.3578    0.2807    4.0000    2.0000
    0.1917    0.4433    0.3679    1.0000    5.0000
ans = 3×5
    0.7516    0.4447    0.5350    2.0000    3.0000
    0.0193    0.3578    0.2807    4.0000    2.0000
    0.1917    0.4433    0.3679    1.0000    5.0000
% do warmup rounds first (only needed online), then test the timing for
% each implementation
for n=1:3,timeit(@()Jan_v1(M));timeit(@()Jan_v2(M));timeit(@()Jan_v3(M));timeit(@()Rik(M));end
timeit(@()Jan_v1(M)),timeit(@()Jan_v2(M)),timeit(@()Jan_v3(M)),timeit(@()Rik(M))
ans = 1.5764e-05
ans = 1.1022e-05
ans = 1.9246e-05
ans = 6.7721e-05
function output=Rik(M)
% Return the rows of the matrix for which the entries in the 4th and 5th column are unique.
% First create a temporary matrix that only contains the relevant columns.
A = M(:, 4:5);
% indA contains indices to A to create the unique list
% indB contains indices to the unique list to get back to A
% We need to use 'stable' to avoid sorting.
[~,indA,indB] = unique(A,'rows','stable');
% Count how often every index occurs
counts = histcounts(indB,0.5:(0.5+max(indB))); % bin edges 0.5:1:max(indB)+0.5 (0.5 to 4.5 for this example)
RowsWithOneOccurrence = indA(counts==1);
output = M(RowsWithOneOccurrence,:);
end
function Result=Jan_v1(M)
off = false;         % Slightly faster
A   = M(:, 4:5);     % Columns used for comparison
nA  = size(A, 1);    % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = all(A(iA, :) == A, 2);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end
Result = M(T, :);    % Only rows, which occur once only
end
function Result=Jan_v2(M)
off = false;         % Slightly faster
A4  = M(:, 4);       % Columns used for comparison
A5  = M(:, 5);       % Columns used for comparison
nA  = size(A4, 1);   % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = (A4(iA) == A4 & A5(iA) == A5);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end
Result = M(T, :);     % Only rows, which occur once only
end
function Result=Jan_v3(M)
% Keep only the rows of M whose columns 4:5 occur exactly once:
[A, idx]    = sortrows(M(:, 4:5));
nextEq      = [true; any(diff(A, 1, 1), 2)];
ini         = strfind(nextEq.', [true, false]);
nextEq(ini) = false;                 % Mark 1st occurrence in addition
T           = false(size(A, 1), 1);  % Pre-allocation, TRUE or FALSE doesn't matter
T(idx)      = nextEq;                % Original order
Result      = M(T, :);
end
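The accumarray idea mentioned at the top of this answer could look roughly like the sketch below. It is an untested-for-speed variant of the Rik function, with the histcounts call swapped for accumarray, so whether it is actually faster remains an open question.

```matlab
% Same hypothetical example data as above.
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
[~, indA, indB] = unique(M(:, 4:5), 'rows', 'stable');
% counts(k) is the number of rows that map to the k-th unique pair.
counts = accumarray(indB(:), 1);
output = M(indA(counts == 1), :);   % Rows whose pair occurs exactly once
```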

2 Comments

Chenglin Li on 25 Oct 2022

Thank you. I'll try again. Thank you very much indeed

Jan on 25 Oct 2022


Thanks, @Rik, for this comparison. While my loop versions have some speed advantages for tiny inputs, they are far too slow for large data. With

n  = 1e6;
M1 = [rand(n, 3), randi([0, 1000], n, 2)];  % Few repeated values
M2 = [rand(n, 3), randi([0, 10], n, 2)];    % Many repeated values
for k=1:1, timeit(@()Jan_v3(M1));timeit(@()Rik(M1));end  % Warm-up round
timeit(@()Jan_v3(M1))
timeit(@()Rik(M1))
timeit(@()Jan_v3(M2))
timeit(@()Rik(M2))

Sorry, I hesitate to post the timings online, because they vary from run to run by 25%! The difference between the 2 functions is smaller than this deviation between runs. My conclusion: both are almost equally fast.

function output=Rik(M)
A = M(:, 4:5);
[~,indA,indB] = unique(A,'rows','stable');
counts = histcounts(indB,0.5:(0.5+max(indB))); % bin edges 0.5:1:max(indB)+0.5
RowsWithOneOccurrence = indA(counts==1);
output = M(RowsWithOneOccurrence,:);
end
function Result=Jan_v3(M)
[A, idx]    = sortrows(M(:, 4:5));
nextEq      = [true; diff(A(:, 1)) | diff(A(:, 2))];
% nextEq      = [true; any(diff(A, 1, 1), 2)];
ini         = strfind(nextEq.', [true, false]);
nextEq(ini) = false;                 % Mark 1st occurrence in addition
T           = false(size(A, 1), 1);  % Pre-allocation, TRUE or FALSE doesn't matter
T(idx)      = nextEq;                % Original order
Result      = M(T, :);
end
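Since the timings vary so much between runs, one option is to repeat the measurement and compare medians rather than single values. The sketch below is only an illustration: the names f, nRep and t are assumptions, n is smaller than in the comment above to keep it quick, and sortrows serves as a stand-in workload that would be replaced by @() Jan_v3(M1) or @() Rik(M1).

```matlab
n    = 1e5;                                  % Smaller than 1e6, for a quick run
M1   = [rand(n, 3), randi([0, 1000], n, 2)];
f    = @() sortrows(M1(:, 4:5));             % Stand-in for Jan_v3 or Rik
nRep = 5;                                    % Number of repeated measurements
t    = zeros(1, nRep);
for k = 1:nRep
    t(k) = timeit(f);                        % timeit itself averages several calls
end
spread = 100 * (max(t) - min(t)) / median(t);
fprintf('median %.4g s, spread %.0f%%\n', median(t), spread);
```

If the spread is larger than the difference between the medians of two functions, the measurement does not separate them.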
