Find row in matrix, fast and with special conditions

1 view (last 30 days)
I have a matrix (S) and one row out of that matrix (s). I want to get the rownumber of (s) in (S).
(S) has unique rows. The row (s) is allways member of (S)-rows.
So far I calculate the row number with:
I = find(all(bsxfun(@eq,S,s),2));
I am looking for a faster method. I call this line alot, it costs me hours of calculation time.
Here's a test example:
% create example, N and n could change
n = 4; % n < 10
N = 25; % range(N) ~ [20 100]
S = nchoosek(1:N,n);
for i = 1:10
s = S(randi(size(S,1)),:);
tic
I = find(all(bsxfun(@eq,S,s),2));
toc
end
[Edit2], poorly expressed "(S) has unique rows". It should express that I is allways scalar.
[Edit], dimensional details.
My example shows the general direction of the sizes.
[d1 d2] = size(S); % its [12650x4] in my example
First dimension will be much larger then the second one. (d1 >> d2)
Dimension two (d2=n) will be smaller then 10. Numel(S) will be limited by Memory.
  1 Comment
Jan
Jan on 25 May 2013
What is the usual dimension of S? It matters if you are talking about 10x4, 10'000x20 or 20x10'000 matrices.

Sign in to comment.

Accepted Answer

Jan
Jan on 25 May 2013
Edited: Jan on 25 May 2013
If the first dimension of S is large, the BSXFUN approach checks of a lot of numbers although they are rejected in the former columns already. A small Matlab function can avoid testing already rejected rows:
function Index = FindRow(S, s)
nCol = size(S, 2);
match = S(:, 1) == s(1);
for col = 2:nCol
match(match) = S(match, col) == s(col);
end
Index = find(match);
For Image Analyst's example under Matlab 2009a/64/Win7:
S = randi(3, 200, 3);
s = [2 3 1];
tic, for k=1:1000; m = find(all(bsxfun(@eq,S,s),2)); end; toc
tic; for k=1:1000; m = find(ismember(S, s, 'rows')); end; toc
tic; for k=1:1000; m = FindRow(S, s); end; toc
% 0.032688 sec
% 0.433527 sec
% 0.021643 sec
But for S = randi(30, 2000, 20), s=S(1000, :):
% 0.124566 sec
% 3.175940 sec
% 0.194101 sec
[EDITED] It is easier in C to stop the search after the first matching row is found. This saves 50% processing time in the average. In addition no temporary arrays are required:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *S, *s;
mwSize n1, n2, i1, i2, k;
S = mxGetPr(prhs[0]);
n1 = mxGetM(prhs[0]);
n2 = mxGetN(prhs[0]);
s = mxGetPr(prhs[1]);
for (i1 = 0; i1 < n1; i1++) {
k = i1;
for (i2 = 0; i2 < n2; i2++) {
if (S[k] != s[i2]) {
break;
}
k += n1;
}
if (i2 == n2) { // Matching row found:
plhs[0] = mxCreateDoubleScalar((double) (i1 + 1));
return;
}
}
// No success:
plhs[0] = mxCreateDoubleScalar(mxGetNaN());
}
For S = randi(30, 2000, 20) I get (compiled by MSVC2008):
BSXFUN: 0.123305 sec
FindRow.m: 0.194914 sec
ISMEMBER: 3.172425 sec
FindRow.mex: 0.011098 sec (s is last row)
0.007880 sec (s is 1000th row)
0.004753 sec (s is first row)
The MEX has an average speedup of factor 15. For a [20000 x 200] matrix we get a factor of 260 already.
  3 Comments
Jan
Jan on 27 Dec 2013
Dear Jacob L,
The request you've sent me by email would have been a perfect question for this forum. Therefore I answer here, how to modify this function to work for UINT8 arrays:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
uint8_T *M, *v;
mwSize m, n, i1, i2, k;
if (nrhs != 2 || !mxIsUint8(prhs[0]) || !mxIsUint8(prhs[1])) {
mexErrMsgIdAndTxt("MEX:FindRow:BadInput",
"2 UINT8 arrays required as input.");
}
M = (uint8_T *) mxGetData(prhs[0]);
m = mxGetM(prhs[0]);
n = mxGetN(prhs[0]);
v = (uint8_T *) mxGetData(prhs[1]);
if (mxGetNumberOfElements(prhs[1]) != n) {
mexErrMsgIdAndTxt("MEX:FindRow:BadSize",
"Sizes of inputs do not match.");
}
for (i1 = 0; i1 < m; i1++) {
k = i1;
for (i2 = 0; i2 < n; i2++) {
if (M[k] != v[i2]) {
break;
}
k += m;
}
if (i2 == n) { // Matching row found:
plhs[0] = mxCreateDoubleScalar((double) (i1 + 1));
return;
}
}
// No success:
plhs[0] = mxCreateDoubleMatrix(0, 0, mxREAL);
}
Mark
Mark on 13 Mar 2015
Hi,
I used your your original mex file in matlab and tried to change it. I'm also searching for fast ways to find a row matrix s in a large matrix S. However the row matrix s appears a number of times in large matrix S.
if true
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[], mwSize, lastfound)
{
double *S, *s *lastfound;
mwSize n1, n2, i1, i2, k;
S = mxGetPr(prhs[0]);
n1 = mxGetM(prhs[0]);
// n2 = mxGetN(prhs[0]);
n2 = 3;
s = mxGetPr(prhs[1]);
for (i1 = lastfound; i1 < n1; i1++) {
k = i1;
for (i2 = 0; i2 < n2; i2++) {
if (S[k] != s[i2]) {
break;
}
k += n1;
}
if (i2 == n2) { // Matching row found:
plhs[0] = mxCreateDoubleScalar((double) (i1 + 1));
return;
}
}
// No success:
plhs[0] = mxCreateDoubleScalar(mxGetNaN());
}
end
So i tried to use another input 'lastfound' which would enable the code to start at lastfound + 1 (the +1 is added when the function is called), so that the previous row s is found, and the code start searching the second row s. I do this until the mex file return NaN. It however doesn't work, the code doesn't start the at the lastfound + 1 step. I am a C noob, so i think it is a very small error, but i would immensely appreciate it if somebody could take a look at it.
Thanks in advance :)

Sign in to comment.

More Answers (2)

Image Analyst
Image Analyst on 25 May 2013
Try this:
% Generate some sample data:
m = randi(3, 200, 3)
% Specify what we're looking for:
lookingFor = [2 3 1]; % Let's say we're looking for this
% Find out what rows that pattern occurs in:
rowsItOccursIn = find(ismember(m, lookingFor, 'rows'))
  5 Comments
Image Analyst
Image Analyst on 25 May 2013
What are the actual sizes of n, N, and S? For 4 and 25, it DEFINITELY should not take "hours of calculation time" like you said.
yosey
yosey on 26 May 2013
I've added dimensional details. High cpu time results due to multiple calls.

Sign in to comment.


Teja Muppirala
Teja Muppirala on 26 May 2013
Not sure if this is faster, but might be worth a try.
d1 = 12650;
d2 = 4;
S = randn(d1,d2);
s = S(randi(d1),:);
c = 1;
r = 1;
while c <= d2
if S(r,c) == s(c)
c = c+1;
else
r = r+1;
c = 1;
end
end
isequal(S(r,:),s)
If the big S is the same every time, you could make a better algorithm by sorting it in advance.
  1 Comment
yosey
yosey on 26 May 2013
Edited: yosey on 26 May 2013
The while loop is pretty fast, but cant beat the mex file. Here are the times for 1000 loops.
Elapsed time is 0.143702 seconds. % find(all(bsxfun(@eq,S,s),2))
Elapsed time is 0.059386 seconds. % findrow_mex
Elapsed time is 0.118347 seconds. % your while loop
Since I've got some trouble reading the C-code, isn't the mex-code the same as your while loop?

Sign in to comment.

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!