Finding the length of the longest continuous streak of a same number for each row of a matrix

Question

Marin Genov on 19 Feb 2023

0
Link

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/1915125-finding-the-length-of-the-longest-continuous-streak-of-a-same-number-for-each-row-of-a-matrix

Edited: Marin Genov on 2 Mar 2023

Let A be a (possibly 3D) matrix of type uint16 and size [p,q,r] = size(A), whose rows are all sorted in an ascending way. I want to find for each row A(i,:,k) the length of the longest continuous streak of same number in it. For example, if

A = [1 2 3 4 4 4;

1 1 2 2 2 2];

is such a matrix, then my output should read

c = [3;4]

While q is small, i.e.

, the number of rows p can be quite big, e.g.

, and r should be reasonable (to be determined). The latter means that, if possible, I would like to simultaneously perform the routine in question on several 2D matrices of the same size stacked as a 3D matrix rather than loop through them one by one, provided the procedure allows for that and there is a performance benefit to it.

Here are 3 methods I've given some thought.

If a is just a row, then

% Method 1: works only for vectors, don't know how to vectorize it to A

b = max(diff(find([1,diff(a)~=0,1])));

does it, but I don't know how to generalize this to a 2D or 3D matrix A. The main issue is that find() only gives the linear indices, so one has to somehow compute max(diff())

, so to speak. So, instead, for a 3D matrix A I have the following two routines:

% Input:

p = 5000;

q = 10;

r = 10;

A = sort(randi(1000,p,q,r,'uint16'),2);

% Method 2: faster when p is relatively small

A_unique = A; % create a copy of A

true_col = true(p,1,r);

A_unique_positions_logical = [true_col,diff(A,1,2) > 0]; % logical index of unique 'representatives' of the values of each row

A_unique(~A_unique_positions_logical) = 65535; % the values of A are not too big and can be bounded from above safely (see below)

A_unique = permute(A_unique,[1 4 3 2]);

C = max(permute(sum(uint8(A == A_unique),2,'native'),[1 4 3 2]),[],2); % compares each row with its respective row of unique values

% counts the occurances of each unique value in this row and then takes max(), 65535 has no matches => only unique values are counted

But this routine is not very suitable when p is as big as in my use case. So, the routine I am currently using is via computing an incidence matrix:

% Input:

p = 500000;

q = 10;

r = 10;

A = sort(randi(1000,p,q,r,'uint16'),2);

% Method 3: faster when p is bigger

I_mat = VChooseK(uint8(1:q),2);

m = size(I,1); % i.e. m = nchoosek(q,2), which is small, since q <= 10

Incidence_mat = ones(p,q,r,q,'uint8');

for k=1:m % this is a relatively small loop, comparing all the 2-combinations of the columns of A

i = I_mat(k,1); j = I_mat(k,2);

A1_col = A(:,i,:);

A2_col = A(:,j,:);

Eq_tmp = uint8(A1_col == A2_col);

Incidence_mat(:,i,:,j) = Eq_tmp;

Incidence_mat(:,j,:,i) = Eq_tmp;

end

C = max(sum(Incidence_mat_gpu,4,'native'),[],2);

I am hoping that Method 1 can be adapted to 2-3D matrices in a way that is faster than the other two routines. Any help is much appreciated!

Lastly, let me mention that the vectorization routines are intended to be run on a GPU.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Bruno Luong on 19 Feb 2023

1
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/1915125-finding-the-length-of-the-longest-continuous-streak-of-a-same-number-for-each-row-of-a-matrix#answer_1174865

Edited: Bruno Luong on 19 Feb 2023

Open in MATLAB Online

p = 3;
q = 10;
r = 2;
A = sort(randi(10,p,q,r,'uint16'),2)
A = 3×10×2 uint16 array
A(:,:,1) =

    1    1    3    6    6    6    6    7    7   10
    2    4    6    6    7    7    7    8    9   10
    1    2    3    4    4    5    5    6    6    8


A(:,:,2) =

    1    1    4    4    5    6    6    7    8   10
    1    1    1    3    3    3    4    4    6   10
    1    1    1    2    3    4    4    4    9   10
B = reshape(permute(A,[2 1 3]),q,[]);
t = true(1,width(B));
[i,j] = find([t;diff(B,1,1);t]);
ll = accumarray(j,i,[],@(i)max(diff(i)));
ll = reshape(ll,p,r);
disp(ll)
     4     2
     3     3
     2     3

4 Comments
Show 2 older commentsHide 2 older comments

Bruno Luong on 19 Feb 2023

Edited: Bruno Luong on 19 Feb 2023

Open in MATLAB Online

This modification should work for gpu (and cpu)

p = 3;
q = 10;
r = 2;
A = sort(randi(10,p,q,r,'uint16'),2)
A = gpuArray(A);
B = reshape(permute(A,[2 1 3]),q,[]);
t = true(1,width(B));
[i,j] = find([t;diff(B,1,1);t]);
di = [-q; diff(i)];
ll = accumarray(j,di,[],@max);
ll = reshape(ll,p,r);
disp(ll)

Marin Genov on 19 Feb 2023

Thanks! I can confirm that the modification works on GPU.

Sign in to comment.

Answer 2

John D'Errico on 19 Feb 2023

1
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/1915125-finding-the-length-of-the-longest-continuous-streak-of-a-same-number-for-each-row-of-a-matrix#answer_1174890

Open in MATLAB Online

When you have a question where there is some specific issue that you should know is a problem, TELL US ABOUT IT. Don't keep it a secret, only saying something when you get an answer. And needing to use a GPU, or the parallel computing TB in general is such an issue. Most people don't have it.

You have an array, and you want to find the longest constant run of ANY value in each row. For example...

A = randi(3,10,15)
A = 10×15
   1     3     2     2     3     3     3     1     3     2     1     3     3     3
   3     3     2     3     2     1     3     1     1     2     3     3     3     2
   2     1     1     3     1     3     2     3     3     1     1     1     2     3
   1     3     1     3     1     3     3     1     3     3     1     3     1     2
   1     2     1     3     2     2     2     2     1     1     3     2     2     3
   3     3     3     3     3     2     2     3     2     1     2     2     2     3
   1     3     1     1     2     3     3     3     2     2     2     1     3     3
   1     1     3     3     1     3     1     3     1     3     3     1     2     1
   1     3     2     2     3     3     3     1     2     2     2     3     1     2
   1     3     3     2     1     2     1     3     2     2     2     2     2     1

Start by identifying ALL such runs. The simplest way to do this is via diff.

n = size(A,1);
Acon = [zeros(n,1),~diff(A,[],2)]
Acon = 10×15
   0     0     0     1     0     1     1     0     0     0     0     0     1     1
   0     1     0     0     0     0     0     0     1     0     0     1     1     0
   0     0     1     0     0     0     0     0     1     0     1     1     0     0
   0     0     0     0     0     0     1     0     0     1     0     0     0     0
   0     0     0     0     0     1     1     1     0     1     0     0     1     0
   1     1     1     1     1     0     1     0     0     0     0     1     1     0
   0     0     0     1     0     0     1     1     0     1     1     0     0     1
   0     1     0     1     0     0     0     0     0     0     1     0     0     0
   0     0     0     1     0     1     1     0     0     1     1     0     0     0
   1     0     1     0     0     0     0     0     0     1     1     1     1     0

Any 1 in a row of Acon indicates a location where we found a sequence of constant elements in that row. Runs of ones indicate longer runs of constant elements. So now the problem reduces to finding the length of the longest run of ones in each row. Again though, here we have a problem of knowing what codes will work for you, using a GPU.

So, next, try this:

conv2(Acon,[1 2 1])
ans = 10×17
   0     0     0     1     2     2     3     3     1     0     0     0     1     3     3     1
   0     1     2     1     0     0     0     0     1     2     1     1     3     3     1     0
   0     0     1     2     1     0     0     0     1     2     2     3     3     1     0     0
   0     0     0     0     0     0     1     2     1     1     2     1     0     0     0     0
   0     0     0     0     0     1     3     4     3     2     2     1     1     2     1     0
   1     3     4     4     4     3     2     2     1     0     0     1     3     3     1     0
   0     0     0     1     2     1     1     3     3     2     3     3     1     1     2     1
   0     1     2     2     2     1     0     0     0     0     1     2     1     0     0     0
   0     0     0     1     2     2     3     3     1     1     3     3     1     0     0     0
   1     2     2     2     1     0     0     0     0     1     3     4     4     3     1     0

What you will see is

max(conv2(Acon,[1 2 1]),[],2)
ans = 10×1
     3
     3
     3
     2
     4
     4
     3
     2
     3
     4

tells you the length of the longest sequence of constant elements in a row, as long as that sequence is not longer than 4. And for those rows where there are more then 4 elements in a run, it will report a run of 4's.

So if you expect that there would rarely be runs of length 5 or more, then you could just use the single line code above, then check each row where a 4 was reported, to see if there was actually a run of length 5 or more. And that is easy enough to check. But the nice thing is, the above code will be extremely fast.

3 Comments
Show 1 older commentHide 1 older comment

John D'Errico on 19 Feb 2023

Its hard to say. I have a funny feeling that with a creative use of conv, I could identify longer runs too. But I think that requires me to know the maximum run length.

Another idea is to use a run length encoding on the rows of your matrix, expecially once it has been converted to a binary form as I did with Acon. So if each row were run length encoded, then the problem reduces to finding the longest run of ones. But I'm not sure that would be directly GPU-able.

Or, you could use various image processing ideas. Again, not sure how well they would work for you. Regardless, I think Bruno has come up with a variation that will be GPU-able.

Marin Genov on 19 Feb 2023

Edited: Marin Genov on 19 Feb 2023

Your idea about using convolution is very interesting, especially since it's orders of magnitude faster than all of the other alternatives. So I will try to think about it some more and see if I can get anywhere for the general case. At any rate, the maximum run length has a trivial, but attainable upper bound, namely the width of the matrix, and this cannot be changed without padding the matrix with further zero columns.

Sign in to comment.

Answer 3

Marin Genov on 2 Mar 2023

0
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/1915125-finding-the-length-of-the-longest-continuous-streak-of-a-same-number-for-each-row-of-a-matrix#answer_1183650

I had a bit of fever and remembered the following little trick to vectorize find row-wise that should have been obvious from the start :-) Here is the full function:

% Input:

p = 500000;

q = 10;

r = 25;

A = sort(randi(1000,p,q,r,'uint16'),2);

% New Method:

tic;

q_vec = uint16(1:q+1);

Ones_col = ones(p,1,r,'uint16');

A_diff = uint16(logical(diff(A,1,2))); % length of run of 0-s = difference of the positions of the 1-s bounding them

A_diff_padded = [Ones_col,A_diff,Ones_col]; clear A_diff; % making sure there is always a boundary of ones

A_pos = A_diff_padded.*q_vec; clear A_diff_padded q_vec; % 1-s are replaced by their positions

A_pos_cumax = cummax(A_pos,2); clear A_pos; % the positions of boundary 1-s become neighbours

A_pos_cumax_diff = diff(A_pos_cumax,1,2); clear A_pos_cumax; % difference of the positions of 1-s

Count_col = max(A_pos_cumax_diff,[],2); clear A_pos_cumax_diff;

toc;

This is much faster than all of the above general methods, and in terms of performance it seems on par with the convolution method, except it's not limited to runs of 4 or less elements. Basically, find is replaced by a multiplication with 1:w, where w is the width of the 3D matrix, and a cummax. I also added a few clear because I ran out of memory before I had run out of performance.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Finding the length of the longest continuous streak of a same number for each row of a matrix

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

4 Comments
Show 2 older commentsHide 2 older comments

More Answers (2)

3 Comments
Show 1 older commentHide 1 older comment

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Finding the length of the longest continuous streak of a same number for each row of a matrix

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

4 Comments Show 2 older commentsHide 2 older comments

More Answers (2)

3 Comments Show 1 older commentHide 1 older comment

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

4 Comments
Show 2 older commentsHide 2 older comments

3 Comments
Show 1 older commentHide 1 older comment

0 Comments
Show -2 older commentsHide -2 older comments