Finding the length of the longest continuous streak of a same number for each row of a matrix
5 views (last 30 days)
Show older comments
Let A be a (possibly 3D) matrix of type uint16 and size [p,q,r] = size(A), whose rows are all sorted in an ascending way. I want to find for each row A(i,:,k) the length of the longest continuous streak of same number in it. For example, if
A = [1 2 3 4 4 4;
1 1 2 2 2 2];
is such a matrix, then my output should read
c = [3;4]
While q is small, i.e. , the number of rows p can be quite big, e.g. , and r should be reasonable (to be determined). The latter means that, if possible, I would like to simultaneously perform the routine in question on several 2D matrices of the same size stacked as a 3D matrix rather than loop through them one by one, provided the procedure allows for that and there is a performance benefit to it.
Here are 3 methods I've given some thought.
If a is just a row, then
% Method 1: works only for vectors, don't know how to vectorize it to A
b = max(diff(find([1,diff(a)~=0,1])));
does it, but I don't know how to generalize this to a 2D or 3D matrix A. The main issue is that find() only gives the linear indices, so one has to somehow compute max(diff()) , so to speak. So, instead, for a 3D matrix A I have the following two routines:
% Input:
p = 5000;
q = 10;
r = 10;
A = sort(randi(1000,p,q,r,'uint16'),2);
% Method 2: faster when p is relatively small
A_unique = A; % create a copy of A
true_col = true(p,1,r);
A_unique_positions_logical = [true_col,diff(A,1,2) > 0]; % logical index of unique 'representatives' of the values of each row
A_unique(~A_unique_positions_logical) = 65535; % the values of A are not too big and can be bounded from above safely (see below)
A_unique = permute(A_unique,[1 4 3 2]);
C = max(permute(sum(uint8(A == A_unique),2,'native'),[1 4 3 2]),[],2); % compares each row with its respective row of unique values
% counts the occurances of each unique value in this row and then takes max(), 65535 has no matches => only unique values are counted
But this routine is not very suitable when p is as big as in my use case. So, the routine I am currently using is via computing an incidence matrix:
% Input:
p = 500000;
q = 10;
r = 10;
A = sort(randi(1000,p,q,r,'uint16'),2);
% Method 3: faster when p is bigger
I_mat = VChooseK(uint8(1:q),2);
m = size(I,1); % i.e. m = nchoosek(q,2), which is small, since q <= 10
Incidence_mat = ones(p,q,r,q,'uint8');
for k=1:m % this is a relatively small loop, comparing all the 2-combinations of the columns of A
i = I_mat(k,1); j = I_mat(k,2);
A1_col = A(:,i,:);
A2_col = A(:,j,:);
Eq_tmp = uint8(A1_col == A2_col);
Incidence_mat(:,i,:,j) = Eq_tmp;
Incidence_mat(:,j,:,i) = Eq_tmp;
end
C = max(sum(Incidence_mat_gpu,4,'native'),[],2);
I am hoping that Method 1 can be adapted to 2-3D matrices in a way that is faster than the other two routines. Any help is much appreciated!
Lastly, let me mention that the vectorization routines are intended to be run on a GPU.
0 Comments
Accepted Answer
Bruno Luong
on 19 Feb 2023
Edited: Bruno Luong
on 19 Feb 2023
p = 3;
q = 10;
r = 2;
A = sort(randi(10,p,q,r,'uint16'),2)
B = reshape(permute(A,[2 1 3]),q,[]);
t = true(1,width(B));
[i,j] = find([t;diff(B,1,1);t]);
ll = accumarray(j,i,[],@(i)max(diff(i)));
ll = reshape(ll,p,r);
disp(ll)
4 Comments
Bruno Luong
on 19 Feb 2023
Edited: Bruno Luong
on 19 Feb 2023
This modification should work for gpu (and cpu)
p = 3;
q = 10;
r = 2;
A = sort(randi(10,p,q,r,'uint16'),2)
A = gpuArray(A);
B = reshape(permute(A,[2 1 3]),q,[]);
t = true(1,width(B));
[i,j] = find([t;diff(B,1,1);t]);
di = [-q; diff(i)];
ll = accumarray(j,di,[],@max);
ll = reshape(ll,p,r);
disp(ll)
More Answers (2)
John D'Errico
on 19 Feb 2023
When you have a question where there is some specific issue that you should know is a problem, TELL US ABOUT IT. Don't keep it a secret, only saying something when you get an answer. And needing to use a GPU, or the parallel computing TB in general is such an issue. Most people don't have it.
You have an array, and you want to find the longest constant run of ANY value in each row. For example...
A = randi(3,10,15)
Start by identifying ALL such runs. The simplest way to do this is via diff.
n = size(A,1);
Acon = [zeros(n,1),~diff(A,[],2)]
Any 1 in a row of Acon indicates a location where we found a sequence of constant elements in that row. Runs of ones indicate longer runs of constant elements. So now the problem reduces to finding the length of the longest run of ones in each row. Again though, here we have a problem of knowing what codes will work for you, using a GPU.
So, next, try this:
conv2(Acon,[1 2 1])
What you will see is
max(conv2(Acon,[1 2 1]),[],2)
tells you the length of the longest sequence of constant elements in a row, as long as that sequence is not longer than 4. And for those rows where there are more then 4 elements in a run, it will report a run of 4's.
So if you expect that there would rarely be runs of length 5 or more, then you could just use the single line code above, then check each row where a 4 was reported, to see if there was actually a run of length 5 or more. And that is easy enough to check. But the nice thing is, the above code will be extremely fast.
3 Comments
John D'Errico
on 19 Feb 2023
Its hard to say. I have a funny feeling that with a creative use of conv, I could identify longer runs too. But I think that requires me to know the maximum run length.
Another idea is to use a run length encoding on the rows of your matrix, expecially once it has been converted to a binary form as I did with Acon. So if each row were run length encoded, then the problem reduces to finding the longest run of ones. But I'm not sure that would be directly GPU-able.
Or, you could use various image processing ideas. Again, not sure how well they would work for you. Regardless, I think Bruno has come up with a variation that will be GPU-able.
See Also
Categories
Find more on GPU Computing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!