Find the range of duplicates in a sorted vector

So let's say I have a vector
a = [6 2 2 5]
I sort it with the sort function and now:
a = [2 2 5 6]
How do I find the range of the duplicate number (2)? I want it to tell me where the duplicates start (element 1) and where they end (element 2).
And if I have [2 2 5 5 6]
It tells me copies are in 1-2 and 3-4

Accepted Answer

John D'Errico on 30 Apr 2025
Edited: John D'Errico on 30 Apr 2025
I'll create a longer vector, with a few duplicates.
V0 = randi(8,[1,15])
V0 = 1×15
7 1 2 7 1 2 7 5 1 5 6 7 2 5 3
V = sort(V0)
V = 1×15
1 1 1 2 2 2 3 5 5 5 6 7 7 7 7
Now you want to know where the dups live, in the sorted vector. Just find the first and last elements of any dups. The trick is an old one that uses diff, and then a search for a specific pattern.
dV = diff(V) > 0
dV = 1×14 logical array
0 0 1 0 0 1 1 0 0 1 1 0 0 0
Hmm. That might be useful. Since diff computes the difference between consecutive elements, dV is zero wherever an element equals the one before it, so every block of duplicates shows up as a block of zeros. That means we just need to find the first and last zero in each block of zeros. This means we can use a trick that employs strfind. Yes, I know, it's not a string. Or, is it? strfind just looks for a desired pattern in a vector.
What does this tell us?
startloc = strfind([1,dV,1],[1 0])
startloc = 1×4
1 4 8 12
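Do you see what I did? Prepending a 1 at the beginning is a way to find blocks of zeros that start at the very beginning. Appending a 1 at the end allows us to find a block of zeros at the end. As a bonus, the extra leading element shifts every position by one, so the locations returned by strfind are already indices into V.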
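To see why the leading 1 matters, here is a small sketch that runs the same search with and without it. The names withLead and noLead are only for illustration and are not part of the answer; V is the sorted vector shown above.

```matlab
% Sketch: effect of the leading sentinel 1 on the [1 0] search.
V = [1 1 1 2 2 2 3 5 5 5 6 7 7 7 7];   % sorted vector from above
dV = diff(V) > 0;                      % true where the value changes

% With the leading 1, the block that starts at V(1) is found,
% and the results are direct indices into V.
withLead = strfind([1, dV, 1], [1 0])  % 1 4 8 12

% Without it, the first block is missed, and each result must be
% shifted by one to become an index into V.
noLead = strfind([dV, 1], [1 0]) + 1   % 4 8 12
```

The trailing 1 plays the same role for blocks that run to the very end of the vector.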
Now, how about this?
endloc = strfind([1,dV,1],[0 1])
endloc = 1×4
3 6 10 15
Do you see how that worked? Together, startloc and endloc identify the duplicate blocks in V: 1-3, 4-6, 8-10 and 12-15.
blocklength = endloc - startloc + 1
blocklength = 1×4
3 3 3 4
Using diff and strfind together like this is a useful trick that is worth remembering. Don't forget to add those ones at each end though.
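Applied to the second vector from the question, the same steps might be put together as in the sketch below. The names pad and ranges are illustrative and not part of the answer above, and the unsorted input is just one arrangement that sorts to [2 2 5 5 6].

```matlab
% Sketch: the diff/strfind steps applied to the vector from the question.
a = sort([2 5 2 6 5]);             % a = [2 2 5 5 6]
dV = diff(a) > 0;                  % true where the value changes
pad = [1, dV, 1];                  % sentinel ones at both ends
startloc = strfind(pad, [1 0]);    % first index of each block of duplicates
endloc   = strfind(pad, [0 1]);    % last index of each block of duplicates
ranges = [startloc(:), endloc(:)]  % one row per block: [first last]
```

Here ranges comes out as [1 2; 3 4], that is, copies at 1-2 and 3-4. The lone 6 at the end produces no row, since values that occur only once leave no zero in dV.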

More Answers (1)

Thorsten on 30 Apr 2025
If you are just looking for pairs, you can use
b = sort(a);
startloc = find(diff(b) == 0);
endloc = startloc + 1;
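As a quick check of what "just pairs" means here, the sketch below runs the same three lines on a vector containing a value repeated three times. The input vector is made up for illustration.

```matlab
% Sketch: the pairs approach on a run of three equal values.
a = [1 2 1 1];                  % hypothetical input with a triple
b = sort(a);                    % b = [1 1 1 2]
startloc = find(diff(b) == 0);  % 1 2
endloc = startloc + 1;          % 2 3
```

The triple is reported as two overlapping pairs, 1-2 and 2-3, rather than one block 1-3, so this form suits data where no value occurs more than twice.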
