Efficient row comparison in a large dataset

I have two arrays: array 1 consists of rows of the form 'a|b|c', and array 2 consists of rows containing a single string, for example 'ut'. Both arrays have the same length, more than 200,000 rows.
From array 1, I need to filter out rows that have the same third value but a different first value than another row with the same string in array 2.
For example:
row 1: 'a|b|c' and 'x'
row 2: 'f|g|c' and 'y'
row 3: 'a|b|c' and 'y'
row 4: 'd|e|c' and 'x'
In this situation I would like to delete row 4, because 'c' and 'x' are the same in rows 1 and 4, but 'a' and 'd' are different. None of the other rows meet these conditions, so they won't be deleted.
It is possible to write a for loop that compares each row separately; however, this takes days (I tested it).
Any help would be greatly appreciated.

Answers (2)

help unique
This will serve your needs perfectly, and very efficiently.
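As a rough illustration of how unique could be applied to the question, here is a minimal sketch. The variable names (A, B, toDelete) and the rule "keep the earliest row of each group, delete later rows whose first value differs" are assumptions, not something stated in the thread. Note that, read literally, this rule flags row 3 as well as row 4 in the sample data, since rows 2 and 3 share 'c' and 'y' but have first values 'f' and 'a'.

```matlab
% Sample data from the question (variable names are assumptions).
A = {'a|b|c'; 'f|g|c'; 'a|b|c'; 'd|e|c'};   % array 1
B = {'x'; 'y'; 'y'; 'x'};                   % array 2

% Split every 'first|second|third' row into its three fields.
parts = regexp(A, '\|', 'split');           % N-by-1 cell of 1-by-3 cells
parts = vertcat(parts{:});                  % N-by-3 cell array of strings

% Map strings to integer ids so rows can be compared numerically.
[~, ~, idFirst] = unique(parts(:, 1));
[~, ~, idThird] = unique(parts(:, 3));
[~, ~, idB]     = unique(B);

% One group per (third value, array-2 string) combination.
% With 'stable', iRef points at the earliest row of each group.
[~, iRef, group] = unique([idThird, idB], 'rows', 'stable');

% Flag rows whose first value differs from that of the earliest
% row in the same group (assumed interpretation of the rule).
refFirst = idFirst(iRef);
toDelete = idFirst ~= refFirst(group);

A_kept = A(~toDelete);
B_kept = B(~toDelete);
```

Everything here is vectorized, so the cost is dominated by the sorting inside unique rather than by pairwise row comparisons.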
Paul on 3 Nov 2014
Thank you for your fast answer.
Is it possible to use unique(..) to find the unique combinations of rows? I would then like to find the unique combinations of the third value of array 1 ('a'|'b'|'this') and the value of array 2 on the same row.
When I tried a simple set:
s = {'ut','rtd';'hg','ry';'ut','rtd'}
[r,i,j] = unique(s,'rows')
This gives the unique strings ('hg', 'rtd', 'ry', 'ut'); however, I need the combinations (row 1: 'ut' and 'rtd', row 2: 'hg' and 'ry').
Is this possible?
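One possible workaround, sketched here under the assumption that the 'rows' option of unique is not applied to cell arrays (which would explain the result above): convert each column to integer ids first, then call unique with 'rows' on the numeric id matrix. The names id1, id2, and combos are illustrative only.

```matlab
s = {'ut','rtd'; 'hg','ry'; 'ut','rtd'};

% Convert each column of strings to integer ids.
[~, ~, id1] = unique(s(:, 1));
[~, ~, id2] = unique(s(:, 2));

% 'rows' works on the numeric id matrix; 'stable' keeps the
% combinations in order of first appearance.
[~, i, j] = unique([id1, id2], 'rows', 'stable');

% i indexes the first row of each unique combination,
% j maps every original row to its combination.
combos = s(i, :);
```

For this small set, combos contains the rows 'ut','rtd' and 'hg','ry', and j assigns rows 1 and 3 to the same combination.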

Asked: 3 Nov 2014

Answered: 3 Nov 2014