How to extract specific elements from a .txt file?

I have one .txt file with 10 columns. In columns 5,6 & the same elements are included.
Given that, I would like to extract from this input all the rows that share the same elements in columns 5,6 & 8, which may be repeated more than 2 times in the table array.
I tried the unique and ismember commands, but I cannot work out the right syntax.
For example:
Input :
4 65 7.5 MGK 4 5 66 MARKUS 46.63641 65
11 26 5 LEON 78 93 88 LUTHER 50.3554 5
6 2 6.5 NDSGN 4 5 77 MAKRUS 59.67196 3
6 8 4.5 ANGEL 24 23 99 JOHN 31.87303 -1
6 2 6.5 NDSGN 56 89 100 ALEX 100 0
9 6 4 MARY 56 89 100 ALEX 2 200
Output file:
4 65 7.5 MGK 4 5 66 MARKUS 46.63641 65
6 2 6.5 NDSGN 4 5 77 MAKRUS 59.67196 3
6 2 6.5 NDSGN 56 89 100 ALEX 100 0
9 6 4 MARY 56 89 100 ALEX 2 200
Could you please help me?

Answers (1)

dpb on 4 Apr 2023
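% Sample data as a string array (char rows of unequal length cannot be stacked with [ ]).
d = [" 4; 65; 7.5; MGK  ;  4;  5;  66; MARKUS; 46.63641;  65"
     "11; 26; 5  ; LEON ; 78; 93;  88; LUTHER; 50.3554 ;   5"
     " 6;  2; 6.5; NDSGN;  4;  5;  77; MAKRUS; 59.67196;   3"
     " 6;  8; 4.5; ANGEL; 24; 23;  99; JOHN  ; 31.87303;  -1"
     " 6;  2; 6.5; NDSGN; 56; 89; 100; ALEX  ; 100     ;   0"
     " 9;  6; 4  ; MARY ; 56; 89; 100; ALEX  ; 2       ; 200"];
% Split each row on ';', trim the padding, and build a table of strings.
t = array2table(strtrim(split(d, ';')));
% Convert the numeric columns (all except 4 and 8) from string to double.
t = convertvars(t, [1:3 5:7 9:10], 'double')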
t = 6×10 table
    Var1  Var2  Var3  Var4      Var5  Var6  Var7  Var8       Var9     Var10
    ____  ____  ____  _______   ____  ____  ____  ________   ______   _____
    4     65    7.5   "MGK"     4     5     66    "MARKUS"   46.636   65
    11    26    5     "LEON"    78    93    88    "LUTHER"   50.355   5
    6     2     6.5   "NDSGN"   4     5     77    "MAKRUS"   59.672   3
    6     8     4.5   "ANGEL"   24    23    99    "JOHN"     31.873   -1
    6     2     6.5   "NDSGN"   56    89    100   "ALEX"     100      0
    9     6     4     "MARY"    56    89    100   "ALEX"     2        200
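The sample above is pasted inline. To start from the actual .txt file instead, something like the following might work. This is only a sketch: it assumes the file is space-delimited with no header line, and the file name is a hypothetical stand-in (a temporary file is written here so the snippet runs on its own).

```matlab
% Write the sample rows to a temporary file so this sketch runs on its own.
% With real data, set fname to the path of the existing .txt file instead.
rows = ["4 65 7.5 MGK 4 5 66 MARKUS 46.63641 65"
        "11 26 5 LEON 78 93 88 LUTHER 50.3554 5"
        "6 2 6.5 NDSGN 4 5 77 MAKRUS 59.67196 3"
        "6 8 4.5 ANGEL 24 23 99 JOHN 31.87303 -1"
        "6 2 6.5 NDSGN 56 89 100 ALEX 100 0"
        "9 6 4 MARY 56 89 100 ALEX 2 200"];
fname = [tempname '.txt'];              % hypothetical stand-in file
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', rows);             % one row of the sample per line
fclose(fid);

% Assumptions: space-delimited columns, no header line.
t = readtable(fname, 'FileType','text', 'Delimiter',' ', ...
    'ConsecutiveDelimitersRule','join', ...
    'ReadVariableNames',false, 'TextType','string');
delete(fname);                          % remove the temporary file
```

With no header line, readtable names the variables Var1 to Var10, so the grouping code that follows can be used unchanged.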
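% Group rows by the value pairs in columns 5 and 6; g holds one group number per row.
g = findgroups(t(:,[5 6]));
% Count the rows in each group (accumarray replaces histc, which is no longer recommended).
n = accumarray(g, 1);
% Keep the rows whose group occurs more than once.
tU = t(n(g) > 1, :)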
tU = 4×10 table
    Var1  Var2  Var3  Var4      Var5  Var6  Var7  Var8       Var9     Var10
    ____  ____  ____  _______   ____  ____  ____  ________   ______   _____
    4     65    7.5   "MGK"     4     5     66    "MARKUS"   46.636   65
    6     2     6.5   "NDSGN"   4     5     77    "MAKRUS"   59.672   3
    6     2     6.5   "NDSGN"   56    89    100   "ALEX"     100      0
    9     6     4     "MARY"    56    89    100   "ALEX"     2        200
NOTA BENE:
  1. The description is inconsistent: no group in the sample dataset has more than 2 repeats, so I used 2 as the minimum count.
  2. I used only columns 5 and 6 as the grouping variables, even though one place says "5,6 & 8", because "MARKUS" and "MAKRUS" don't match in the original dataset. It is not clear whether that is just a typo or is meant to be real. If it is a typo and is corrected, grouping on 5, 6 and 8 would also produce two sets in the result.
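If column 8 is meant to take part in the grouping, the same idea extends to three grouping variables. The following is a minimal sketch, not a definitive solution: it rebuilds only columns 5, 6 and 8 of the sample so that it runs on its own, and the names minCount and outFile are hypothetical.

```matlab
% Columns 5, 6 and 8 of the sample data, keeping the "MAKRUS" spelling as posted.
Var5 = [4; 78; 4; 24; 56; 56];
Var6 = [5; 93; 5; 23; 89; 89];
Var8 = ["MARKUS"; "LUTHER"; "MAKRUS"; "JOHN"; "ALEX"; "ALEX"];
t = table(Var5, Var6, Var8);

minCount = 2;                              % hypothetical name: minimum repeats to keep

% One group number per row, based on all three columns together.
g = findgroups(t.Var5, t.Var6, t.Var8);
n = accumarray(g, 1);                      % number of rows in each group
tU = t(n(g) >= minCount, :);               % rows whose group is large enough

% Optionally write the result as space-delimited text without a header.
outFile = fullfile(tempdir, 'output.txt'); % hypothetical output path
writetable(tU, outFile, 'Delimiter',' ', 'WriteVariableNames',false);
```

With the data as posted, only the two "ALEX" rows survive, because "MARKUS" and "MAKRUS" fall into different groups. Correcting the spelling in row 3 would bring the two rows with 4 and 5 in columns 5 and 6 back into the result.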
