Hi guys, I want to read a text file line by line and remove the lines that have NA, as well as the duplicated columns



chocho on 15 Feb 2017


https://au.mathworks.com/matlabcentral/answers/325176-hi-guys-i-want-to-read-a-text-file-line-by-line-and-remove-the-lines-which-have-na-and-the-duplica

Edited: Walter Roberson on 20 Feb 2017

Accepted Answer: dpb

COADREAD_methylation.txt


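fid = fopen('COADREAD_methylation.txt','r');
allLines = {};                          % renamed: 'all' shadows the built-in all()
this_line = fgetl(fid);
while ischar(this_line)                 % fgetl returns -1 (not a char) at end of file
    allLines{end+1, 1} = this_line;     % store each line in its own cell
    this_line = fgetl(fid);
end
fclose(fid);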
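If the file fits comfortably in memory, an alternative to the fgetl loop is to read it in one call with fileread and split it on line breaks. This is a minimal sketch: the temporary sample file and its two records (the first three columns of two rows from the sample data posted later in this thread) exist only so the snippet runs on its own; with the real data you would pass 'COADREAD_methylation.txt' to fileread directly.

```matlab
% Hypothetical two-line sample file so the sketch runs on its own;
% with the real data, skip this part and call fileread('COADREAD_methylation.txt').
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'cg00000292\t0.511852232819811\tATP2A1\n');
fprintf(fid, 'cg00006414\tNA\t"ZNF425;ZNF398"\n');
fclose(fid);

txt = fileread(fname);                                     % whole file as one char row vector
fileLines = regexp(txt, '\r?\n', 'split');                 % one cell per line; also handles CRLF endings
fileLines = fileLines(~cellfun('isempty', fileLines)).';   % drop empty pieces, make it a column
delete(fname);                                             % remove the temporary sample file
```

Unlike the fgetl loop, this version also discards blank lines in the middle of the file, because it removes every empty piece, not just the one after the final newline.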

1 Comment

Stephen23 on 17 Feb 2017

Edited: Stephen23 on 17 Feb 2017

Accepted Answer

dpb on 15 Feb 2017


https://au.mathworks.com/matlabcentral/answers/325176-hi-guys-i-want-to-read-a-text-file-line-by-line-and-remove-the-lines-which-have-na-and-the-duplica#answer_254913

Edited: dpb on 16 Feb 2017


Well, 'NA' is easy; I'm not sure what defines the repeated columns, though, and I don't have time at present to parse that input file and figure out what is or isn't unique without a description being supplied...

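fid = fopen('COADREAD_methylation.txt','r');
data={};
while ~feof(fid)
  l=fgetl(fid);                                   % read the next line (-1 at end of file)
  % keep the line only if it is text and does not contain 'NA'
  if ischar(l) && isempty(strfind(l,'NA')), data=[data;{l}]; end
end
fid=fclose(fid);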

If the presence of 'NA' is all that's needed to identify the offending records, then you're done; otherwise, we need more details on how to tell them apart, so folks here don't have to work it out on their own.
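One caveat with the strfind test: it drops any line that contains the characters NA anywhere, including a line whose gene symbol merely contains them (NAT2, for example). The sketch below instead drops a line only when a whole column equals NA, assuming the file is tab-delimited. The records are hypothetical; the third one is made up purely to show the difference.

```matlab
% Hypothetical tab-delimited records; the third is made up to show the problem:
% it has no missing value, but its gene symbol contains the letters "NA".
tab = sprintf('\t');
recs = {['cg00000292' tab '0.511852232819811' tab 'ATP2A1'];
        ['cg00006414' tab 'NA' tab '"ZNF425;ZNF398"'];
        ['cg99999999' tab '0.5' tab 'NAT2']};

keep = false(size(recs));
for k = 1:numel(recs)
    cols = regexp(recs{k}, '\t', 'split');   % one cell per tab-separated column
    keep(k) = ~any(strcmp(cols, 'NA'));      % drop the line only if a whole column is NA
end
data = recs(keep);                           % keeps records 1 and 3

substringHit = ~isempty(strfind(recs{3}, 'NA'));  % true: the substring test would drop record 3
```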

13 Comments

chocho on 20 Feb 2017

Edited: Walter Roberson on 20 Feb 2017


Hi friend, I want to make this code work like the following:

Note: I want to read every line and, if it has an NA, remove it and move on to the next line; if not, check the columns of this line, find any column that contains ';', split that column, and make 2 rows.

fid = fopen('COADREAD_methylation.txt','r');
data = {};
while true
    l = fgetl(fid);                          % get the next line
    if ~ischar(l), break; end                % fgetl returns -1 at end of file
    cols = regexp(l, '\t', 'split');         % split the line into its columns
    if any(strcmp(cols, 'NA'))               % remove rows that have an NA column
        continue                             % ...and read the next line
    end
    semiCols = find(~cellfun('isempty', strfind(cols, ';')));  % columns that contain ';'
    if isempty(semiCols)
        data{end+1, 1} = cols;               % nothing to split: save the line as-is
        continue
    end
    % split each ';' column and make one row per symbol, e.g. "COX8C;KIAA1409"
    % gives two rows; assumes every such column lists the same number of symbols
    nParts = numel(regexp(cols{semiCols(1)}, ';', 'split'));
    for k = 1:nParts
        row = cols;
        for c = semiCols
            parts = regexp(strrep(cols{c}, '"', ''), ';', 'split');  % also drop the quotes
            row{c} = parts{k};
        end
        data{end+1, 1} = row;                % save the split line
    end
end
fclose(fid);

chocho on 20 Feb 2017

Edited: Walter Roberson on 20 Feb 2017


Inputs:

Hybridization REF  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05
Composite Element REF  Beta_value  Gene_Symbol  Chromosome  Genomic_Coordinate  Beta_value  Gene_Symbol
cg00000292  0.511852232819811  ATP2A1  16  28890100  0.787687855895422  ATP2A1
cg00002426  0.519102187746053  SLMAP  3  57743543  0.932889308560864  SLMAP
cg00006414  NA  "ZNF425;ZNF398"  7  148822837  NA  "ZNF425;ZNF398"
cg00008493  0.987979722052904  "COX8C;KIAA1409"  14  93813777  0.986128428295584  "COX8C;KIAA1409"
cg00011459  0.922491239231445  "TMEM186;PMM2"  16  8890425  0.961124285303233  "TMEM186;PMM2"

Outputs:

Hybridization REF  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05
cg00000292  0.511852232819811  ATP2A1  0.787687855895422
cg00002426  0.519102187746053  SLMAP  0.932889308560864
cg00008493  0.987979722052904  COX8C  0.986128428295584
cg00008493  0.987979722052904  KIAA1409  0.986128428295584
cg00011459  0.922491239231445  TMEM186  0.961124285303233
cg00011459  0.922491239231445  PMM2  0.961124285303233

Appreciate your help!
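Judging from the sample output, the duplicated columns are handled by keeping the probe ID, every Beta_value column, and only the first Gene_Symbol column, while Chromosome, Genomic_Coordinate and the repeated Gene_Symbol are dropped. The following is a minimal sketch of that column selection. It relies on three assumptions: the second header row ("Composite Element REF ...") is tab-delimited, it names the columns exactly as in the sample, and each row has already been split on ';'. The rule itself is inferred from the posted output, and the variable names are hypothetical.

```matlab
% Second header row from the question (assumed tab-delimited in the real file)
hdr = {'Composite Element REF', 'Beta_value', 'Gene_Symbol', 'Chromosome', ...
       'Genomic_Coordinate', 'Beta_value', 'Gene_Symbol'};

% Rule inferred from the sample output: keep the probe ID (column 1),
% every Beta_value column, and only the first Gene_Symbol column
keepCols = strcmp(hdr, 'Beta_value');
keepCols(1) = true;
keepCols(find(strcmp(hdr, 'Gene_Symbol'), 1)) = true;

% One data row after NA filtering and ';' splitting (values from the question)
row = {'cg00008493', '0.987979722052904', 'COX8C', '14', '93813777', ...
       '0.986128428295584', 'COX8C'};
outRow = row(keepCols);                    % keeps columns 1, 2, 3 and 6
outLine = strjoin(outRow, sprintf('\t'));  % tab-joined text, ready to write with fprintf
```

Computing the mask once from the header means every data row can be trimmed with a single logical index, however many samples (and therefore repeated column groups) the real file has.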
