importdata does not import the whole .txt file

I'm encountering a problem importing a .txt file containing mass spectrometry data. Somewhere along the way it simply stops, and the remaining part of my .txt file (over 777,000 lines in total) is never imported.
The data I'm trying to import describes scan events from the mass spectrometer. The header (from 'BEGIN IONS' to 'SCANS=scannumber') describes certain properties of the scan event. The numbers between 'SCANS=scannumber' and 'END IONS' describe the spectrum (the actual data, but worthless without the header); the first column contains m/z values, the second ion intensities.
My data looks like this (this is one scan event):
BEGIN IONS
TITLE=Spectrum2667 scans: 5993,
PEPMASS=897.52844 17418.17383
CHARGE=2+
RTINSECONDS=3127
SCANS=5993
176.86790 128.299
181.97141 139.498
221.90227 139.841
341.23862 982.842
END IONS
I want to extract certain scan events based on their scan number; another script tells me which ones to extract from this file. However, MATLAB is not importing all my scans, so I am missing some of the data (and my other script throws an error, because the scan number it is looking for is not present).
I have just over 6000 of these scans in one .txt file. For some reason, MATLAB stops somewhere near the end of my file. At a certain scan event, it stops halfway through the list describing the spectrum. The code I use to import the data is:
List(:,1) = importdata('MyData.txt');
Because I just need a list of all the scan events, which I write to a new file after extracting the ones I want, there is no need to import the file in two columns or to split off the header; I just want the complete list, all the way to the end of my .txt file.
I've checked my .txt file, but the spacing and tab format at this particular line is no different from the rest of the file.
If someone could help me solve my problem, I would be very happy.
Here is a Dropbox link to the file (it was too large to attach): https://www.dropbox.com/s/ijp5mvtvrm0ob9w/140708_LO_03_140710112412.txt
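For reference, here is a minimal sketch of the whole task described above without importdata: read the entire file with fileread, split it into lines, and copy every scan event whose SCANS= value is in a wanted list to a new file. It assumes every event starts with a BEGIN IONS line, ends with an END IONS line and contains one SCANS= line. The variables inFile, outFile and wantedScans are hypothetical, and the first section only writes the sample event from the question to a temporary file so the sketch runs on its own.

```matlab
% Demo input only: the sample scan event from the question, written to a
% temporary file. For the real data, set inFile = 'MyData.txt' instead.
sample = {'BEGIN IONS', 'TITLE=Spectrum2667 scans: 5993,', ...
    'PEPMASS=897.52844 17418.17383', 'CHARGE=2+', 'RTINSECONDS=3127', ...
    'SCANS=5993', '176.86790 128.299', '181.97141 139.498', ...
    '221.90227 139.841', '341.23862 982.842', 'END IONS'};
inFile = [tempname, '.txt'];
fid = fopen(inFile, 'w');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

% Hypothetical inputs: scan numbers to keep and the output file name.
wantedScans = 5993;
outFile = [tempname, '_selected.txt'];

% Read the whole file in one go and split it into lines (\n or \r\n).
lines = regexp(fileread(inFile), '\r?\n', 'split');
trimmed = strtrim(lines);

% Indices of the first and last line of every scan event.
beginIdx = find(strcmp(trimmed, 'BEGIN IONS'));
endIdx = find(strcmp(trimmed, 'END IONS'));
assert(numel(beginIdx) == numel(endIdx), 'Unbalanced BEGIN IONS / END IONS lines');

selected = {};
for b = 1:numel(beginIdx)
    block = lines(beginIdx(b):endIdx(b));
    % Find the SCANS= header line of this event and parse its number.
    scanLine = trimmed(beginIdx(b):endIdx(b));
    scanLine = scanLine(strncmp(scanLine, 'SCANS=', 6));
    if isempty(scanLine)
        continue
    end
    scanNo = sscanf(scanLine{1}, 'SCANS=%d');
    if ismember(scanNo, wantedScans)
        selected = [selected, block]; %#ok<AGROW>
    end
end

% Write the selected events, one line each, to the new file.
fid = fopen(outFile, 'w');
fprintf(fid, '%s\n', selected{:});
fclose(fid);
```

Because fileread loads the file in a single call, there is no parser that can stop early; the cost is holding the whole file in memory, which should be acceptable for a file of this size.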
Luuk van Oosten on 11 Jul 2014
https://www.dropbox.com/s/ijp5mvtvrm0ob9w/140708_LO_03_140710112412.txt
I don't know what went wrong... I'm sorry. Here it is.


Accepted Answer

Sara on 11 Jul 2014
I don't know what is wrong with importdata, but this version works. The size of k was based on your file; it may need to be changed if you use a different file.
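k = cell(1332160,1);      % preallocate; size chosen for this particular file
j = 0;                    % number of lines read so far
fid = fopen('140708_LO_03_140710112412.txt','r');
while 1
    t = fgetl(fid);               % next line, without the newline character
    if(~ischar(t)),break,end      % fgetl returns -1 at the end of the file
    j = j + 1;
    k{j} = t;
end
fclose(fid);
k = k(8:j-2);             % keep lines 8 to j-2 (drops the first 7 and the last 2 lines)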
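The loop above depends on a preallocation size picked for this file. A minimal sketch of the same fgetl loop that does not need the line count in advance: it starts with a small cell array, doubles its capacity whenever it fills up, and trims the unused cells at the end. The initial capacity of 1024 is an arbitrary choice, and the first section only creates a 3000-line test file so the sketch runs on its own.

```matlab
% Demo input only: a 3000-line text file. For the real data, set fileName
% to '140708_LO_03_140710112412.txt' instead.
fileName = [tempname, '.txt'];
fid = fopen(fileName, 'w');
fprintf(fid, 'line %d\n', 1:3000);
fclose(fid);

fid = fopen(fileName, 'r');
assert(fid ~= -1, 'Could not open %s', fileName);
k = cell(1024, 1);   % initial capacity; grows as needed
j = 0;               % number of lines read so far
t = fgetl(fid);
while ischar(t)      % fgetl returns -1 (not a char) at the end of the file
    j = j + 1;
    if j > numel(k)
        % Out of room: double the capacity.
        k = [k; cell(numel(k), 1)]; %#ok<AGROW>
    end
    k{j} = t;
    t = fgetl(fid);
end
fclose(fid);
k = k(1:j);          % drop the unused preallocated cells
```

Doubling keeps the number of reallocations logarithmic in the line count. If you still want to drop the first 7 and last 2 lines as in the answer, use k = k(8:end-2) afterwards.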
Sara on 15 Jul 2014
I thought you didn't need that part :) And the number was arbitrary, just a big one.


More Answers (3)

per isakson on 15 Jul 2014
Edited: per isakson on 15 Jul 2014
"For some reason, MATLAB stops somewhere near the end of my file."
In MATLAB, there is no high-level function that reads and parses a text file like yours, i.e. a file with repeated headers and blocks of data.
"[...]the actual data, but worthless without the header" .
I have a function, read_blocks_of_numerical_data, that reads only the actual data.
>> g=read_blocks_of_numerical_data('140708_LO_03_140710112412.txt',50);
>> whos g
  Name      Size               Bytes  Class    Attributes
  g         1x2142          21279808  cell
>> g{1234}
ans =
   1.0e+06 *
    0.0001    0.0023
    0.0001    0.0022
    0.0001    0.0022
    .......
I attached the m-file. Somebody else might want to try it.
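The attached m-file is not reproduced here. As a rough, hypothetical illustration of the same idea (not per isakson's implementation), the sketch below keeps only the numeric blocks: every line that parses as exactly two numbers is appended to the current block, and any other line closes it. It assumes header lines never start with a number; the variable names are made up, and the demo file contains two short scan events.

```matlab
% Demo input only: two short scan events. For the real data, set fileName
% to '140708_LO_03_140710112412.txt' instead.
sample = {'BEGIN IONS', 'SCANS=1', '176.86790 128.299', '181.97141 139.498', ...
    'END IONS', 'BEGIN IONS', 'SCANS=2', '221.90227 139.841', 'END IONS'};
fileName = [tempname, '.txt'];
fid = fopen(fileName, 'w');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

lines = regexp(fileread(fileName), '\r?\n', 'split');
blocks = {};             % one n-by-2 matrix [m/z, intensity] per scan event
current = zeros(0, 2);   % numeric rows of the block being read
for i = 1:numel(lines)
    % Empty for header lines such as 'SCANS=1', which start with a letter.
    v = sscanf(lines{i}, '%f')';
    if numel(v) == 2
        current(end+1, :) = v; %#ok<AGROW>
    elseif ~isempty(current)
        % A non-numeric line ends the current block.
        blocks{end+1} = current; %#ok<AGROW>
        current = zeros(0, 2);
    end
end
if ~isempty(current)     % file ended inside a numeric block
    blocks{end+1} = current;
end
```

Like the function output above, the result is a 1-by-N cell array of numeric blocks, but the header information (including the scan number) is discarded.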
Luuk van Oosten on 15 Jul 2014
Thank you for clearing that up! At this point I do not want to extract only the actual data, but maybe your script will be of use later in my project!



Cedric Wannaz on 15 Jul 2014
Edited: Cedric Wannaz on 15 Jul 2014
Here is an alternative approach based on regular expressions:
content = fileread( '140708_LO_03_140710112412.txt' ) ;
% One match per scan event; each header value is captured as a named token.
pattern = ['TITLE=(?<title>[^\r\n]*)\s*', ...
           'PEPMASS=(?<pepmass>[^\r\n]*)\s*', ...
           'CHARGE=(?<charge>[^\r\n]*)\s*', ...
           'RTINSECONDS=(?<rtinseconds>\d*)\s*', ...
           'SCANS=(?<scans>\d*)\s*', ...
           '(?<spectrum>[^E]*)'] ;   % spectrum: everything up to the "E" of "END IONS"
data = regexp( content, pattern, 'names' ) ;
% Convert the captured text to numbers.
for k = 1 : numel( data )
    data(k).pepmass     = sscanf( data(k).pepmass, '%f' )' ;
    data(k).rtinseconds = sscanf( data(k).rtinseconds, '%d' ) ;
    data(k).scans       = sscanf( data(k).scans, '%d' ) ;
    data(k).spectrum    = sscanf( data(k).spectrum, '%f', [2, Inf] )' ;  % n-by-2: [m/z, intensity]
end
Running this, you get, for example:
>> data
data =
1x2142 struct array with fields:
    title
    pepmass
    charge
    rtinseconds
    scans
    spectrum
>> data(1000)
ans =
          title: 'Spectrum1000 scans: 3128,'
        pepmass: [630.9374 2.4366e+05]
         charge: '7+'
    rtinseconds: 1987
          scans: 3128
       spectrum: [885x2 double]
>> select = [data.scans] > 5200 ;
>> data(select)
ans =
1x6 struct array with fields:
    title
    pepmass
    charge
    rtinseconds
    scans
    spectrum
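One assumption in the pattern above: the spectrum token [^E]* stops at the first letter E, which works here because the numbers are written without exponents and the next text is END IONS. If a file contained values such as 1.5E+03, a lazy match up to END IONS would be safer. A minimal sketch of that variant, relying on MATLAB's default regexp behaviour where . also matches newlines; it parses only SCANS and the spectrum, and also shows selecting events from a list of scan numbers with ismember (the variable wanted is hypothetical).

```matlab
% Demo input only: one scan event whose intensities use exponent notation.
content = sprintf('BEGIN IONS\nSCANS=5993\n176.8679 1.5E+03\n181.97141 2.0E+02\nEND IONS\n');

% .*? is lazy, so each match ends at the nearest following END IONS.
pattern = 'SCANS=(?<scans>\d+)\s*(?<spectrum>.*?)END IONS';
data = regexp(content, pattern, 'names');
for n = 1 : numel(data)
    data(n).scans    = sscanf(data(n).scans, '%d');
    data(n).spectrum = sscanf(data(n).spectrum, '%f', [2, Inf])';  % [m/z, intensity]
end

% Keep only the events whose scan number is in a (hypothetical) wanted list.
wanted = [5993, 6000];
selected = data(ismember([data.scans], wanted));
```

With the original [^E]* token, the spectrum of this demo event would be cut off at "1.5", so the lazy match matters only if exponent notation can occur in your files.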

Sanket Mishra on 10 Jul 2014
Put the importdata command into a try/catch block and look at the exception that gets displayed. This might help you.
try
    List = importdata('MyData.txt');
catch ex
    disp(ex.message);   % show why importdata failed, if it throws an error
end
I would suggest using textscan instead of importdata, as it is more suitable for your workflow. See the MATLAB documentation for textscan.
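A minimal sketch of that suggestion, assuming textscan is used only to read every line of the file as text (the format and options are one common choice, not taken from the thread). Counting the BEGIN IONS and END IONS lines afterwards is a quick way to check that the whole file was read and no scan event was cut off. The demo file holds two short events; for the real data, set fileName to 'MyData.txt'.

```matlab
% Demo input only: two short scan events.
sample = {'BEGIN IONS', 'SCANS=1', '176.86790 128.299', 'END IONS', ...
    'BEGIN IONS', 'SCANS=2', '221.90227 139.841', 'END IONS'};
fileName = [tempname, '.txt'];
fid = fopen(fileName, 'w');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

% With '\n' as the only delimiter, each %s field is one whole line,
% internal spaces included (leading white space is trimmed).
fid = fopen(fileName, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
lines = C{1};

% A complete file has one END IONS for every BEGIN IONS.
nBegin = sum(strcmp(lines, 'BEGIN IONS'));
nEnd = sum(strcmp(lines, 'END IONS'));
fprintf('%d lines, %d scan events started, %d ended\n', numel(lines), nBegin, nEnd);
```

If nBegin and nEnd differ, or are lower than the number of scan events you expect, the file was not read completely.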