# How can I extract line numbers of text data?

Hello everyone. I have attached a .txt file (portion.txt) that contains a portion of my data. I need a script that identifies the lines holding pairs of x-y coordinates and returns their line numbers. For instance, in the .txt file the first set of coordinates begins at line 3 and ends at line 138 (the number of pairs is written above each set of coordinates, which in this case is 136), so the script should return those two numbers. This should then be repeated for the whole file. I suppose it can be done with a loop, since each subsequent set of coordinates begins 2 lines after the previous one ends. How can this be done? Thanks in advance.

Azzi Abdelmalek on 31 Jul 2016
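str={};                      % one cell per line of the file
fid=fopen('portion.txt');
l=fgetl(fid);
while ischar(l)              % fgetl returns -1 at end of file
    str{end+1,1}=l;
    l=fgetl(fid);
end
fclose(fid);
str
idx=str(cellfun(@numel,regexp(str,'[\d\.]+'))==2)   % lines with exactly two numeric tokens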
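The snippet above returns the matching strings themselves. If the line numbers are wanted instead, applying find to the same logical mask should give them. A minimal sketch, in which the cell array str is a made-up stand-in for the file contents (the sample values are not taken from portion.txt) and nTok, isPair and lineNos are illustrative names:

```matlab
% Hypothetical sample lines (not from portion.txt):
% one title, one count, three x-y pairs
str = {'title'; '3'; '0.10 0.20'; '0.30 0.40'; '0.50 0.60'};

% Count the numeric-looking tokens on each line
nTok = cellfun(@numel, regexp(str, '[\d\.]+'));

% Logical mask of lines with exactly two tokens (candidate x-y pairs)
isPair = (nTok == 2);

% find returns the positions of the true entries, i.e. the line numbers
lineNos = find(isPair);
```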

Paschalis Garouniatis on 1 Aug 2016
Thank you for your answer. Maybe my question was confusing. As I commented below on dpb's answer, my aim is to get the number of the lines containing the sets of coordinates. The script should identify where the sets begin and end and then return something like begin=3 and end=138 for the first set. Then it should do this for the whole file.
Azzi Abdelmalek on 1 Aug 2016
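str={};                      % one cell per line of the file
fid=fopen('portion.txt');
l=fgetl(fid);
while ischar(l)              % fgetl returns -1 at end of file
    str{end+1,1}=l;
    l=fgetl(fid);
end
fclose(fid);
clc
str
f=regexpi(str,'[e\-\+\d\.]+')     % numeric tokens, including signs and exponents
idx=cellfun(@numel,f)             % number of tokens per line
id=idx==2                         % true on lines with exactly two tokens
ii1=strfind([0 id'],[0 1]) % Begin
ii2=strfind([id' 0],[1 0]) % End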
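The last two lines locate the first and last element of every run of ones in id. The same run detection can be sketched with diff instead of strfind, which avoids passing a numeric vector to a string function. This is an alternative formulation, not the code posted above, and the mask below is made up for illustration:

```matlab
% Hypothetical mask: true where a line holds an x-y pair
id = logical([0 0 1 1 1 0 0 1 1]);

% Pad with zeros so that runs touching either end are still detected
d = diff([0 id 0]);

ii1 = find(d == 1);        % first line of each run
ii2 = find(d == -1) - 1;   % last line of each run
```

With this mask the runs are lines 3 to 5 and lines 8 to 9.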
Paschalis Garouniatis on 1 Aug 2016
Thanks a lot Azzi for your response. It worked just fine.

dpb on 31 Jul 2016
Edited: dpb on 1 Aug 2016
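fid=fopen('portion.txt','r');
i=0;                              % loop counter
n=[];                             % number of pairs in each set
while ~feof(fid)                  % until we run out of data
    i=i+1;                        % increment counter
    fgetl(fid);                   % skip the first header line (assumed layout)
    n(i)=sscanf(fgetl(fid),'%d',1); % pair count, assumed to be the first number on the second header line
    d(i)=textscan(fid,'%f %f',n(i),'collectoutput',1); % read the section
    fgetl(fid);                   % straighten out file pointer end of record
end
fid=fclose(fid);                  % done with file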
You'll have a list of the sizes and a cell array of M sets of n-by-2 coordinates to do with as you wish.
Running it on the file here, having named the m-file portion.m, I get:
>> portion
>> n
n =
136 162
>> d
d =
[136x2 double] [162x2 double]
>> cumsum([[3 2+n(1:end-1)].' [2+n].']) % the start/stop positions from the lengths
ans =
3 138
141 302
>>
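The arithmetic behind that cumsum call can be spelled out as follows. This is a sketch of the same calculation, assuming every set is preceded by the same number of header lines; h, start, stop and ranges are illustrative names that do not appear in the code above:

```matlab
n = [136 162];   % set sizes reported above
h = 2;           % header lines before each set, as described in the question

% Each block occupies h + n(k) lines, so the last data line of block k
% is the running total of the block lengths
stop = cumsum(h + n);

% The first data line of block k lies n(k)-1 lines before its last one
start = stop - n + 1;

ranges = [start.' stop.'];   % one row per set: [first line, last line]
```

For the two sets above this reproduces 3 to 138 and 141 to 302.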

Paschalis Garouniatis on 2 Aug 2016
Thanks a lot for your response dpb. My original data file has a few more header lines above the first line of the .txt file that I attached. I thought it wouldn't matter for the solution, but it did. So I had to make some adjustments and use the dlmcell function to drop those lines and run your code as is. The adjustment code is included below. Thanks again for the help. Your code also worked.
fid1=fopen('mydata.txt','r'); % open data file
d=textscan(fid1,'%s','headerlines',7,'delimiter','') % ignore first 7 lines
fclose(fid1)
dlmcell('portion.txt',d{1,1}); % write data to .txt file using function dlmcell
dpb on 2 Aug 2016
No need to create a new file; simply skip the odd header lines before getting to the regular portion of the file and go from there:
fid=fopen('portion.txt','r');
for i=1:7, fgetl(fid); end % skip preliminary stuff
...
From this point everything's the same, except that for the real file you'll need to add 7 to all the line numbers obtained if you're going to use them with respect to that file.
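As a small worked example of that offset, assuming the start/stop positions quoted earlier in the thread and the 7 skipped lines (nSkip, ranges and rangesInFullFile are illustrative names):

```matlab
nSkip  = 7;                   % preliminary lines skipped before the regular part
ranges = [3 138; 141 302];    % start/stop lines relative to the regular part

% Shift every line number so it refers to the full file
rangesInFullFile = ranges + nSkip;
```

This gives 10 to 145 for the first set and 148 to 309 for the second.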
Paschalis Garouniatis on 3 Aug 2016
Thank you very much for your help dpb.

Shameer Parmar on 1 Aug 2016
Edited: Shameer Parmar on 1 Aug 2016
Data = textread('portion.txt', '%s', 'delimiter', '');   % one cell per line
LineIndex = {};
count = 1;
for i=1:length(Data)
    if ~isempty(strfind(Data{i},' '))                     % line holding the pair count
        temp_line = regexp(Data{i},' ','split');          % first token is the count
        LineIndex{count,1} = ['Begin at ',num2str(i+1)];
        LineIndex{count+1,1} = ['End at ',num2str(i + str2num(temp_line{1}))];
        count=count+2;
    end
end
Make sure that your file "portion.txt" is in the current directory.
To check the output, just type "LineIndex".
Output:
LineIndex =
'Begin at 3'
'End at 138'
'Begin at 141'
'End at 302'
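If numeric line numbers are more convenient than the 'Begin at' and 'End at' strings, the same idea can be sketched so that it fills an N-by-2 matrix. This is only an illustration under assumptions: txt is a made-up stand-in for the file contents, nSkip and h are hypothetical names for the number of preliminary lines and of header lines per set, and the pair count is assumed to be the first number on the line just before each set.

```matlab
% Hypothetical file contents, one cell per line (values are made up)
txt = {'junk'; 'title'; '2 0'; '0.1 0.2'; '0.3 0.4'; 'title'; '1 0'; '0.5 0.6'};

nSkip  = 1;            % preliminary lines to ignore (assumption)
h      = 2;            % header lines before each set (assumption)
ranges = zeros(0, 2);  % one row per set: [first line, last line]

k = nSkip;             % index of the last line consumed so far
while k + h <= numel(txt)
    % The count is taken as the first number on the last header line
    n = sscanf(txt{k + h}, '%d', 1);
    ranges(end + 1, :) = [k + h + 1, k + h + n];
    k = k + h + n;     % jump past this set
end
```

Because k starts at nSkip, the resulting line numbers already refer to the full file and need no later correction.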