I want to read a text file having strings and numeric data. Is there any better function than textscan?

Dear Users,
below are a few lines of the text file whose values I want to read individually. It is TLE (two-line element) data.
I want to read each value from the TLE separately. How can I do that? I want to skip the first 5 lines and read the data from the 6th row to the end. I would be thankful if anyone can help.
=====================================================================
24652 1996-063A ARABSAT-2B
Launched: 1996-11-13 (318) Start Date: 1996-06-12 (164)
Decayed: Stop Date: 2003-12-20 (354)
=====================================================================
1 24652U 96063A 96318.74847837 -.00000020 00000-0 00000+0 0 14
2 24652 3.9929 210.6007 7281127 177.7757 190.4436 2.27277888 06
1 24652U 96063A 96319.62211352 -.00000020 00000-0 00000+0 0 31
2 24652 3.9929 210.3183 7284735 178.4392 185.2995 2.27373269 12
1 24652U 96063A 96319.62351606 .00008082 00000-0 30835-2 0 24
2 24652 3.9764 210.1654 7280836 178.5436 186.6267 2.27380102 20
1 24652U 96063A 96319.62356237 .00009638 00000-0 38025-2 0 37
2 24652 3.9632 210.3512 7280110 178.4006 186.6625 2.27374993 25
1 24652U 96063A 96320.05952563 -.00002597 00000-0 -98092-3 0 63
2 24652 3.9623 210.1661 7275699 178.7092 185.6294 2.27896863 39
end

3 Comments

Better than textscan? I really doubt that; I think you meant "easier".
Hi Oleg,
if your data is saved in .txt format, I would use the following code:
Data = dlmread('/Users/oleg/Desktop/....txt');
To keep only the rows from line 5 onward:
Data = Data(5:end,:);
@Srikanta: dlmread() is a wrapper around textscan. Also, it will not let me import the lines identified with 2 (first column) directly as doubles; I would have to convert them.
I usually prefer to have more control over the import procedure and try to avoid datatype conversions.


 Accepted Answer

Using textscan you can import one line and skip the next one (I saved your example as test.txt).
% Import lines which start with 1
fid = fopen('test.txt');
line1 = textscan(fid, '%f%s%s%f%f%s%s%f%f\r\n %*[^\n]','HeaderLines',5);
fclose(fid);
% Import lines which start with 2
fid = fopen('test.txt');
line2 = textscan(fid, '%f%f%f%f%f%f%f%f%f\r\n %*[^\n]','HeaderLines',6);
fclose(fid);
The format specifier, e.g. for line 2, is '%f%f%f%f%f%f%f%f%f\r\n %*[^\n]'. Note that I read all the values of the line, then move to the next line with \r\n and skip its content with %*[^\n]. This way I read every other line.

6 Comments

Thanks! I tested it and it works well. However, when I tried to display line1{4} the result wasn't what I expected. Could you check that please? Also, how can I select a single row and column, for example the 1st row and 6th column of line1?
line1{3} displays the values of the 3rd column only.
The most straightforward way to represent numeric and char data together in matrix fashion is to store each value/string in a separate cell (inefficient from a RAM point of view).
An example
A = {23 'hi' '33' ;
1 'hello' 'bye'}
then A(2,3) returns the cell in the second row and third column (use A{2,3} to get its content).
A(2,:) returns the whole second row.
With the results from textscan you have to prepare the data in the following way:
[num2cell(line1{:,1}) line1{:,2}]
Dear Oleg, sorry, I meant to write line1{4}. It is displaying the 4th column improperly.
You said it right: it is how the column is displayed, i.e. you have to change the way the command window displays values (they are stored properly in the variable).
So, try
format long g
line1{4}
Dear Oleg, [num2cell(line1{:,1}) line1{:,2}] doesn't work for columns 4 onwards.
I only gave an example for the first two columns; you have to code all the remaining ones inside [...].
If the column is double, use num2cell(line1{:,n}); otherwise simply line1{:,n}.
If you have problems extending my example, post here the syntax which is not working.
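The rule in the comment above (num2cell for double columns, plain cell columns otherwise) can be applied to all nine columns with a loop. This is only a minimal sketch: the sample line1 below is typed in by hand from the first two "1" lines of the question so the snippet runs without the file, and the variable C is an illustrative name, not something from the answer.

```matlab
% Hand-built stand-in for the textscan output of the '%f%s%s%f%f%s%s%f%f' format
% (assumption: two rows copied from the question's data).
line1 = {[1; 1], {'24652U'; '24652U'}, {'96063A'; '96063A'}, ...
         [96318.74847837; 96319.62211352], [-2e-7; -2e-7], ...
         {'00000-0'; '00000-0'}, {'00000+0'; '00000+0'}, [0; 0], [14; 31]};

% One cell per value: rows are records, columns are fields.
C = cell(numel(line1{1}), numel(line1));
for k = 1:numel(line1)
    if iscell(line1{k})
        C(:,k) = line1{k};            % string column: already a cell column
    else
        C(:,k) = num2cell(line1{k});  % numeric column: wrap each value in a cell
    end
end

value = C{1,6};   % content of the 1st row, 6th column
```

With this layout C{1,6} answers the "1st row and 6th column" question directly, at the memory cost mentioned above.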


More Answers (5)

This could work:
fid = fopen('bla.txt','r');
% Skip the five header lines
linesToSkip = 5;
for ii = 1:linesToSkip
    fgetl(fid);
end
% Process all remaining lines
your_data = []; % You should preallocate if you know how large your data is
tline = fgetl(fid);
while ischar(tline) % fgetl returns -1 (not a char) at the end of the file
    % Remove everything except digits, whitespace, signs, decimal points and exponents
    tline = regexprep(tline,'[^0-9\s+\-.eE]','');
    your_data = [your_data; str2num(tline)];
    tline = fgetl(fid);
end
fclose(fid);
Note that 30835-2 is interpreted as 30833. If you want them separated, you should modify the regular expression, but then you cannot build your_data in the same manner (some rows would have nine values, others more). What I tried to do is get a numeric array from your data, but you can get a cell array as well.
Maybe that is more what you want, for each tline you can get a cell array:
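A minimal sketch of that idea, assuming the fields are separated by whitespace. The sample line is copied from the question, and the names fields and epoch are illustrative only:

```matlab
% One data line as returned by fgetl (copied from the question)
tline = '1 24652U 96063A 96318.74847837 -.00000020 00000-0 00000+0 0 14';

% Split on runs of whitespace; every field stays a string, so nothing is lost
fields = regexp(strtrim(tline), '\s+', 'split');   % 1-by-9 cell array of char

% Convert only the fields that are plain numbers, when you need them
epoch = str2double(fields{4});
```

Because every field is kept as text, values such as '30835-2' are not silently evaluated as a subtraction.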
Hello Hamza
The first column is a dummy variable (1 or 2) and, as Oleg says, the column counts of lines 1 and 2 are different. Below I define the steps to get the data. Solution done in MATLAB R2012a for Mac.
Step 1 Load or import the file
Data=importdata('test.rtf'); % This is how TextEdit saves the data on a Mac
When you do this, the data that you want starts at row 13.
Step 2 Get the data
Case 1 Dummy value is 1
[Col1] = textscan(Data{13,1}, '%d %s %s %f %f %f %d %d %d %d %d');
For the first data row you get:
Col1=[1 '24652U' '96063A' 96318.7484783700 -2.00000000000000e-07 0 0 0 0 0 14]
Case 2 Dummy value is 2
[Col2] = textscan(Data{14,1}, '%d %s %s %f %f %f %f %f %d');
Col2 =[2 '24652' '3.9929' 210.60070 7281127 177.77570 190.44360 2.27277880 6]
If this or previous answers solve your question, please grade.
Best regards
Javier

3 Comments

In Col1 you lose the sign information, e.g. for the value '30835-2' in the third line starting with 1.
Hi Oleg.
As I understand it, there is no number 30835-2. What you could have is 30835e-2, or just the two numbers 30835 and -2 (this is what I programmed in MATLAB 2012a and it works just fine). If you work with the expression '30835-2', MATLAB will interpret it as a subtraction and the result will be 30833.
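For what it's worth, a field such as 30835-2 in a TLE is usually read as a mantissa with an implied leading decimal point followed by a power-of-ten exponent, i.e. 0.30835e-2. The sketch below rests on that assumption about the TLE convention; the variable names and the regular expression are illustrative and not taken from any of the answers:

```matlab
% Decode a TLE-style "implied decimal point + exponent" field (assumed convention)
str = '30835-2';

% Tokens: optional sign, mantissa digits, signed one-digit exponent
tok = regexp(strtrim(str), '^([+-]?)(\d+)([+-]\d)$', 'tokens', 'once');

% Rebuild an ordinary floating point literal, e.g. '0.30835e-2', and convert it
value = str2double([tok{1} '0.' tok{2} 'e' tok{3}]);
```

The same lines handle a signed field such as '-98092-3', which becomes '-0.98092e-3' before conversion.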


What format do you want the data in? Using importdata gives you all the data as strings; you can then convert the numeric data and split the data into rows 1 and 2 easily.
%import data from txt file (using space as delimiter)
L=importdata('TLEtest.txt',' ');
%get data, ignoring first 5 lines
Data=L.textdata(6:end,:);

7 Comments

Undoubtedly the syntactically clearest and simplest approach (although not the most efficient).
Indeed this is the simplest and a very effective way among all the answers. But why is column 7 displaying results from both line 1 and line 2 of the TLEtest file?
Because it imports everything, and then it's up to you to separate it or keep it together.
The problem I am facing here is that it delimits only up to the 6th column, and the 7th column contains the last 3 remaining columns of both line 1 and line 2. Any idea how to delimit all 9 columns separately?
I guess the answer to your question is: no. Importdata is simple. It is not flexible. It seems to me you have two options:
  1. If you generate the data yourself, modify your algorithm so it produces something that is more amenable to importdata.
  2. Use some of the other suggestions presented here.
When you have that mix of characters and numbers, there is no one-liner that will provide a foolproof solution, you will always need to massage it. To make matters worse, it is not regular data, as it changes from line to line.
But when I use the uiimport function to import the text file, it imports correctly and all the data is right. When I try to run the code generated by uiimport, however, I get the following error:
Error using importfile (line 9) Not enough input arguments.
The code is below:
function importfile1(fileToRead1)
%IMPORTFILE1(FILETOREAD1)
% Imports data from the specified file
% FILETOREAD1: file to read
% Auto-generated by MATLAB on 09-Sep-2012 15:36:41
% Import the file
newData1 = importdata(fileToRead1);
% Create new variables in the base workspace from those fields.
vars = fieldnames(newData1);
for i = 1:length(vars)
    assignin('base', vars{i}, newData1.(vars{i}));
end
uiimport generates a function, so you have to pass it the name of the file you want to import, for example importfile1('TLEtest.txt').


This is quite scrappy but it gives you the data quite clearly in a cell array:
n=5; %lines to skip
A=fileread('TLETest.txt');
L=[1 regexp(A,'\n') length(A)+1]';
nLines=length(L)-n-1;
T=cell(nLines,1);
%break string of characters into rows and columns
Data=cell(nLines,9);
for ii=1:nLines
    T=A(L(n+ii):L(n+ii+1)-1);
    B=regexp(T,'\s*','Split');
    B(cellfun('isempty',B))=[];
    Data(ii,:)=B;
end

19 Comments

Thanks for the reply Tom. I ran the code as it is and it is giving me this error:
Subscripted assignment dimension mismatch.
Error in test5 (line 16) Data(ii,:)=B;
I don't get that error, it might be the way the text file is formatted? Have you added anything to the code? That line is only line 12 for me.
You could scratch all that nonsense and try this instead:
fid=fopen('TLEtest.txt');
A=textscan(fid,'%s','HeaderLines',5);
Data=reshape(A{1},9,[])';
fclose(fid);
Hey, thanks again. I didn't add anything, probably just spaces. I ran this code as it is now and got this error:
Error using reshape Product of known dimensions, 9, not divisible into total number of elements, 17553.
Error in test6 (line 3) Data=reshape(A{1},9,[])';
To test it I used the data you posted originally, copied and pasted into a text file. Is this similar to what you are using?
No, actually the data I posted is only the first few lines. The data is about 1900 lines.
Is there anything written at the bottom of the file after all the data?
Yes, (<End of file>) without the parentheses.
If that's all there is, then this line should work instead:
Data=reshape(A{1}(1:end-3,:),9,[])';
Works great now. Thanks Tom, very much appreciated.
Dear Tom, is it possible to store the "line 1" rows of Data (up to the end of the file) and the "line 2" rows (up to the end of the file) separately, using reshape?
Do you mean like this?
Line1=Data(1:2:end,:);
Line2=Data(2:2:end,:);
Yes, exactly! Thanks a lot. I'm not much of a programmer.
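Once Line1 and Line2 are separated, the string columns can be converted to numbers with str2double. A minimal sketch, assuming Data is an N-by-9 cell array of strings like the one produced by the reshape call above. The two sample rows are copied from the question and the names epoch, angles and ecc are illustrative. Scaling the eccentricity column by 1e-7 assumes the usual TLE convention of an implied leading decimal point:

```matlab
% Two sample rows standing in for the full Data cell array (copied from the question)
Data = {'1','24652U','96063A','96318.74847837','-.00000020','00000-0','00000+0','0','14';
        '2','24652','3.9929','210.6007','7281127','177.7757','190.4436','2.27277888','06'};

Line1 = Data(1:2:end,:);   % rows that start with 1
Line2 = Data(2:2:end,:);   % rows that start with 2

% str2double accepts a cell array of strings and returns a numeric array
epoch  = str2double(Line1(:,4));          % epoch column of every "1" row
angles = str2double(Line2(:,[3 4 6 7]));  % plain decimal columns of every "2" row
ecc    = str2double(Line2(:,5)) * 1e-7;   % assumed implied decimal point
```

Columns that mix letters and digits, such as '24652U', come back as NaN from str2double, so keep those as strings.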
Dear Tom, now that I have the data in a cell array, how can I break down each column? For example, the 1st row, 3rd column is 96063A. I want to read and store each number and string separately.
How do you want to store it? Each as a separate variable? e.g.
a=9;
b=6;
...
f='A';
I want to be able to access them one by one. For example, in 96063A, 96 is the year, 06 is the month of launch and so on, and I want to store them as year=96, month=06.
Although this value remains the same throughout the file, other values change, like the 1st row, 4th column. I think I would need a loop to read that value in every alternate line.
There are probably many ways you can do it. I'd ask this as a separate question so more people can see it.
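One simple way is to index the characters of each string by position. The sketch below is only an illustration with made-up variable names. It also assumes the usual TLE reading of the fields: the header in the question lists the object as 1996-063A, so the digits after the year in 96063A look like the launch number of that year (063) rather than a month, and the epoch 96318.74847837 is a two-digit year followed by the fractional day of the year:

```matlab
% International designator, e.g. the 1st row, 3rd column of Data
desig = '96063A';
launchYear   = str2double(desig(1:2));   % 96
launchNumber = str2double(desig(3:5));   % 63 (launch number, assumed TLE meaning)
piece        = desig(6:end);             % 'A'

% Epoch, e.g. the 1st row, 4th column of Data
epochStr  = '96318.74847837';
epochYear = str2double(epochStr(1:2));   % 96
epochDay  = str2double(epochStr(3:end)); % day of year with fraction

% The same indexing applied to a whole column of epoch strings
epochCol = {'96318.74847837'; '96319.62211352'};
allDays  = cellfun(@(s) str2double(s(3:end)), epochCol);
```

cellfun replaces the explicit loop over every alternate line, since Line1 already holds only the rows that start with 1.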
