Reading numbers from general text-file

mrBrown on 17 Jun 2011
Hi all,
All morning I've been trying to figure out how to read a text file like this:
This is the textfile I'd like to read.
the only interesting part are the numbers below
1 2
3 4
5 6
It would be nice to have a generic way to do this (e.g. read only
the lines that contain exactly two numbers).
I've tried using fscanf(fid,'%e %e\n') and regexp(..), but failed to get either working. Since I'm trying to read a large (600 MB+) data file, I don't want to fall back to processing the file line by line. Pre-processing the file with Python or some other language is also not the preferred solution.
mrBrown on 22 Jun 2011
At this moment they are floating-point numbers. But I'm really not looking for a specific solution for the file above. It's more an academic question of how to tackle problems like this in general.


Answers (5)

Jan on 18 Jun 2011
You want to identify the lines by their contents, therefore a line-by-line processing is necessary.
But your file has 600 MB?! Then it contains up to 150,000,000 numbers? In that case pre-allocation is necessary for reasonable speed. Can you define an upper limit for the number of values? I include at least a partial pre-allocation:
out = [];
len = 10000;
part = zeros(2, len);   % buffer that is filled chunk by chunk
ipart = 0;
fid = fopen(FileName, 'r');
if fid < 0, error('Cannot open file'); end
while 1  % Infinite loop
  s = fgets(fid);
  if ischar(s)
    data = sscanf(s, '%g %g', 2);
    if length(data) == 2
      ipart = ipart + 1;
      part(:, ipart) = data;
      if ipart == len  % Buffer full: append it and start over
        out = cat(2, out, part);
        ipart = 0;
      end
    end
  else  % End of file:
    out = cat(2, out, part(:, 1:ipart));
    break;
  end
end
fclose(fid);

Ivan van der Kroon on 17 Jun 2011
This is not a very nice solution, but it worked for me:
C = textscan(fid, '%s');
for j=1:length(C)
  if length(C{j})==1
    % (the rest of the loop body is missing from the post)
  end
end
  1 Comment
mrBrown on 17 Jun 2011
It works indeed for the simple test-file that I presented.
In reality, however, there are also lines with more than 2 numbers (which should not be read) and numbers longer than one digit.


mrBrown on 22 Jun 2011
Many thanks for all the replies. It seems that there are many ways to solve this problem line by line, but getting the work done with a single quick command seems to be impossible.
Finally please find below yet another line-by-line solution.
Jan: smart way of allocating memory! (I took the lazy route).
wantedlength = 5;
filename = '600mbTextfile.txt';
%% This method takes 30 seconds
lines = textread(filename,'%s','delimiter','\r');
ind = length(lines);
%% Processing takes about 200 seconds
% process data
result = zeros(ind,wantedlength); % pre-alloc
counter = 1;
% next = 1;
for iline = 1:length(lines)
  line = lines{iline};
  data = str2num(line);
  if isnumeric(data)
    % this is the data you want, right?
    if (length(data)==wantedlength)
      result(counter,:) = data;
      counter = counter +1;
    end
  end
end
result = result(1:counter-1,:); % trim the unused pre-allocated rows

Yella on 22 Jun 2011
If it is a txt file, you can use the MATLAB "load" function:
load file.txt; b = file;
where b is a matrix (MATLAB has a limit on the size of a matrix).
clear all;
%b=textread('ravi.txt', '%s', 'whitespace', '')
load ravi.txt   % creates a variable named ravi
b = ravi;
n = size(b, 1); % number of rows (samples)
result = [];
c=input('Enter the value of start node:');
d=input('Enter the value of end node:');
e=input('Enter the value of column: colum 3:SY column2:SX column 1: Node column 5: SXY :::');
if (c>d)
  display('Re run the program choosing c<d')
else
  for i=1:1:n
    if ((b(i,1)>=c) && (b(i,1)<=d))
      result= [result b(i,e)]
    end
  end
end
ravi is a text file with 300 samples (all floating-point numbers) collected from ANSYS.
This worked for me and might help you.
  1 Comment
Yella on 22 Jun 2011
here is the link to the program


Walter Roberson on 22 Jun 2011
You ruled out the short quick versions when you said that preprocessing with python or other languages was not the preferred solution.
This is the sort of thing that could be done relatively easily with a call to perl. perl can be called directly from MATLAB -- it is supplied with MATLAB and there is a specific perl() MATLAB command.
# contents of twonums.perl: print only the lines holding exactly two numbers
while (<>) { print if /^\s*-?\d+\.?\d*\s+-?\d+\.?\d*\s*$/ }
nums = textscan(perl('twonums.perl',InputFileName),'%f%f','CollectOutput',1);
result = nums{1};
The perl expression I give is not perfect, but it is serviceable. For example it does not allow for the possibility that the number does not have a leading digit before the decimal point. Getting all the details right for exponential format can be difficult, with little details like that making quite a difference in how easy it is to write the regular expression.
You could also use regular expressions inside MATLAB; this will be slower than calling out to perl, but might allow you to skip some of those str2num() as str2num() is fairly slow.
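As a rough illustration of the regular-expression route inside MATLAB, here is a minimal sketch. It is not taken from any answer above: the variable names (txt, numPat, linePat) and the number pattern are assumptions, and the sample text stands in for the real file. For an actual file one might read the contents with fileread(filename), which assumes the whole file fits in memory; that may not hold for a 600 MB file.

```matlab
% Sample text standing in for the file contents (an assumption for this sketch).
txt = sprintf('This is the textfile\n1 2\n3 4 5\n-0.5 6e3\n');

% One number: optional sign, digits with optional fraction (or a bare
% fraction such as .5), and an optional exponent.
numPat = '[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?';

% A whole line holding exactly two such numbers. [ \t] is used instead of
% \s so that a match cannot spill over a line break.
linePat = ['^[ \t]*(' numPat ')[ \t]+(' numPat ')[ \t]*\r?$'];

% 'lineanchors' makes ^ and $ match at the start and end of every line.
tok = regexp(txt, linePat, 'tokens', 'lineanchors');

if isempty(tok)
    result = zeros(0, 2);            % no matching lines
else
    vals = str2double([tok{:}]);     % flatten the tokens and convert them
    result = reshape(vals, 2, []).'; % one row per matching line
end
```

With the sample text, only the lines "1 2" and "-0.5 6e3" match; the header line and the three-number line are skipped. str2double is called once on all tokens, which avoids calling str2num per line.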
