Read file with non-uniform lines?

bene1 on 25 Oct 2020
Commented: bene1 on 27 Oct 2020
Hi. I'm a Matlab newbie. I would like to read in a file where the lines have different formats, as below.
% Coordinates
% Code ID X Y
C 101 0.001 0.001
C 102 1.002 0.002
C 103 1.003 1.003
C 104 0.004 1.004
% Distances
% Code ID From To Dist
D 201 101 103 1.417
D 202 102 104 1.414
If the first character is C, use...
A = textscan(fid,'%c %d %f %f')
If the first character is D, use...
A = textscan(fid,'%c %d %d %d %f')
After, I'd like to assign the data to structs (c.id, c.x, c.y, d.id, d.from, d.to, d.dist), but first I think I just need to get it scanned in. Is it possible to apply some logic to reading the file? Thank you.
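One way such branching could be sketched is to read the file line by line with fgetl and switch on the first character. This is only an illustration: the temporary file, the shortened sample data, and the variable names tline, c, and d are assumptions, not anything from the thread.

```matlab
% Write a small hypothetical sample file so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '% Coordinates', 'C 101 0.001 0.001', ...
    'C 102 1.002 0.002', '% Distances', 'D 201 101 103 1.417');
fclose(fid);

% Empty struct arrays with the field names from the question.
c = struct('id', {}, 'x', {}, 'y', {});
d = struct('id', {}, 'from', {}, 'to', {}, 'dist', {});

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)                 % fgetl returns -1 at end of file
    tline = strtrim(tline);         % drop leading/trailing whitespace
    if ~isempty(tline)
        switch tline(1)
            case 'C'
                v = sscanf(tline(2:end), '%f');   % [id; x; y]
                c(end+1) = struct('id', v(1), 'x', v(2), 'y', v(3));
            case 'D'
                v = sscanf(tline(2:end), '%f');   % [id; from; to; dist]
                d(end+1) = struct('id', v(1), 'from', v(2), ...
                                  'to', v(3), 'dist', v(4));
            % comment lines (starting with '%') fall through and are skipped
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

This yields struct arrays indexed per record (c(1).id, c(2).x, and so on). Growing the arrays inside the loop is simple but may be slow for large files.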
Walter Roberson on 26 Oct 2020
'^\s*C.*$', 'dotexceptnewline', 'lineanchors'
or
'(?<=(^|\n))\s*C[^\n]*'
with no additional options needed
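As a rough sketch, the first pattern could be passed to regexp together with the 'match' option to pull out the lines that start with C. The sample text in txt is a made-up stand-in for the result of fileread, and the name Clines is an assumption.

```matlab
% Hypothetical sample text standing in for the contents of the file.
txt = sprintf('%s\n', ...
    '% Coordinates', ...
    'C 101 0.001 0.001', ...
    'C 102 1.002 0.002', ...
    '% Distances', ...
    'D 201 101 103 1.417');

% 'lineanchors' lets ^ and $ match at the start and end of each line;
% 'dotexceptnewline' stops .* from running past the end of the line.
Clines = regexp(txt, '^\s*C.*$', 'match', ...
    'dotexceptnewline', 'lineanchors');
```

Clines is a 1-by-N cell array of character vectors, one per matching line. If the file uses Windows line endings, each entry would still carry a trailing carriage return.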
bene1 on 26 Oct 2020
Great, thanks again. Now have...
C =
4×1 cell array
{' C 101 0.001 0.001←'}
{' C 102 1.002 0.002←'}
{' C 103 1.003 1.003←'}
{' C 104 0.004 1.004←'}
With C as a 4x1, I believe my next step is to extract out the columns. My first thought was
A = textscan(C,'%c %d %f %f')
but I see I can't do that. Looking into cell2struct?
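textscan does not take a cell array, but it does accept a character vector, so one possible workaround is to join the lines back into a single text first. The sketch below assumes a two-line version of the cell array above, with char(13) standing in for the carriage return shown as ←; the names joined and c are illustrative.

```matlab
% Hypothetical stand-in for the extracted lines, including the trailing CR.
C = {[' C 101 0.001 0.001' char(13)]; ...
     [' C 102 1.002 0.002' char(13)]};

% strtrim removes the leading blank and the trailing carriage return;
% (:).' makes the cell a row before joining the lines with newlines.
trimmed = strtrim(C);
joined  = strjoin(trimmed(:).', newline);

% %s is used for the code letter; the three numeric columns are read as doubles.
A = textscan(joined, '%s %f %f %f');

c.id = A{2};
c.x  = A{3};
c.y  = A{4};
```

Here c.id, c.x, and c.y are column vectors with one element per line.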


Accepted Answer

Walter Roberson on 26 Oct 2020
Named tokens, I said. Do not extract the lines ahead of time.
FileText = fileread(YourFileName);
Ctokens = regexp(FileText, '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');
%Ctokens will now be a struct array with field names ID, X, and Y, each holding a character vector.
C.ID = str2double({Ctokens.ID});
C.X = str2double({Ctokens.X});
C.Y = str2double({Ctokens.Y});
Dtokens = regexp(FileText, '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', 'names', 'lineanchors');
%Dtokens will now be a struct array with field names ID, From, To, and Dist, each holding a character vector.
D.ID = str2double({Dtokens.ID});
D.From = str2double({Dtokens.From});
D.To = str2double({Dtokens.To});
D.Dist = str2double({Dtokens.Dist});
The amount of processing work is pretty minimal. Pretty much all of the effort goes into figuring out the proper regexp patterns to use (which can be pretty tricky when there are variant lines).
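To try the named-token pattern without a file on disk, a sketch along these lines could be used. The sample text in FileText is a made-up stand-in for fileread(YourFileName), and the final conversion to a per-record struct array named coords is an optional extra, not part of the answer above.

```matlab
% Hypothetical sample text standing in for fileread(YourFileName).
FileText = sprintf('%s\n', ...
    '% Coordinates', ...
    '% Code ID X Y', ...
    'C 101 0.001 0.001', ...
    'C 102 1.002 0.002', ...
    '% Distances', ...
    'D 201 101 103 1.417');

% Same pattern as in the answer: one struct array element per C line.
Ctokens = regexp(FileText, ...
    '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');

C.ID = str2double({Ctokens.ID});
C.X  = str2double({Ctokens.X});
C.Y  = str2double({Ctokens.Y});

% Optional: one struct per record, so that coords(k).id, coords(k).x, ... work.
coords = struct('id', num2cell(C.ID), 'x', num2cell(C.X), 'y', num2cell(C.Y));
```

C holds one row vector per field, while coords is a 1-by-N struct array; which layout is more convenient depends on how the data is used afterwards.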

More Answers (1)

per isakson on 26 Oct 2020
>> S = cssm( 'd:\m\cssm\cssm.txt' )
S =
1×2 struct array with fields:
header
colhead
Code
data
>> S(1)
ans =
struct with fields:
header: "Coordinates"
colhead: ["Code" "ID" "X" "Y"]
Code: [4×1 string]
data: [4×3 double]
>> S(2)
ans =
struct with fields:
header: "Distances"
colhead: ["Code" "ID" "From" "To" "Dist"]
Code: [2×1 string]
data: [2×4 double]
where
function sas = cssm( ffs )
    chr = fileread( ffs );
    str = string( chr );
    str = replace( str, char([13,10]), newline ); % get rid of the carriage return
    % split the string into blocks. Use the block header as delimiter.
    [blk,del] = strsplit( str, '(?m)^\x20*%\x20\w+\x20*\n' ...
                        , 'DelimiterType','RegularExpression' );
    blk(1) = []; % remove empty block before the first delimiter
    len = numel( del );
    sas(1,len) = struct( 'header',"", 'colhead',"", 'Code',"", 'data',nan );
    for jj = 1 : len % loop over all blocks
        sas(jj).header = regexp( del(jj), '\w+', 'match','once' ); % match the name
        cac = textscan( blk(jj), "%[^\n]", 1 ); % read the first row
        tmp = strsplit( string(cac{1}) ); % split the row into column headers
        tmp(1) = []; % remove the comment character, "%"
        sas(jj).colhead = tmp;
        cac = textscan( blk(jj), ['%s',repmat('%f',1,numel(tmp)-1)] ...
                      , 'Headerlines',1, 'CollectOutput',true );
        sas(jj).Code = string(cac{1});
        sas(jj).data = cac{2};
    end
end
and where cssm.txt contains the data given in your question.
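If named columns are wanted afterwards, one block of the result could be turned into a table. The sketch below builds a hand-made struct blk1 that mimics S(1) with only two rows, so it runs without the file; blk1 and T are hypothetical names.

```matlab
% Hypothetical stand-in for S(1) as displayed above (two rows only).
blk1 = struct( 'header',  "Coordinates", ...
               'colhead', ["Code" "ID" "X" "Y"], ...
               'Code',    ["C"; "C"], ...
               'data',    [101 0.001 0.001; 102 1.002 0.002] );

% The first column header belongs to the Code column, so the numeric
% columns are named by the remaining headers.
T = array2table( blk1.data, 'VariableNames', cellstr(blk1.colhead(2:end)) );
T.Code = blk1.Code;   % append the code letters as an extra column
```

Columns can then be addressed by name, for example T.ID or T.X.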

Release

R2020a
