
textscan of mixed data type data file

I'm trying to import the data from a column-based text file into a MATLAB matrix, in which each column of the matrix includes the header and its corresponding data column. The file consists of a header line followed by columns of data, as you can see in the attachment. I need MATLAB to read the first line (i.e. the header line: string/char data type), detect how many headers there are, which corresponds to the number of variables in the file, and then read the following data (double data type) column by column.
Walter Roberson on 17 Feb 2022
textscan(fid, '', 'HeaderLines', 1)
would tell textscan() to skip one line and then figure out by itself how many columns there are.
Do you need the variable names to be remembered, or were you just looking to figure out how many columns were there?
Ashraf Alfandi on 17 Feb 2022
I need both. The headers will help me retrieve each data column by its title.
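If the file looks exactly as described in the question (one header line of whitespace-separated, optionally double-quoted names, directly followed by numeric columns), the names and the data can both be read in a single pass. The sketch below assumes that layout; the sample file, its column names (Time, Temp, Pressure) and the temporary path are hypothetical.

```matlab
% Hypothetical sample file: one header line of quoted names, then numeric columns
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '"Time" "Temp" "Pressure"\n1 20.5 101.3\n2 21.0 101.1\n3 21.4 100.9\n');
fclose(fid);

fid = fopen(fname, 'r');
headerLine = fgetl(fid);                  % first line holds the column names
names = textscan(headerLine, '%q');       % %q reads double-quoted strings without the quotes
names = names{1}.';                       % 1-by-nCols cell array of names
nCols = numel(names);                     % number of variables in the file
C = textscan(fid, repmat('%f', 1, nCols), 'CollectOutput', true);
fclose(fid);
delete(fname);
Data = C{1};                              % nRows-by-nCols double matrix

% Retrieve one column by its header name
pressure = Data(:, strcmp(names, 'Pressure'));
```

Because the number of %f specifiers is built from the header, the same code adapts to files with any number of columns.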


Accepted Answer

Ashraf Alfandi on 17 Feb 2022
Edited: Ashraf Alfandi on 17 Feb 2022
Thanks, Mathieu, for your answer. It's definitely very comprehensive, but too time-consuming for what I need to run. In the end, I came up with the following simple code, which takes ~0.0009 s, whereas yours takes 0.5 s:
FileName = "DNP MM SYS test.dat";
test = importdata(FileName); % importdata separates the numeric block from the text lines
Data = test.data; % Extract the numeric data
N = size(Data,2); % Number of data columns
fid = fopen(FileName);
if fid < 0, error('Cannot open %s', FileName); end
Head = textscan(fid,'%q', N+1,'HeaderLines',1); % Skip one line, then read N+1 (possibly quoted) header tokens
Head = Head{1}.'; % 1-by-(N+1) cell array of header strings
Head = Head(2:end); % Drop the leading token so Head lines up with the N data columns
fclose(fid);
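Once Data and Head are available, one way to retrieve a column by its title is to wrap them in a table. This is a sketch with made-up values standing in for Data and Head; matlab.lang.makeValidName is applied because, depending on the MATLAB release, table variable names may have to be valid identifiers (a header such as 'Temp (C)' is not).

```matlab
% Made-up stand-ins for Data and Head from the code above
Data = [1 20.5 101.3; 2 21.0 101.1; 3 21.4 100.9];
Head = {'Time', 'Temp (C)', 'Pressure'};

% Convert the header strings into legal MATLAB variable names
validNames = matlab.lang.makeValidName(Head);
T = array2table(Data, 'VariableNames', validNames);

% Columns can be retrieved by name, either literally or from a variable
p = T.Pressure;
col = validNames{2};
temp = T.(col);
```

Dynamic field access (T.(col)) is handy when the column name comes from user input or a loop over Head.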
Mathieu NOE on 17 Feb 2022
Hello, no problem.
I got this for my code: Elapsed time is 0.027212 seconds.
For your code: Elapsed time is 0.047455 seconds.
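The two sets of measurements above disagree, which is not unusual for a single tic/toc run: the first call can be slowed by things like file-system caching and JIT warm-up. MATLAB's timeit runs a function handle several times and returns a typical time, which may give a fairer comparison. A minimal sketch, assuming a generated hypothetical test file in place of the real .dat file:

```matlab
% Hypothetical test file standing in for the real .dat file
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'a b c\n');                   % one header line
fprintf(fid, '%g %g %g\n', rand(3, 1000)); % 1000 rows of 3 numbers
fclose(fid);

% timeit calls the handle repeatedly and returns a typical run time in seconds
tImport = timeit(@() importdata(fname));
fprintf('importdata: %.4g s per call\n', tImport);
delete(fname);
```

The same pattern (wrap each candidate reader in an anonymous function and pass it to timeit) can be used to compare both approaches on the same file and machine.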


More Answers (2)

Mathieu NOE on 17 Feb 2022
Hello, try this.
readclm is an old but still valuable function (I don't even remember where it came from).
The variable names are stored in the cell array var.
[DATA,HEAD] = readclm('DNP SYS test.txt');
var = split(HEAD,' "');
var = var(2:end);
var = strrep(var,'"',''); %get rid of double quotes
function [outdata,head] = readclm(filename,nclm,skip,formt)
% READCLM Reads numerical data from a text file into a matrix.
% Text file can begin with a header or comment block.
% [DATA,HEAD] = READCLM(FILENAME,NCLM,SKIP,FORMAT)
% Opens file FILENAME, skips first several lines specified
% by SKIP number or beginning with comment '%'.
% Then reads next several lines into a string matrix HEAD
% until the first line with numerical data is encountered
% (that is until first non-empty output of SSCANF).
% Then reads the rest of the file into a numerical matrix
% DATA in a format FORMAT with number of columns equal
% to number of columns of the text file or specified by
% number NCLM. If data does not match the size of the
% matrix DATA, it is padded with NaN at the end.
%
% READCLM(FILENAME) reads data from a text file FILENAME,
% skipping only commented lines. It determines number of
% columns by the length of the first data line and uses
% the floating point format '%g';
%
% READCLM uses FGETS to read the first lines and FSCANF
% for reading data.
% Defaults and parameters ..............................
formt_dflt = '%g'; % Default format for fscanf
addn = nan; % Number to fill the end if necessary
% Handle input ..........................................
if nargin<1, error(' File name is undefined'); end
if nargin<4, formt = formt_dflt; end
if nargin<3, skip = 0; end
if nargin<2, nclm = 0; end
if isempty(nclm), nclm = 0; end
if isempty(skip), skip = 0; end
% Open file ............................
[fid,msg] = fopen(filename);
if fid<0, error('Cannot open %s: %s', filename, msg); end
% Find header and first data line ......................
is_head = true;
jl = 0;
head = ' ';
while is_head % Add lines to header.....
s = fgets(fid); % Get next line
if ~ischar(s) % fgets returns -1 at end of file
fclose(fid); error('No numeric data found in %s', filename);
end
jl = jl+1;
is_skip = jl<=skip || s(1)=='%'; % Skip the first SKIP lines and '%' comment lines
out1 = sscanf(s,formt); % Try to read this line as numbers
% If unreadable by SSCANF and not skipped, add it to the header
is_head = isempty(out1) || is_skip;
if is_head && ~is_skip
head = char(head,deblank(s)); % Append line without trailing newline/CR (char replaces obsolete str2mat)
end
end
head = head(2:size(head,1),:);
% Determine number of columns if not specified
out1 = out1(:)';
l1 = length(out1);
if ~nclm, nclm = l1; end
% Read the rest of the file ..............................
if l1~=nclm % First line format is different from ncolumns
outdata = fscanf(fid,formt);
lout = length(outdata)+l1;
ncu = ceil(lout/nclm);
lz = nclm*ncu-lout;
outdata = [out1'; outdata(:); ones(lz,1)*addn];
outdata = reshape(outdata,nclm,ncu)';
else % Regular case
outdata = fscanf(fid,formt,[nclm inf]);
outdata = [out1; outdata']; % Add the first line
end
fclose (fid); % Close file ..........
end
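To see what the three lines after the readclm call do to HEAD, here they are applied to a made-up header line ('"Index" "Time" "Temp"' is hypothetical, not taken from the original file). Note that var(2:end) discards the first token. The result is stored in varNames rather than var, which avoids shadowing MATLAB's built-in var function.

```matlab
% Hypothetical single-row header as readclm might return it in HEAD
HEAD = '"Index" "Time" "Temp"';
varNames = split(HEAD, ' "');          % {'"Index"'; 'Time"'; 'Temp"'}
varNames = varNames(2:end);            % drop the first token
varNames = strrep(varNames, '"', '');  % remove the remaining double quotes
```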

Wesser on 27 Oct 2022
So I originally had the script below. It works perfectly when all the Obs_Node.out files have the same number of rows. But when the Obs_Node.out files have different numbers of rows, I can't compile the columns from each for-loop iteration. For example,
THETA_ObsNode(:,i) = theta_ObsNode(:);
will result in an error like:
"Unable to perform assignment because the size of the left side is 200000-by-1 and the size of the right side is
117648-by-1.
Error in MC_Data_Compile (line 67)
THETA_ObsNode(:,i) = theta_ObsNode(:); "
Ultimately, I am trying to compile each column from every loop iteration into one file for that respective column, if that makes sense. My question, then, is: how do I compile the data when the column lengths vary?
num_sim = 1000; % 1000 Monte Carlo simulations
Node_CONC=zeros(200000,num_sim); % 200000 is an arbitrarily large number of rows
%~~~~~~~~~~Coalesce data from Obs_Node.out files~~~~~~~~~~~~
for i=1:num_sim
Obs_Node = fopen(["/Users/apple/Dropbox/My Mac (apple’s MacBook Pro)/Desktop/Simulations/MC_"+num2str(i)+'/Obs_Node.out']); % Open monte carlo output file in Path (i)
skip_lines=11; %skip all the lines until the output data of interest
for k=1:(skip_lines)
x=fgetl(Obs_Node);
end
temp1 = fscanf(Obs_Node,'%f',[5,Inf]); %scan the matrix of data
TEMP1 = temp1'; % transpose data
theta_ObsNode = TEMP1(:,3); % Hydraulic Conductivity
THETA_ObsNode(:,i) = theta_ObsNode(:); %%%% this line saves each iteration's data in a separate column
flux_ObsNode = TEMP1(:,4); % Water Flux
FLUX_ObsNode(:,i) = flux_ObsNode(:);
Conc_ObsNode = TEMP1(:,5); % Concentration g/cm3
CONC_ObsNode(:,i) = Conc_ObsNode(:);
fclose(Obs_Node);
end
Walter Roberson on 27 Oct 2022
Pad the arrays for the shorter data.
Here I use NaN to pad, as it is clear that NaN is not valid data. The code could be a bit shorter if it was acceptable to pad with zeros instead of some other value.
The below code does not assume that all files except the last are the same length: it dynamically grows the array any time it encounters a larger file, making sure to extend the padding for any existing data.
num_sim = 1000; % 1000 Monte Carlo simulations
%~~~~~~~~~~Coalesce data from Obs_Node.out files~~~~~~~~~~~~
for i=1:num_sim
Obs_Node = fopen(["/Users/apple/Dropbox/My Mac (apple’s MacBook Pro)/Desktop/Simulations/MC_"+num2str(i)+'/Obs_Node.out']); % Open Monte Carlo output file in Path (i)
if Obs_Node < 0, error('Could not open Obs_Node.out for run %d', i); end
skip_lines=11; % skip all the lines until the output data of interest
for k=1:(skip_lines)
fgetl(Obs_Node); % discard one line
end
temp1 = fscanf(Obs_Node,'%f',[5,Inf]); % scan the matrix of data
TEMP1 = temp1'; % transpose data
theta_ObsNode = TEMP1(:,3); % Hydraulic Conductivity
flux_ObsNode = TEMP1(:,4); % Water Flux
conc_ObsNode = TEMP1(:,5); % Concentration g/cm3
num_obs_here = length(theta_ObsNode);
if i == 1
% First file: allocate NaN-filled arrays sized to this file
THETA_ObsNode = nan(num_obs_here,num_sim);
FLUX_ObsNode = THETA_ObsNode;
CONC_ObsNode = THETA_ObsNode;
THETA_ObsNode(:,i) = theta_ObsNode;
FLUX_ObsNode(:,i) = flux_ObsNode;
CONC_ObsNode(:,i) = conc_ObsNode;
elseif num_obs_here <= size(THETA_ObsNode,1)
% Shorter (or equal) file: fill the top rows, the rest stays NaN
THETA_ObsNode(1:num_obs_here,i) = theta_ObsNode;
FLUX_ObsNode(1:num_obs_here,i) = flux_ObsNode;
CONC_ObsNode(1:num_obs_here,i) = conc_ObsNode;
else
% Longer file: extend all existing columns with NaN rows, then store
THETA_ObsNode(end+1:num_obs_here,:) = NaN;
FLUX_ObsNode(end+1:num_obs_here,:) = NaN;
CONC_ObsNode(end+1:num_obs_here,:) = NaN;
THETA_ObsNode(:,i) = theta_ObsNode;
FLUX_ObsNode(:,i) = flux_ObsNode;
CONC_ObsNode(:,i) = conc_ObsNode;
end
fclose(Obs_Node);
end
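An alternative that avoids resizing the numeric arrays inside the loop is to collect each run's column in a cell array and pad everything with NaN once at the end. This is a sketch with made-up column lengths and placeholder values (lens and cols are hypothetical) in place of the data read from the Obs_Node.out files.

```matlab
% Hypothetical stand-ins for the per-file columns (lengths differ between runs)
num_sim = 4;
lens = [5 3 7 4];
cols = cell(1, num_sim);
for i = 1:num_sim
    cols{i} = (1:lens(i)).' * i;   % placeholder for e.g. TEMP1(:,3) of run i
end

% Pad every column with NaN to the longest length in one step
maxLen = max(cellfun(@numel, cols));
THETA = nan(maxLen, num_sim);
for i = 1:num_sim
    THETA(1:numel(cols{i}), i) = cols{i};
end
```

In the real script, the first loop would read each file and store TEMP1(:,3), TEMP1(:,4) and TEMP1(:,5) in three separate cell arrays.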

