Reading a set of numeric values from 100s of .txt files inside a folder

I have a folder named SimResults. Inside the folder I have 100s of .txt files. Let the name of file i be of the format "val1_x(i)_val2_y(i)_val3_z(i).txt". The variables x, y and z vary across different file names. Inside file i, I have the text below somewhere:
Frame 98 Finished!
Layer 1: DL n_bits = 823200. DL BER = 1.09e-05
Frame 99 Finished!
Layer 1: DL n_bits = 831600. DL BER = 1.08e-05
Frame 100 Finished!
Layer 1: DL n_bits = 840000. DL BER = 1.07e-05
I want to extract the data from the line after "Frame 100 Finished!" in every .txt file. So, in effect, for text file i I should obtain the following set of values:
val1(i) = x(i)
val2(i) = y(i)
val3(i) = z(i)
DL_n_bits(i) = 840000
DL_BER(i) = 1.07e-05
Can someone help me do this sequentially for all the .txt files and save that data?
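If the last frame in a file is not always frame 100 (an assumption; the question only shows frame 100), one option is to capture every `Frame N Finished!` block and keep the last one. A minimal sketch on a shortened, hypothetical copy of the text above, assuming the `Layer 1:` line always directly follows its `Frame N Finished!` line:

```matlab
% Hypothetical, shortened file contents; the real files contain more text
txt = sprintf(['Frame 99 Finished!\n' ...
    'Layer 1: DL n_bits = 831600. DL BER = 1.08e-05\n' ...
    'Frame 100 Finished!\n' ...
    'Layer 1: DL n_bits = 840000. DL BER = 1.07e-05\n']);

% Capture frame number, bit count and BER for every "Frame N Finished!" block;
% \s* absorbs the line break, \. is the sentence period after the bit count
pat = 'Frame (\d+) Finished!\s*Layer 1: DL n_bits = (\d+)\. DL BER = (\S+)';
tok = regexp(txt, pat, 'tokens');   % 1xM cell, each entry a 1x3 cell of char

% Keep the last block, i.e. the final frame written to the file
lastVals = str2double(tok{end});    % [frame, n_bits, BER]
```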

Accepted Answer

Walter Roberson on 3 Aug 2022
foldername = 'SimResults';
dinfo = dir( fullfile(foldername, '*.txt'));
filenames = {dinfo.name};
nfiles = length(filenames);
val1 = zeros(nfiles,1);
val2 = zeros(nfiles,1);
val3 = zeros(nfiles,1);
DL_n_bits = zeros(nfiles,1);
DL_BER = zeros(nfiles,1);
for K = 1 : nfiles
    thisfilename = filenames{K};
    % strip '.txt' first; otherwise parts{6} is 'z.txt' and str2double returns NaN
    [~, basename] = fileparts(thisfilename);
    parts = regexp(basename, '_', 'split');
    x = str2double(parts{2});
    y = str2double(parts{4});
    z = str2double(parts{6});
    S = fileread( fullfile(foldername, thisfilename) );
    % skip ahead to "Frame 100 Finished!", then capture the integer bit count and the BER
    info = regexp(S, 'Frame 100 Finished!.*?DL n_bits = (?<bits>\d+).*?DL BER = (?<BER>\S+)', 'once', 'names');
    bits = str2double(info.bits);
    BER = str2double(info.BER);
    val1(K) = x;
    val2(K) = y;
    val3(K) = z;
    DL_n_bits(K) = bits;
    DL_BER(K) = BER;
end
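The loop fills the arrays but does not write them out, which the question also asks for. A minimal sketch, assuming the five column vectors have already been filled: save them to a MAT-file and/or a CSV with one row per .txt file. The values below are made up, and `tempname` stands in for a real output name (for example a hypothetical `SimResults_summary.csv`).

```matlab
% Made-up results standing in for the arrays filled by the loop
val1 = [3; 4];  val2 = [5; 6];  val3 = [7; 8];
DL_n_bits = [840000; 840000];
DL_BER = [1.07e-05; 1.12e-05];

% Option 1: a MAT-file, read back later with load()
matFile = [tempname '.mat'];
save(matFile, 'val1', 'val2', 'val3', 'DL_n_bits', 'DL_BER');

% Option 2: a plain CSV with a header and one row per .txt file
csvFile = [tempname '.csv'];
fid = fopen(csvFile, 'w');
fprintf(fid, 'val1,val2,val3,DL_n_bits,DL_BER\n');
data = [val1 val2 val3 DL_n_bits DL_BER];
% transpose: fprintf consumes its data column by column
fprintf(fid, '%g,%g,%g,%d,%.6g\n', data.');
fclose(fid);
```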
Walter Roberson on 3 Aug 2022
This code does presume that the bit count is an integer, and that the period after it is just sentence punctuation for human reading.
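That assumption is what lets `\d+` stop right before the period. If the bit count could ever be written with a fraction or an exponent (a hypothetical case; the question only shows integers), a slightly wider number pattern still leaves the period out, because `\.\d+` needs at least one digit after the dot. A sketch:

```matlab
% Integer part, optional fraction (needs a digit after the dot), optional exponent
numPat = '\d+(?:\.\d+)?(?:[eE][-+]?\d+)?';
pat = ['DL n_bits = (?<bits>' numPat ')\.? DL BER = (?<BER>\S+)'];

lineA = 'Layer 1: DL n_bits = 840000. DL BER = 1.07e-05';   % format from the question
lineB = 'Layer 1: DL n_bits = 8.4e5. DL BER = 1.07e-05';    % hypothetical variant

infoA = regexp(lineA, pat, 'once', 'names');
infoB = regexp(lineB, pat, 'once', 'names');
bitsA = str2double(infoA.bits);   % captured text '840000'
bitsB = str2double(infoB.bits);   % captured text '8.4e5'
```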


More Answers (1)

dpb on 3 Aug 2022
Edited: dpb on 3 Aug 2022
Alternatively, just as an experiment, I wonder how this would work using some of the more recently introduced string features:
foldername = 'SimResults';
dinfo = dir(fullfile(foldername, '*.txt'));
filenames = {dinfo.name};
nfiles = length(filenames);
% here, since we've got the full list of filenames, I'd be tempted to go
% ahead and scan it now for the vals array --
% with the new-fangled string functions (are they as quick as a regexp expression?)
pat = "_" + digitsPattern; % to isolate the x,y,z
vals = str2double(extractAfter(extract(filenames, pat), '_')); % and convert those to numeric
% alternatively, with the old standby -- although it hasn't been internally vectorized
fmt1 = 'val1_%d_val2_%d_val3_%d.txt';
% filenames(:) makes the cell a column, so cell2mat stacks one [x y z] row per file
vals = double(cell2mat(cellfun(@(s) cell2mat(textscan(s, fmt1)), filenames(:), 'UniformOutput', false)));
% Can try the above on real dataset; with toy set of 10 or so sample
% filenames here, there was no discernible timing difference.
% allocate for the others that have to read files for...
DL_n_bits = zeros(nfiles,1);
DL_BER = zeros(nfiles,1);
fmt2 = 'Layer 1: DL n_bits = %f DL BER = %f';
for K = 1:nfiles
    S = readlines(fullfile(foldername, filenames{K}));
    ix = find(startsWith(S, 'Frame 100 Finished!'), 1) + 1; % data line follows the marker
    nums = cell2mat(textscan(char(S(ix)), fmt2)); % own name, so vals is not overwritten
    DL_n_bits(K) = nums(1);
    DL_BER(K) = nums(2);
end
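For the filename part, `sscanf` is another old standby that takes the same `fmt1` layout: the literal text in the format must match the name exactly, and the three numbers come back as a column. A minimal sketch, assuming integer x, y and z; the two filenames are made up:

```matlab
% Made-up filenames following the layout in the question
filenames = {'val1_3_val2_5_val3_7.txt', 'val1_10_val2_20_val3_30.txt'};
fmt1 = 'val1_%d_val2_%d_val3_%d.txt';
nfiles = numel(filenames);
vals = zeros(nfiles, 3);
for K = 1:nfiles
    % sscanf returns [x; y; z]; transpose it into row K
    vals(K, :) = sscanf(filenames{K}, fmt1).';
end
```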
I wonder whether it's any quicker to find the particular line and parse it than to have regexp search the whole file, as one really long character string, for the same point, or how much more overhead the string array introduces instead.
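One rough way to look into that, as a sketch only: time both approaches on the same text held in memory. The contents are synthetic (frames 1 to 100 in the question's format, with the question's bit counts but a constant BER), and `nReps` is an arbitrary repetition count. Because it splits a char vector with `strsplit` instead of calling `readlines`, it leaves out file I/O and the string-array overhead, so the numbers are indicative at best.

```matlab
% Synthetic file contents: frame k reports n_bits = 8400*k (frame 100 -> 840000)
k = 1:100;
txt = sprintf('Frame %d Finished!\nLayer 1: DL n_bits = %d. DL BER = 1.07e-05\n', [k; 8400*k]);

nReps = 200;   % arbitrary, just so the times are measurable
marker = 'Frame 100 Finished!';
patRe = 'Frame 100 Finished!.*?DL n_bits = (?<bits>\d+).*?DL BER = (?<BER>\S+)';
fmtLine = 'Layer 1: DL n_bits = %d. DL BER = %f';

% Approach 1: regexp over the whole contents as one char vector
tic;
for r = 1:nReps
    info = regexp(txt, patRe, 'once', 'names');
    viaRegexp = [str2double(info.bits), str2double(info.BER)];
end
tRegexp = toc;

% Approach 2: split into lines, locate the marker line, parse the next one
tic;
for r = 1:nReps
    txtLines = strsplit(txt, char(10));
    ix = find(strncmp(txtLines, marker, numel(marker)), 1) + 1;
    viaLines = sscanf(txtLines{ix}, fmtLine).';
end
tLines = toc;

fprintf('regexp: %.4f s, line search: %.4f s (%d reps)\n', tRegexp, tLines, nReps);
```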
