How to identify and read similarly named files in a folder and then create a matrix array with the data

Hello. I need to process hundreds of data files in a folder, where the data of each record is split into 3 files with similar names, as shown in the attached image (from these, 5 output files will be identified). So I need to create a matrix array that automatically joins the data from the different files, and save it in a different folder.

Accepted Answer

Mathieu NOE on 25 Oct 2022
So here is an updated version of the code.
To load the data and headers of your files I used an older but still useful readclm function (see the end of the file).
For the time being I am not doing anything with the headers; it's up to you to decide whether they are needed.
Hope it helps.
clc
clearvars
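data_folder = pwd; % current folder; naming it "path" would shadow the built-in path function
S = dir(fullfile(data_folder,'2021*.txt'));
%% read and concatenate the data, one array per component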
HeaderLines = 6; % header lines of data files
out_hne = [];
out_hnn = [];
out_hnz = [];
out_hne_filename = []; %
out_hnn_filename = []; %
out_hnz_filename = []; %
for k = 1:numel(S)
current_folder = S(k).folder;
current_filename = S(k).name;
F = fullfile(current_folder,current_filename);
% case 1 : "HNE"
if contains(current_filename,'HNE')
%data_hne = readmatrix(F,"NumHeaderLines",HeaderLines,"ExpectedNumVariables",1);
[data_hne,header_hne] = readclm(F,1);
out_hne = [out_hne;data_hne]; % data vertical concatenation
out_hne_filename = [out_hne_filename;current_filename]; % debug
% case 2 : "HNN"
elseif contains(current_filename,'HNN')
% data_hnn = readmatrix(F,"NumHeaderLines",HeaderLines,"ExpectedNumVariables",1);
[data_hnn,header_hnn] = readclm(F,1);
out_hnn = [out_hnn;data_hnn]; % data vertical concatenation
out_hnn_filename = [out_hnn_filename;current_filename]; % debug
% case 3 : "HNZ"
elseif contains(current_filename,'HNZ')
% data_hnz = readmatrix(F,"NumHeaderLines",HeaderLines,"ExpectedNumVariables",1);
[data_hnz,header_hnz] = readclm(F,1);
out_hnz = [out_hnz;data_hnz]; % data vertical concatenation
out_hnz_filename = [out_hnz_filename;current_filename]; % debug
end
end
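%% exports to /out folder (writematrix does not create missing folders, so create it first)
out_folder = fullfile(current_folder,'out');
if ~isfolder(out_folder)
mkdir(out_folder);
end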
writematrix(out_hne,fullfile(out_folder,'all_HNE.txt'));
disp(' HNE files processed according to this list :');
disp(out_hne_filename)
disp(' -------------------------------------------');
writematrix(out_hnn,fullfile(out_folder,'all_HNN.txt'));
disp(' HNN files processed according to this list :');
disp(out_hnn_filename)
disp(' -------------------------------------------');
writematrix(out_hnz,fullfile(out_folder,'all_HNZ.txt'));
disp(' HNZ files processed according to this list :');
disp(out_hnz_filename)
disp(' -------------------------------------------');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [outdata,head] = readclm(filename,nclm,skip,formt)
% READCLM Reads numerical data from a text file into a matrix.
% Text file can begin with a header or comment block.
% [DATA,HEAD] = READCLM(FILENAME,NCLM,SKIP,FORMAT)
% Opens file FILENAME, skips first several lines specified
% by SKIP number or beginning with comment '%'.
% Then reads next several lines into a string matrix HEAD
% until the first line with numerical data is encountered
% (that is until first non-empty output of SSCANF).
% Then reads the rest of the file into a numerical matrix
% DATA in a format FORMAT with number of columns equal
% to number of columns of the text file or specified by
% number NCLM. If data does not match the size of the
% matrix DATA, it is padded with NaN at the end.
%
% READCLM(FILENAME) reads data from a text file FILENAME,
% skipping only commented lines. It determines number of
% columns by the length of the first data line and uses
% the floating point format '%g';
%
% READCLM uses FGETS to read the first lines and FSCANF
% for reading data.
% Defaults and parameters ..............................
formt_dflt = '%g'; % Default format for fscanf
addn = nan; % Number to fill the end if necessary
% Handle input ..........................................
if nargin<1, error(' File name is undefined'); end
if nargin<4, formt = formt_dflt; end
if nargin<3, skip = 0; end
if nargin<2, nclm = 0; end
if isempty(nclm), nclm = 0; end
if isempty(skip), skip = 0; end
% Open file ............................
[fid,msg] = fopen(filename);
if fid<0, error('Cannot open file %s (%s)', filename, msg), end % a plain return would leave the outputs unassigned
% Find header and first data line ......................
is_head = 1;
jl = 0;
head = ' ';
while is_head % Add lines to header.....
s = fgets(fid); % Get next line
if ~ischar(s) % End of file reached before any numerical data
fclose(fid); error('No numerical data found in %s', filename);
end
jl = jl+1;
is_skip = jl<=skip || s(1)=='%';
out1 = sscanf(s,formt); % Try to read this line
% If unreadable by SSCANF or skip, add to header
is_head = isempty(out1) || is_skip;
if is_head && ~is_skip
head = char(head,s(1:end-1)); end % CHAR replaces the obsolete STR2MAT
end
head = head(2:size(head,1),:);
% Determine number of columns if not specified
out1 = out1(:)';
l1 = length(out1);
if ~nclm, nclm = l1; end
% Read the rest of the file ..............................
if l1~=nclm % First line format is different from ncolumns
outdata = fscanf(fid,formt);
lout = length(outdata)+l1;
ncu = ceil(lout/nclm);
lz = nclm*ncu-lout;
outdata = [out1'; outdata(:); ones(lz,1)*addn];
outdata = reshape(outdata,nclm,ncu)';
else % Regular case
outdata = fscanf(fid,formt,[nclm inf]);
outdata = [out1; outdata']; % Add the first line
end
fclose (fid); % Close file ..........
end
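When the number of header lines is fixed (HeaderLines = 6 in the script above), the header and the data of a single-column file can also be read without readclm, using only fgetl and textscan. This is a minimal sketch, assuming exactly HeaderLines header lines followed by numbers only; the demo file name and its content are hypothetical.

```matlab
% Demo input: a one-column file with 6 header lines (hypothetical content)
F = fullfile(tempdir,'header_demo.txt');
fid = fopen(F,'w');
for h = 1:6
    fprintf(fid,'header line %d\n',h);
end
fprintf(fid,'%g\n',(1:5)*0.5);   % data: 0.5 1 1.5 2 2.5
fclose(fid);

HeaderLines = 6;                  % header lines of data files, as in the answer
fid = fopen(F,'r');
if fid < 0, error('Cannot open %s',F); end
header = cell(HeaderLines,1);
for h = 1:HeaderLines
    header{h} = fgetl(fid);       % keep the header text in case it is needed later
end
C = textscan(fid,'%f');           % read all remaining numbers into one column
fclose(fid);
data = C{1};
```

Unlike readclm, this does not detect the header automatically, so it only fits files whose header length is known in advance.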
Mathieu NOE on 2 Nov 2022
A better and more robust approach, if the positions of the letters (L, T, V, ...) are not certain (or vary from file to file),
is to extract the portion of the line that contains these letters. We know the letters are somewhere between the ':' and '(' characters, so we can take advantage of that:
newStr = extractBetween(tline,":","("); % get the portion of the line that contains the searched letters L,T,V, etc...
Then I would suggest using the function contains, which makes the code more compact and neater.
The entire reworked portion of code is then:
if len_tline>len_chan_line
if strcmp(tline(1:len_chan_line),channel_line)
newStr = extractBetween(tline,":","("); % get the portion of the line that contains the searched letters L,T,V, etc...
% if strcmp(tline(11),'E') || strcmp(tline(10),'T') % no
% if strcmp(tline(11),'E') || strcmp(tline(11),'T') % yes
if contains(newStr,'E') || contains(newStr,'T') % more robust
indexing(component) = 1;
% elseif strcmp(tline(11),'N') || strcmp(tline(10),'L') % no
% elseif strcmp(tline(11),'N') || strcmp(tline(11),'L') % yes
elseif contains(newStr,'N') || contains(newStr,'L') % more robust
indexing(component) = 2;
% elseif strcmp(tline(11),'V')
elseif contains(newStr,'V') % more robust
indexing(component) = 3;
end
end
end
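A small self-contained illustration of the extractBetween approach. The channel lines below are hypothetical, since the real file format is not shown in this thread. The sketch assumes the text between ':' and '(' is just the component letter; in that case, comparing the trimmed letter exactly with strcmp (instead of contains) avoids matching a letter that merely occurs inside a longer label.

```matlab
% Hypothetical channel lines: the real format of your files may differ
chan_lines = {'Channel 1: L (cm/s/s)', 'Channel 2:  T  (cm/s/s)', 'Channel 3: V (cm/s/s)'};
indexing = zeros(1,numel(chan_lines));
for component = 1:numel(chan_lines)
    tline = chan_lines{component};
    newStr = extractBetween(tline,":","(");   % char input -> 1x1 cell, e.g. {' L '}
    letter = strtrim(newStr{1});              % remove the surrounding blanks
    if any(strcmp(letter,{'E','T'}))
        indexing(component) = 1;
    elseif any(strcmp(letter,{'N','L'}))
        indexing(component) = 2;
    elseif strcmp(letter,'V')
        indexing(component) = 3;
    end
end
disp(indexing)
```

Note that newStr{1} errors if a line has no ':' ... '(' pair, so lines should be filtered first (as the strcmp on channel_line does in the code above).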
The same comment applies to another portion of the code that you can improve.
Here is one section I modified; you can apply the same modifications to all the similar portions of the code that follow:
if len_tline > len_stop_line
if strcmp(tline(1:len_stop_line),stopper_line)
line_tmp = i;
component = component + 1;
component_internal = 0;
% elseif length((strfind(tline,'OF VELOC D')))>0 % ok but not optimal
elseif contains(tline,'OF VELOC D') % simpler, neater
component_internal = component_internal + 1;
% elseif length((strfind(tline,'OF DISPL D')))>0 % ok but not optimal
elseif contains(tline,'OF DISPL D') % simpler, neater
component_internal = component_internal + 1;
end
end
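A quick check that contains gives the same answer as the strfind test, plus one more property worth knowing: contains also accepts a cell array of patterns and returns true if any of them occurs, so the two elseif branches that do the same thing could be merged. The example lines are hypothetical.

```matlab
% Hypothetical example lines; the real text comes from your files
tlines = {'PEAK OF VELOC DATA', 'PEAK OF DISPL DATA', 'OTHER LINE'};
component_internal = 0;
for n = 1:numel(tlines)
    tline = tlines{n};
    old_test = length(strfind(tline,'OF VELOC D')) > 0;   % original style
    new_test = contains(tline,'OF VELOC D');              % same result
    assert(old_test == new_test)
    % true if ANY of the patterns occurs in tline
    if contains(tline,{'OF VELOC D','OF DISPL D'})
        component_internal = component_internal + 1;
    end
end
disp(component_internal)
```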
The entire file is attached.
Again, you will have to uncomment some of the first lines (I commented them out simply because I don't use them) to get that portion of your code working again.
All the best.


More Answers (2)

Mathieu NOE on 25 Oct 2022
Hello,
I first created some dummy files according to the list (it would have been a bit easier with a txt file rather than a picture...).
Just FYI, this is the code I used to create some numerical arrays (as you didn't provide the data files), which I then process afterwards:
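a = readlines('list.txt'); % one base file name per line
a = a(strlength(strtrim(a)) > 0); % drop empty lines, e.g. from a trailing newline
for ck = 1:numel(a)
filename = strtrim(a(ck)) + ".txt";
data = rand(10+5*ck,3); % dummy data: (10+5*ck) rows x 3 columns
writematrix(data,filename);
end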
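As far as I know, readlines was only introduced in R2020b, so on R2020a (the release mentioned later in this thread) the list can be read with fileread and splitlines instead. This is a minimal sketch, assuming list.txt holds one base file name per line; a small list.txt with hypothetical names is written to a temporary folder first so the example runs on its own.

```matlab
% Demo input: a small list.txt in a temporary folder (hypothetical names)
demo_folder = fullfile(tempdir,'dummy_demo');
if ~isfolder(demo_folder), mkdir(demo_folder); end
list_file = fullfile(demo_folder,'list.txt');
fid = fopen(list_file,'w');
fprintf(fid,'%s\n','20211104-041733-C100-HNE','20211104-041733-C100-HNN','20211104-041733-C100-HNZ');
fclose(fid);

% fileread + splitlines instead of readlines
a = string(splitlines(fileread(list_file)));
a = a(strlength(strtrim(a)) > 0);          % drop empty lines (e.g. the trailing newline)
for ck = 1:numel(a)
    filename = fullfile(demo_folder,strtrim(a(ck)) + ".txt");
    data = rand(10+5*ck,3);                % dummy data: (10+5*ck) rows x 3 columns
    writematrix(data,filename);
end
```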
Now this is the code that processes the data and writes the results to a new /out folder (to avoid mixing input and output txt files):
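data_folder = pwd; % current (your) folder; naming it "path" would shadow the built-in path function
S = dir(fullfile(data_folder,'2021*.txt'));
%% read and concatenate the data, one array per component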
out_hne = [];
out_hnn = [];
out_hnz = [];
out_hne_filename = []; %
out_hnn_filename = []; %
out_hnz_filename = []; %
for k = 1:numel(S)
current_folder = S(k).folder;
current_filename = S(k).name;
F = fullfile(current_folder,current_filename);
% case 1 : "HNE"
if contains(current_filename,'HNE')
data_hne = readmatrix(F);
out_hne = [out_hne;data_hne]; % data vertical concatenation
out_hne_filename = [out_hne_filename;current_filename]; % debug
% case 2 : "HNN"
elseif contains(current_filename,'HNN')
data_hnn = readmatrix(F);
out_hnn = [out_hnn;data_hnn]; % data vertical concatenation
out_hnn_filename = [out_hnn_filename;current_filename]; % debug
% case 3 : "HNZ"
elseif contains(current_filename,'HNZ')
data_hnz = readmatrix(F);
out_hnz = [out_hnz;data_hnz]; % data vertical concatenation
out_hnz_filename = [out_hnz_filename;current_filename]; % debug
end
end
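%% exports to /out folder (writematrix does not create missing folders, so create it first)
out_folder = fullfile(current_folder,'out');
if ~isfolder(out_folder)
mkdir(out_folder);
end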
writematrix(out_hne,fullfile(out_folder,'all_HNE.txt'));
disp(' HNE files processed according to this list :');
disp(out_hne_filename)
disp(' -------------------------------------------');
writematrix(out_hnn,fullfile(out_folder,'all_HNN.txt'));
disp(' HNN files processed according to this list :');
disp(out_hnn_filename)
disp(' -------------------------------------------');
writematrix(out_hnz,fullfile(out_folder,'all_HNZ.txt'));
disp(' HNZ files processed according to this list :');
disp(out_hnz_filename)
disp(' -------------------------------------------');
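The three if/elseif branches above differ only in the component code, so the same processing can also be written as a loop over a list of codes. This is a minimal sketch, assuming (as above) that the files are plain numeric text readable by readmatrix and that the names end in -HNE.txt, -HNN.txt or -HNZ.txt; the dummy input files, station names and temporary folder are hypothetical and only there so the example runs on its own.

```matlab
% Demo input: a few dummy files in a temporary folder (hypothetical station names).
% With your data, set data_folder = pwd and skip this part.
data_folder = fullfile(tempdir,'join_components_demo');
if ~isfolder(data_folder), mkdir(data_folder); end
stations = {'C100','C200','C330'};
components = {'HNE','HNN','HNZ'};
for s = 1:numel(stations)
    for c = 1:numel(components)
        name = sprintf('20211104-041733-%s-%s.txt',stations{s},components{c});
        writematrix(rand(10+5*s,3),fullfile(data_folder,name));
    end
end

% writematrix does not create missing folders, so create /out first
out_folder = fullfile(data_folder,'out');
if ~isfolder(out_folder), mkdir(out_folder); end

n_rows = zeros(1,numel(components));   % rows written per component, for checking
for c = 1:numel(components)
    comp = components{c};
    % only the files of this component, e.g. 2021*-HNE.txt
    S = dir(fullfile(data_folder,['2021*-' comp '.txt']));
    out = [];
    for k = 1:numel(S)
        % vertical concatenation, as in the answer
        out = [out; readmatrix(fullfile(S(k).folder,S(k).name))]; %#ok<AGROW>
    end
    writematrix(out,fullfile(out_folder,['all_' comp '.txt']));
    n_rows(c) = size(out,1);
    fprintf(' %s files processed according to this list :\n',comp);
    fprintf('%s\n',S.name);
end
```

Adding a component then only means adding its code to the components list.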
In the command window you will see the order in which the files were processed:
HNE files processed according to this list :
20211104-041733-C100-HNE.txt
20211104-041733-C200-HNE.txt
20211104-041740-C330-HNE.txt
20211104-041751-GO04-HNE.txt
20211104-041752-C160-HNE.txt
-------------------------------------------
HNN files processed according to this list :
20211104-041733-C100-HNN.txt
20211104-041733-C200-HNN.txt
20211104-041740-C330-HNN.txt
20211104-041751-G004-HNN.txt
20211104-041752-C160-HNN.txt
-------------------------------------------
HNZ files processed according to this list :
20211104-041733-C100-HNZ.txt
20211104-041733-C200-HNZ.txt
20211104-041740-C330-HNZ.txt
20211104-041751-GO04-HNZ.txt
20211104-041752-C160-HNZ.txt
-------------------------------------------
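Since the three lists are built independently, a quick check that the same stations appear for every component can catch naming problems before rows from different stations end up side by side. This is a minimal sketch, assuming the third '-'-separated field of the file name (C100, GO04, ...) identifies the station; run on the HNE and HNN names exactly as printed above, it reports that one station is spelled GO04 in one list and G004 in the other.

```matlab
% HNE and HNN file names exactly as printed above
hne = {'20211104-041733-C100-HNE.txt'; '20211104-041733-C200-HNE.txt'; ...
       '20211104-041740-C330-HNE.txt'; '20211104-041751-GO04-HNE.txt'; ...
       '20211104-041752-C160-HNE.txt'};
hnn = {'20211104-041733-C100-HNN.txt'; '20211104-041733-C200-HNN.txt'; ...
       '20211104-041740-C330-HNN.txt'; '20211104-041751-G004-HNN.txt'; ...
       '20211104-041752-C160-HNN.txt'};

% Station code = third '-'-separated field of the name (assumed naming convention)
parts_hne = split(string(hne),'-');   % 5x4 string array
parts_hnn = split(string(hnn),'-');
mismatch = setxor(parts_hne(:,3),parts_hnn(:,3));

if isempty(mismatch)
    disp('Same stations for both components.');
else
    fprintf('Station codes not present in both lists: %s\n', strjoin(cellstr(mismatch),', '));
end
```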
Jorge Luis Paredes Estacio
I am using Matlab R2020a. This was the whole error message:
Error using writematrix (line 156)
Unable to open file 'E:\Civil Egineering PhD\Processing data matlab\matlab code\MOTION RECORDS CSN CHILE MEJORADO\out\all_HNE.txt' for writing:
No such file or directory
Error in Read_data_chile_joining (line 39)
writematrix(out_hne,fullfile(out_folder,'all_HNE.txt'));
>>
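The message says writematrix could not open ...\out\all_HNE.txt for writing: the out subfolder does not exist, and writematrix does not create folders. A minimal fix is to create the folder before the first writematrix call. The sketch below assumes the output should go to an out subfolder next to the input files, as in the answer; tempdir and the placeholder data are used only so it can run anywhere.

```matlab
% Folder holding the input files; in the answer this is pwd / current_folder.
% tempdir is used here only so the demo can run anywhere.
data_folder = tempdir;
out_folder = fullfile(data_folder,'out');
if ~isfolder(out_folder)
    [ok,msg] = mkdir(out_folder);   % create the missing /out folder once
    if ~ok
        error('Could not create %s (%s)', out_folder, msg);
    end
end
out_hne = rand(5,3);                % placeholder for the concatenated HNE data
writematrix(out_hne,fullfile(out_folder,'all_HNE.txt'));
```

Creating the folder manually in Windows Explorer before running the script has the same effect.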



Jorge Luis Paredes Estacio
Thank you very much. I managed to upload some of the files.
