Merging specific Excel files according to a string in the filename
Show older comments
After searching through the forums I've found answers relating to how to merge Excel files together, but none that relate specifically to what I wish to do. I have thousands of Excel files, from a sample set of 120 samples, all in one folder. There are thus 120 sets of files which are separated by time and within each set they all have the same particular string in their filename which allows me to identify them, based on the time the measurements in each file start and end. I wish to merge each set of files together into one file but as the number of files in each set is variable, I wonder if there is a way to count the number of occurrences and only merge those occurrences together. hist and unique doesn't work because I have both alphabetical and numerical characters for the identifiers.
What I've done so far is first load all of the files, tabulate the filenames and split up the filenames so that the unique identifiers are isolated. I adapted a
Files=dir('\\fscbeng\Project\Lab_Measurements\Results\20190424-1\*.csv'); % Looks out for Excel spreadsheets specifically
num_files=1:length(Files);
Cell_logs = cell(1,num_files(end));
for r = num_files
filename=([Files(r).folder '\' Files(r).name]);
fileID = fopen(filename,'r');
Cell_logs(r) = textscan(fileID, '%s', 'delimiter', '\r'); % Stores each Excel file in a separate cell
fclose(fileID);
end
Filenames = table({Files.name}.');
for i = 1:size(Filenames,1)
x = cell2mat(Filenames{i,1});
Filenames{i,1} = {x(1:7)};
Filenames{i,2} = {x(8:23)};
Filenames{i,3} = {x(24:25)};
end
Filenames.Properties.VariableNames = {'Sensor_type' 'UID' 'Sensor_position'};
Sensor_position contains the identifier as follows:
Filenames.Sensor_position
ans =
1320×1 cell array
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'81'
'81'
'81'
'81'
'81'
...
So there are 11 files for each sensor position, so one way I've thought of (but don't know how to fully execute as I haven't done this before) is a for loop with a logical expression where:
v=Filenames.Sensor_position;
for i = 1:length(v)-1
if isequal(v(i), v(i+1))
... % command to continue until the values differ
else
... % the consecutive values have been evaluated to be different so the column of values would be cut here and the files merged up until the particular row number
end
end
Idea being that when the for loop encounters the next value in the column which differs from the previous value it would split it off into a new variable (if I include an else condition) or else earmark each section of the Filenames.Sensor_position so that the file merger only happens for those earmarked files, if that makes any sense. The other option I've thought of is a find command which would process only those files with a particular string (e.g. '80', '81' etc.) but it would require me to specify each sensor position in turn (there are 120 of them) which seems quite clunky to me.
6 Comments
Guillaume
on 25 Apr 2019
I'd recommend you use fullfile to build paths rather than string concatenation with a hardcoded separator.
Your excel files are not excel files if you use textscan to read them. They must be some text files (possibly .csv that Windows by default associate with excel)
The way you construct your table is very awkward, you could use:
FileNames = char({Files.Names}); %convert file names into a 2D char array
FileNames = FileNames(:, 1:25); %only keep the first 25 chars of each name
FileNames = cell2table(mat2cell(FileNames, size(FileNames, 1), [7, 16, 2]), 'VariableNames', {'Sensor_type', 'UID', 'Sensor_position'});
After that I'm a bit unclear on how you want to group your files. Should the content of all files with the same Sensor_type and UID be merged vertically by Sensor_Position order? Or something else?
You probably don't need a loop for that.
Guillaume
on 25 Apr 2019
Oops! That wasn't the intent to have just one row in the table.
FileNames = char({Files.Names}); %convert file names into a 2D char array
FileNames = FileNames(:, 1:25); %only keep the first 25 chars of each name
FileNames = cell2table(mat2cell(FileNames, ones(1, size(FileNames, 1)), [7, 16, 2]), 'VariableNames', {'Sensor_type', 'UID', 'Sensor_position'});
will fix it.
At present, you're reading each file with textscan(...,'%s') so it's unclear which format the files have. Can you attach a sample file (in particular so we can see what form the header takes)?
So you want to group by Sensor_Position. Do the Sensor_type and UID have any impact on the grouping or are they just ignored then?
As said, a loop is most likely not needed. You can just ask matlab to merge all elements of Cell_logs which have the same matching Sensor_Position.
Stanley
on 25 Apr 2019
Guillaume
on 25 Apr 2019
So what does preserving the header mean with such files. What would be the result of merging two files similar to your test.csv?
I would probably use rowfun with the 'GroupBy' option. I'll show you how it's done once I understand what merging actually means.
Accepted Answer
More Answers (0)
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!