Searching for closely related filenames in a directory

I'm working on some efficiency-focused code which checks whether a file exists and loads it if it does. I store my parameters in structure arrays, so a simplified version of the filename string looks something like this (the real one has more variables, but this is enough to demonstrate the point while minimising complexity, I hope), with the file search logic afterwards:
%String/Filename To Check For:
opts.str.f1 = ['D:\Filepath\M_3_202020_signals_vol_X_']; %Assume this bit is fixed for simplicity.
opts.str.f2 = ['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(opts.no_fields),'and_25_sensors.mat'];
%Search for the file:
if isfield(For_opts,'sensMstr') && isfile([opts.str.f1,opts.str.f2]); load([opts.str.f1,opts.str.f2]) %Load File
else; %Do Something Else
end
The issue is that sometimes this function is called with a value of "opts.no_fields" <= the value in my save file. I have logic elsewhere which manages the data loaded in from these files; I'm just seeking a way of loading in an existing file. For example, I can manually load in the file with opts.no_fields = 10, set opts.no_fields = 5 and run the rest of my program to get the desired results.
What I would like to do in the loading section is search for the filename and find a match if opts.no_fields is <= the value stored after the 'no_fields_' string, and all the other variables in the filename match. My best hacky approach is:
opts.str.f1 = ['D:\Filepath\M_3_202020_signals_vol_X_']; %Assume this bit is fixed for simplicity.
max_var = 20; %Pass Additional Variable
%Filenames currently all contain the same number of fields, but I want to work with less so this hack works for now.
%This will change in the very near future, hence the need to write some search code..
if opts.no_fields <= max_var; opts.str.f2 = ['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(max_var),'and_25_sensors.mat'];
else; opts.str.f2 = ['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(opts.no_fields),'and_25_sensors.mat'];
end
%Search for the file:
if isfield(For_opts,'sensMstr') && isfile([opts.str.f1,opts.str.f2]); load([opts.str.f1,opts.str.f2]) %Load File
else; %Do Something Else
end
I think a combination of isfile and contains may be able to achieve this, but I am unable to construct logic which actually runs, never mind achieves the intention. Any suggestions would be greatly appreciated.
  4 Comments
Matt Gaidica on 17 Dec 2020
str = '32_Y_32_Z_32_Omega_260.567_no_fields_360_and_25_sensors.mat';
out = regexpi(str,'fields_(fields|\d*)_and_(sensors|\d*)','tokens');
out{1}{1}
out{1}{2}
Results in
ans =
'360'
ans =
'25'
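Building on the token extraction above, here is a minimal sketch of how the number after 'no_fields_' could be used to pick a file from a folder listing. It is an illustration under stated assumptions, not a tested solution for the real data: the list `names` is hypothetical (in practice it would come from `dir`), the filenames are assumed to contain '_and_' as in the example string above, and the variable names `prefix`, `suffix`, `nWanted` and `matchName` are made up for this example.

```matlab
% Hypothetical folder listing. In practice something like:
%   listing = dir(fullfile(folder, '*.mat')); names = {listing.name};
% Note that dir returns names without the folder part.
names = { ...
    'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_260.567_no_fields_5_and_25_sensors.mat', ...
    'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_260.567_no_fields_20_and_25_sensors.mat', ...
    'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_260.567_no_fields_40_and_25_sensors.mat', ...
    'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_100.000_no_fields_12_and_25_sensors.mat'};

% Fixed parts of the name that must match exactly (assumed layout).
prefix  = 'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_260.567_no_fields_';
suffix  = '_and_25_sensors.mat';
nWanted = 10;   % plays the role of opts.no_fields

% Escape the fixed parts so that characters such as '.' are matched literally,
% and capture the digits between them.
pattern = ['^', regexptranslate('escape', prefix), '(\d+)', ...
           regexptranslate('escape', suffix), '$'];
tokens  = regexp(names, pattern, 'tokens', 'once');   % empty where no match

% Convert the captured digits to numbers; non-matching names stay NaN.
isMatch = ~cellfun(@isempty, tokens);
nStored = nan(size(names));
nStored(isMatch) = cellfun(@(t) str2double(t{1}), tokens(isMatch));

% Keep files that hold at least nWanted fields and prefer the smallest one.
candidates = find(nStored >= nWanted);
if isempty(candidates)
    matchName = '';                       % nothing suitable: do something else
else
    [~, k] = min(nStored(candidates));
    matchName = names{candidates(k)};     % pass fullfile(folder, matchName) to load
end
```

Choosing the smallest sufficient file is only one possible policy; taking the largest, or the most recently saved, would work the same way with a different selection step.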
ADSW121365 on 11 Jan 2021
This would be my accepted answer, but it is posted as a comment, unfortunately.


Answers (2)

Jan on 17 Dec 2020
Edited: Jan on 17 Dec 2020
['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(opts.no_fields),'and_25_sensors.mat']
Searching such files is complicated, because important information is hidden in the name of the file. This is an inefficient design, because it impedes addressing the data. Compare this with including phone numbers and current weights in people's names: it would be much easier to call somebody if you know their name, but everybody's name would change every morning, and updating all the databases would be horrible.
Store data in a database, because this is its purpose. Either use a professional database and find the data by SQL queries, or create your own database: store files with dull names like "file1.mat", "file2.mat", ... and write the important information (which data belong to which parameters) either to an extra field inside the file, to an extra file, or both. Having this information only in an extra file is fragile, because the link between the parameters and the data might be lost when somebody renames a file. Storing the parameters only inside each file means that many files must be opened to find a specific dataset. So the stable approach is to store the parameters inside the files and copy them to an extra file for faster searching. A lost link can then be restored by collecting the parameters once again from the actual data files.
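As a rough illustration of the "own database" idea, the sketch below keeps the parameters in a MATLAB table with one row per data file and queries it with logical indexing. Everything here is hypothetical: the file names, the column names (`file`, `Omega`, `no_fields`, `sensors`) and the values are made up for the example, and in practice the table would be saved to its own index file and rebuilt from the parameters stored inside the data files if it gets lost.

```matlab
% Hypothetical index: one row per data file, parameters as columns.
index = table( ...
    {'file1.mat'; 'file2.mat'; 'file3.mat'; 'file4.mat'}, ...
    [260.567; 260.567; 260.567; 100.000], ...
    [5; 20; 40; 12], ...
    [25; 25; 25; 25], ...
    'VariableNames', {'file', 'Omega', 'no_fields', 'sensors'});

% The parameters of the current run (assumed values).
wanted.Omega     = 260.567;
wanted.no_fields = 10;
wanted.sensors   = 25;

% Exact match on the fixed parameters (with a tolerance for the floating
% point value), and "at least as many fields" for no_fields.
hit = abs(index.Omega - wanted.Omega) < 1e-9 ...
    & index.sensors == wanted.sensors ...
    & index.no_fields >= wanted.no_fields;

% Sort the hits so the smallest sufficient file comes first.
matches = sortrows(index(hit, :), 'no_fields');
if height(matches) == 0
    chosen = '';                 % no suitable file: do something else
else
    chosen = matches.file{1};    % the file to load
end
```

With this layout the comparison on no_fields is an ordinary numeric test, so no parsing of file names is needed.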
Instead of inventing a smart and flexible method to search file names, use a clear and clean approach to store parameters in a format, which can be searched efficiently.
Using the file names to store parameters causes another problem as well: by default, Windows Explorer does not handle file names longer than about 260 characters including the path (the classic MAX_PATH limit). Moving such files or folders in Windows Explorer fails, and you get an error message if you are lucky.
  9 Comments
Jan on 20 Dec 2020
I do not have the Text Analytics Toolbox either, but what I actually meant is that the built-in function matlab.internal.language.errorrecovery.namesuggestion can be used, not to calculate the edit or Levenshtein distance, but to find the nearest matching string.



Image Analyst on 17 Dec 2020

Release

R2020b
