Searching for closely related filenames in a directory
I'm working on some efficiency-focused code which checks whether a file exists and loads it if it does. I use structure arrays to store my parameters, so a simplified version of the filename string looks something like this (the real one has more variables, but this is enough to demonstrate the point while minimising complexity, I hope), with the file search logic afterwards:
%String/Filename To Check For:
opts.str.f1 = ['D:\Filepath\M_3_202020_signals_vol_X_']; %Assume this bit is fixed for simplicity.
opts.str.f2 = ['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(opts.no_fields),'and_25_sensors.mat'];
%Search for the file:
if isfield(For_opts,'sensMstr') && isfile([opts.str.f1,opts.str.f2]); load([opts.str.f1,opts.str.f2]) %Load File
else; %Do Something Else
end
The issue is that sometimes this function is called with a value of opts.no_fields <= the value in my save file. I have logic elsewhere which manages the data loaded in from these files; I'm just seeking a way of loading in an existing file. For example, I can manually load in the file with opts.no_fields = 10, set opts.no_fields = 5 and run the rest of my program to get the desired results.
What I would like to do in the loading section is search for the filename and find a match if opts.no_fields is <= the value stored after the 'no_fields_' string, and all the other variables in the filename match. My best hacky approach is:
opts.str.f1 = ['D:\Filepath\M_3_202020_signals_vol_X_']; %Assume this bit is fixed for simplicity.
max_var = 20; %Pass Additional Variable
%Filenames currently all contain the same number of fields, but I want to work with less so this hack works for now.
%This will change in the very near future, hence the need to write some search code..
if opts.no_fields <= max_var; opts.str.f2 = ['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(max_var),'and_25_sensors.mat'];
else; opts.str.f2 = ['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(opts.no_fields),'and_25_sensors.mat'];
end
%Search for the file:
if isfield(For_opts,'sensMstr') && isfile([opts.str.f1,opts.str.f2]); load([opts.str.f1,opts.str.f2]) %Load File
else; %Do Something Else
end
I think a combination of isfile and contains may be able to achieve this, but I am unable to construct logic which actually runs, never mind achieves the intention. Any suggestions would be greatly appreciated.
4 Comments
Matt Gaidica
on 17 Dec 2020
Is there a reason you are not just using regex (or even, strfind) on your filenames and getting the no_fields? I'm slightly confused, perhaps show a simpler example removed from your application.
ADSW121365
on 17 Dec 2020
Matt Gaidica
on 17 Dec 2020
str = '32_Y_32_Z_32_Omega_260.567_no_fields_360_and_25_sensors.mat';
out = regexpi(str,'fields_(fields|\d*)_and_(sensors|\d*)','tokens');
out{1}{1}
out{1}{2}
Results in
ans =
'360'
ans =
'25'
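Building on the token extraction above, the following is only a sketch of how the original question might be handled: pick a file whose stored no_fields value is at least the requested one while every other part of the name matches. The list of names is hypothetical (in practice it would come from dir), and the sketch assumes the file names contain "_and_" as in the example string, which differs slightly from the concatenation in the question.

```matlab
% Hypothetical file names; in practice something like
%   listing = dir(fullfile(folder, '*_sensors.mat')); names = {listing.name};
names = {'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_260.567_no_fields_5_and_25_sensors.mat', ...
         'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_260.567_no_fields_20_and_25_sensors.mat', ...
         'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_260.567_no_fields_20_and_30_sensors.mat'};
requested = 10;   % plays the role of opts.no_fields

% Fixed parts of the name, escaped so that '.' is matched literally.
prefix = regexptranslate('escape', 'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_260.567_no_fields_');
suffix = regexptranslate('escape', '_and_25_sensors.mat');
pattern = ['^', prefix, '(\d+)', suffix, '$'];

% One token (the no_fields value) per matching name; empty where nothing matches.
tokens = regexp(names, pattern, 'tokens', 'once');
isMatch = ~cellfun(@isempty, tokens);
counts = nan(size(names));
counts(isMatch) = cellfun(@(t) str2double(t{1}), tokens(isMatch));

% Keep files that store at least the requested number of fields,
% then choose the one with the smallest stored value.
candidates = find(counts >= requested);
if isempty(candidates)
    chosen = '';                       % no suitable file: do something else
else
    [~, k] = min(counts(candidates));
    chosen = names{candidates(k)};     % this name could be passed to load()
end
```

Choosing the smallest sufficient file is a design choice made here for illustration; any file with counts >= requested would satisfy the condition described in the question.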
ADSW121365
on 11 Jan 2021
Answers (2)
['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(opts.no_fields),'and_25_sensors.mat']
Searching such files is complicated, because important information is hidden in the name of the file. This is an inefficient design, because it impedes the addressing. You can compare this with including the phone number and the current weight in a person's name. This makes it much easier to call somebody if you know their name, but everybody's name changes every morning, and updating all the databases would be horrible.
Store data in a database, because this is its purpose. Either use a professional database and find the data by SQL queries, or create your own database: store files with dull names like "file1.mat", "file2.mat", ... and write the important information, namely which data belong to which parameters, either in an extra field inside the file and/or in an extra file. Having this information only in an extra file is fragile, because the correlation between the parameters and the data might be lost when somebody renames a file. Storing the parameters only inside each file requires opening a lot of files to find a specific dataset. So a stable approach is to store the parameters inside the files and copy them to an extra file for faster searching. Then a lost correlation can be restored by collecting the parameters once again from the actual data files.
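As a rough illustration of this scheme, the sketch below saves each dataset under a dull name together with a params struct, then rebuilds a separate index by reading only the parameters back from the data files. The folder, the field names and the placeholder payload are all hypothetical.

```matlab
folder = tempname;     % hypothetical scratch folder for the demonstration
mkdir(folder);

% Store each dataset under a dull name, with its parameters inside the file.
noFieldsList = [5 20];
for k = 1:numel(noFieldsList)
    params = struct('no_fields', noFieldsList(k), 'sensors', 25);  % hypothetical parameters
    data = rand(3);                                                % placeholder payload
    save(fullfile(folder, sprintf('file%d.mat', k)), 'params', 'data');
end

% Rebuild the index from the data files by loading only the 'params' variable.
listing = dir(fullfile(folder, 'file*.mat'));
index = struct('file', {}, 'params', {});
for k = 1:numel(listing)
    s = load(fullfile(folder, listing(k).name), 'params');
    index(end+1) = struct('file', listing(k).name, 'params', s.params); %#ok<AGROW>
end

% Keep a copy of the index in an extra file for faster searching.
save(fullfile(folder, 'index.mat'), 'index');
```

If the index file is lost or goes stale, the second loop can simply be run again to restore it from the data files.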
Instead of inventing a smart and flexible method to search file names, use a clear and clean approach to store parameters in a format, which can be searched efficiently.
Using the file names to store parameters also causes another problem, because Windows Explorer does not handle file names longer than about 260 characters including the path (the MAX_PATH limit). Moving such files or folders in Windows Explorer fails, and you get an error message if you are lucky.
9 Comments
ADSW121365
on 17 Dec 2020
Matt Gaidica
on 17 Dec 2020
Edited: Matt Gaidica
on 17 Dec 2020
I'm not opposed to placing meta data in a filename. It keeps data with data, describes what's inside, and should your master database crash or become corrupted, you might be able to salvage your project.
Best practice? If you're staying in MATLAB, you could just create a table with a few columns for no_fields, sensors, and also filepath and filename. Save it to a MAT-file, then open and append rows when you add files to your project, then re-save. You'll want to back this up regularly.
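A minimal sketch of such a table, assuming hypothetical column names and file names, might look like this:

```matlab
% Hypothetical index table; column names and file names are illustrative.
index = table([5; 20; 20], [25; 25; 30], ...
    {'file1.mat'; 'file2.mat'; 'file3.mat'}, ...
    'VariableNames', {'no_fields', 'sensors', 'filename'});

% Append a row when a new file is added to the project.
index = [index; {40, 25, 'file4.mat'}];

% Find files with the right sensor count and at least the requested fields.
requested = 10;
rows = index(index.sensors == 25 & index.no_fields >= requested, :);
rows = sortrows(rows, 'no_fields');   % smallest sufficient file first

if isempty(rows)
    chosen = '';                      % nothing suitable
else
    chosen = rows.filename{1};
end
```

The table can be persisted with save('index.mat', 'index') and reloaded before appending, as described above.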
Image Analyst
on 17 Dec 2020
Edited: Image Analyst
on 17 Dec 2020
Oh, very nice! 😎
I thought they had a built in function for the Levenshtein distance that I mentioned in my answer below but I couldn't remember the function name. And, unfortunately Levenshtein doesn't come up in my help search. Neither does editDistance() - I guess it's not a built-in function. ☹️ Looks like it's in the Text Analytics Toolbox.
Jan
on 18 Dec 2020
@Image Analyst: There must be a built-in function, because Matlab suggests alternatives for typos in the command window.
Image Analyst
on 18 Dec 2020
Edited: Image Analyst
on 18 Dec 2020
Jan, it's built-in but built in to some specific toolbox (the Text Analytics Toolbox) I believe:
>> editDistance
Unrecognized function or variable 'editDistance'.
>> doc editDistance
No results for editDistance in MathWorks and Supplemental Software documentation.
Search tips:
- Check the spelling.
- The search may be too vague—use more specific search terms.
- The search may be too specific—use more general search terms.
- If you used quotes for an exact match, try removing them for a broad match.
@Image Analyst: If I type "help ploz", Matlab suggests the help text of plot(). I stepped through the code of help() in the debugger and found:
candidates = {'bsd', 'Ass', 'csd', 'sad'};
match = matlab.internal.language.errorrecovery.namesuggestion('asd', candidates)
Image Analyst
on 19 Dec 2020
Yes, but nevertheless, I don't have the Text Analytics Toolbox so not even the suggestions for a misspelled editdistance() show up because of that.
Jan
on 20 Dec 2020
I do not have the Text Analytics Toolbox either, but actually I meant that the built-in function matlab.internal.language.errorrecovery.namesuggestion can be used, not to calculate the edit or Levenshtein distance, but to find the nearest matching string.
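For readers without the toolbox who do want an actual edit distance, a plain Levenshtein distance can be written in a few lines of base MATLAB. This is only a sketch and is not what the internal function does; it reuses the candidate list from the snippet above and is case-sensitive.

```matlab
target = 'asd';
candidates = {'bsd', 'Ass', 'csd', 'sad'};
dist = zeros(size(candidates));

for c = 1:numel(candidates)
    a = target;
    b = candidates{c};
    % D(i+1, j+1) is the edit distance between a(1:i) and b(1:j).
    D = zeros(numel(a) + 1, numel(b) + 1);
    D(:, 1) = 0:numel(a);
    D(1, :) = 0:numel(b);
    for i = 1:numel(a)
        for j = 1:numel(b)
            cost = double(a(i) ~= b(j));            % 0 if the characters agree
            D(i+1, j+1) = min([D(i, j+1) + 1, ...   % deletion
                               D(i+1, j) + 1, ...   % insertion
                               D(i, j) + cost]);    % substitution
        end
    end
    dist(c) = D(end, end);
end

% The first candidate with the smallest distance is taken as the nearest.
[~, best] = min(dist);
nearest = candidates{best};
```

Ties are resolved in favour of the earlier candidate, which is an arbitrary choice made for this example.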
Image Analyst
on 20 Dec 2020
Oh, ok. Nice tip. Thanks.
Image Analyst
on 17 Dec 2020
Maybe the "needle in a haystack" algorithm would work:
or maybe try this: