Searching for closely related filenames in a directory

Question

ADSW121365 on 16 Dec 2020

https://au.mathworks.com/matlabcentral/answers/696030-searching-for-closely-related-filenames-in-a-directory

Commented: ADSW121365 on 11 Jan 2021

I'm working on some efficiency-based code which essentially checks whether a file exists and loads it if it does. I use structure arrays to store my parameters, so a simplified version of the filename string looks something like this (the real one has more variables, but this is enough to demonstrate the point while minimising complexity, I hope), with the file search logic afterwards:

%String/Filename To Check For:
opts.str.f1 = ['D:\Filepath\M_3_202020_signals_vol_X_']; %Assume this bit is fixed for simplicity.
opts.str.f2 = ['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(opts.no_fields),'and_25_sensors.mat'];
%Search for the file:
if isfield(For_opts,'sensMstr') && isfile([opts.str.f1,opts.str.f2])
    load([opts.str.f1,opts.str.f2]) %Load File
else
    %Do Something Else
end

The issue is that sometimes this function is called with a value of "opts.no_fields" <= the value in my savefile. I have logic elsewhere which manages the data loaded in from these files; I'm just seeking a way of loading in an existing file. For example, I can manually load in the file with opts.no_fields = 10, set opts.no_fields = 5 and run the rest of my program to get the desired results.

What I would like to do in the loading section is search for the filename and find a match if opts.no_fields is <= the value stored after the 'no_fields_' string, and all the other variables in the filename match. My best hacky approach is:

opts.str.f1 = ['D:\Filepath\M_3_202020_signals_vol_X_']; %Assume this bit is fixed for simplicity.
max_var = 20; %Pass Additional Variable  
%Filenames currently all contain the same number of fields, but I want to work with less so this hack works for now.
%This will change in the very near future, hence the need to write some search code..
if opts.no_fields <= max_var
    opts.str.f2 = ['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(max_var),'and_25_sensors.mat'];
else
    opts.str.f2 = ['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(opts.no_fields),'and_25_sensors.mat'];
end
%Search for the file:
if isfield(For_opts,'sensMstr') && isfile([opts.str.f1,opts.str.f2])
    load([opts.str.f1,opts.str.f2]) %Load File
else
    %Do Something Else
end

I think a combination of isfile and contains may be able to achieve this, but I am unable to construct logic which actually runs, never mind achieves the intention. Any suggestions would be greatly appreciated.

Matt Gaidica on 17 Dec 2020

str = '32_Y_32_Z_32_Omega_260.567_no_fields_360_and_25_sensors.mat';
out = regexpi(str,'fields_(fields|\d*)_and_(sensors|\d*)','tokens');
out{1}{1}
out{1}{2}

Results in

ans =
    '360'
ans =
    '25'
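
The token extraction above can be extended to the original goal of picking a file from a directory. The following is only a minimal sketch under stated assumptions: the demo folder and its empty placeholder files are hypothetical stand-ins for the real .mat files, the suffix '_and_25_sensors.mat' follows the example string in this comment, and choosing the smallest stored value that is >= the requested one is just one possible selection rule.

```matlab
% Hypothetical demo folder with empty placeholder files (stand-ins for the real .mat files).
demoDir = tempname;
mkdir(demoDir);
prefix = 'M_3_202020_signals_vol_X_32_Y_32_Z_32_Omega_260.567_no_fields_';
suffix = '_and_25_sensors.mat';   % assumed form, as in the example string above
for n = [5 10 20]
    fclose(fopen(fullfile(demoDir, [prefix, num2str(n), suffix]), 'w'));
end

wanted = 7;   % plays the role of opts.no_fields

% List every file whose name starts and ends with the fixed parts.
listing = dir(fullfile(demoDir, [prefix, '*', suffix]));
names = {listing.name};

% Keep only names with a pure number between prefix and suffix, and extract it.
pattern = ['^', regexptranslate('escape', prefix), '(\d+)', ...
           regexptranslate('escape', suffix), '$'];
tokens = regexp(names, pattern, 'tokens', 'once');
isMatch = ~cellfun(@isempty, tokens);
names = names(isMatch);
counts = cellfun(@(t) str2double(t{1}), tokens(isMatch));

% Assumed rule: take the smallest stored no_fields that is >= the requested value.
candidates = find(counts >= wanted);
if isempty(candidates)
    chosenFile = '';   % nothing suitable: "Do Something Else"
else
    [~, k] = min(counts(candidates));
    chosenFile = fullfile(demoDir, names{candidates(k)});
end

rmdir(demoDir, 's');   % remove the demo folder again
```

In real use, demoDir would be the data folder and an empty chosenFile would trigger the fallback branch; otherwise chosenFile can be passed to load.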

ADSW121365 on 11 Jan 2021

This would be my accepted answer, but is posted as a comment unfortunately.

Answer 1

Jan on 17 Dec 2020

https://au.mathworks.com/matlabcentral/answers/696030-searching-for-closely-related-filenames-in-a-directory#answer_578095

Edited: Jan on 17 Dec 2020

['32_Y_32_Z_32_Omega_260.567_no_fields_',num2str(opts.no_fields),'and_25_sensors.mat']

Searching such files is complicated, because important information is hidden in the name of the file. This is an inefficient design, because it impedes the addressing. You can compare this with including the phone number and the current weight in a person's name. This makes it much easier to call somebody if you know their name, but everybody changes their name every morning and it will be horrible to update all the databases.

Store data in a database, because this is its purpose. Either use a professional database and find the data by SQL queries, or create your own database: store files with dull names like "file1.mat", "file2.mat", ... and write the important information (which data belong to which parameters) either in an extra field inside the file and/or in an extra file. Having this information only in an extra file is fragile, because the correlation between the parameters and the data might be lost when somebody renames a file. Storing the parameters only inside each file means that a lot of files must be opened to find a specific dataset. So a stable approach is to store the parameters inside the files and copy them to an extra file for faster searching. Then a lost correlation can be restored by collecting the parameters once again from the actual data files.

Instead of inventing a smart and flexible method to search file names, use a clear and clean approach to store parameters in a format, which can be searched efficiently.
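
The "parameters inside the files plus an extra index file" idea can be sketched as follows. This is only an illustration under assumptions: the variable names params and results, the field names, and the folder layout are hypothetical, and the small random matrices stand in for the real data.

```matlab
% Hypothetical data folder with dull file names; each file stores its own parameters.
demoDir = tempname;
mkdir(demoDir);
fieldCounts = [5 10 20];
for k = 1:numel(fieldCounts)
    params = struct('no_fields', fieldCounts(k), 'no_sensors', 25);  % assumed layout
    results = rand(10);                                              % stand-in for real data
    save(fullfile(demoDir, sprintf('file%d.mat', k)), 'params', 'results');
end

% Build (or rebuild) the index by collecting the parameters from the data files.
listing = dir(fullfile(demoDir, 'file*.mat'));
index = struct('file', {}, 'no_fields', {}, 'no_sensors', {});
for k = 1:numel(listing)
    S = load(fullfile(demoDir, listing(k).name), 'params');   % read the parameters only
    index(end+1) = struct('file', listing(k).name, ...
        'no_fields', S.params.no_fields, 'no_sensors', S.params.no_sensors);
end
save(fullfile(demoDir, 'index.mat'), 'index');   % the extra file used for fast searching

% Query the index instead of parsing file names.
wanted = 7;
hits = index([index.no_fields] >= wanted & [index.no_sensors] == 25);
hitFiles = sort({hits.file});

rmdir(demoDir, 's');   % remove the demo folder again
```

If index.mat is lost or goes stale after a rename, rerunning the index loop restores it from the data files, which is the recovery step described above.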

Using the file names to store parameters also causes another problem, because the Windows Explorer does not handle file names with more than 256 characters including the path. Moving the files or folders in the Windows Explorer fails, and you get an error message if you are lucky.

ADSW121365 on 17 Dec 2020

I appreciate the insight.

I'm working on a physics PhD with big data and limited computing resources, so this approach is essentially a way to minimise repeated experiments. Each file contains a PDE results solution from the PDE toolbox, which is ~18GB in size. These are also relatively fragile and frequently updated as components of the physical model (not relevant to the str above) change in the FEM model, e.g. geometry, boundary conditions etc.

All the variables are stored in that structure, but simply loading the 18GB results component into memory so I can check the parameters takes way too long, hence looking for a useful approach which works for the above purpose purely. It might be possible, but I haven't seen a way I can take the results structure and build a database with 1000's of GB of data in there, whilst being stuck with a 1TB HDD and relying on smart cloud storage to actually access all this data in the first place.

Honestly, if you can guide me to some useful resources which would help me construct something simpler or more effective, I would greatly appreciate it. I know I'm not following normal practice, but I don't know any of the terminology associated with the programming side of what I'm doing, so I can't find any resources to get me moving in the right direction.
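
On the concern about loading the whole results component just to check the parameters: load accepts a list of variable names, and whos can inspect a MAT-file without loading its data. The sketch below assumes, hypothetically, that the parameters are saved as their own top-level variable (here called params) next to the large results variable; it does not help if both live inside one struct. The small demo file is a stand-in, and files above 2GB would need to be saved with '-v7.3'.

```matlab
% Hypothetical demo file: a small parameter struct next to a (stand-in) large result.
demoFile = [tempname, '.mat'];
params = struct('no_fields', 10, 'no_sensors', 25);   % assumed variable name and fields
results = rand(200);                                   % stand-in for the ~18GB PDE results
save(demoFile, 'params', 'results');                   % real files > 2GB need '-v7.3'

% Inspect the file contents without loading any data.
info = whos('-file', demoFile);
hasParams = any(strcmp({info.name}, 'params'));

% Load only the parameter variable and decide whether the file is usable.
S = load(demoFile, 'params');
wanted = 5;                                            % plays the role of opts.no_fields
usable = hasParams && S.params.no_fields >= wanted;

% Only now pay the cost of reading the large variable.
if usable
    R = load(demoFile, 'results');
end

delete(demoFile);   % remove the demo file again
```

How much time this saves in practice depends on the MAT-file version and how the variables were saved, so it is worth timing on one real file first.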

Jan on 20 Dec 2020

I do not have the Text Analytics Toolbox either, but actually I meant that the built-in function matlab.internal.language.errorrecovery.namesuggestion can be used, not to calculate the edit or Levenshtein distance, but to find the nearest matching string.