How do I find a folder with a specified string?

I think I need help using regexp. My goal is to find the RTPLAN DICOM file and read particular metadata from it. To get the full folder name for fullfile (and then dicominfo), I tried the following, which failed with an error I don't understand:
>> result = regexp(listing.name,'RTPLAN','match')
Error using regexp
Invalid option for regexp:
doe^john_anon53250_ct_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n132__00000.
The folder containing the string 'RTPLAN' clearly exists: it is the penultimate entry in the following directory listing, which comes from anonymized patient data exported from MIM Maestro:
>> DICOMdatafolder = '/home/sony/Documents/research/data/DICOMfiles/5';
listing = dir(DICOMdatafolder);
listing.name
ans =
'.'
ans =
'..'
ans =
'DOE^JOHN_ANON53250_CT_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n132__00000'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00001'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00002'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00003'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00004'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00005'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00006'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00007'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00008'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00009'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__0000A'
ans =
'DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
ans =
'DOE^JOHN_ANON53250_RTst_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
So the task I'm trying to accomplish is: given a list of folder names like this, select the one that contains 'RTPLAN' so it can be used in fullfile. What was wrong with my use of regexp?

Accepted Answer

Stephen23 on 30 Jan 2018
Edited: Stephen23 on 30 Jan 2018
The problem is that listing.name expands to a comma-separated list, so your code
regexp(listing.name,'RTPLAN','match')
is exactly equivalent to
regexp(listing(1).name, listing(2).name, listing(3).name, listing(4).name, listing(5).name, listing(6).name, ... , 'RTPLAN','match')
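where each element of the structure listing supplies one name field as a separate input argument to regexp. This produces far too many inputs, and they land in meaningless positions: listing(1).name ('.') is treated as the text to search, listing(2).name ('..') as the pattern, and listing(3).name (the CT folder name) as an option, which is why the error reports that name as an invalid option.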
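To see the expansion directly, here is a small sketch using a hypothetical struct array s in place of the output of dir, and a hypothetical helper countArgs that only counts its inputs. The try/catch reproduces the same kind of "Invalid option" error, because the third name lands where regexp expects an option.

```matlab
% Hypothetical struct array standing in for the output of dir
s = struct('name', {'.', '..', 'CT_00000', 'RTPLAN_00000'});

% Hypothetical helper that just reports how many inputs it received
countArgs = @(varargin) numel(varargin);

nExpanded = countArgs(s.name)     % 4: s.name expands to four separate arguments
nWrapped  = countArgs({s.name})   % 1: {s.name} is one 1x4 cell array

% regexp(s.name,'RTPLAN','match') is regexp('.','..','CT_00000','RTPLAN_00000','RTPLAN','match')
try
    regexp(s.name, 'RTPLAN', 'match');
    errored = false;
catch err
    errored = true;
    disp(err.message)             % reports an invalid option (the third name)
end
```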
Comma-separated lists were introduced in my answer to your earlier question:
The solution is to put all of those elements of that list into one cell array, e.g.:
result = regexp({listing.name},'RTPLAN','match')
where
{listing.name}
is of course equivalent to
{listing(1).name, listing(2).name, listing(3).name, ...}
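A quick way to confirm what the cell-array form returns is to run it on a few hypothetical, shortened names: with a 1xN cell array as input, regexp returns a 1xN cell array, one cell per name, and the cells for non-matching names are empty.

```matlab
% Hypothetical, shortened names standing in for {listing.name}
C = {'.', '..', 'DOE^JOHN_CT_00000', 'DOE^JOHN_RTPLAN_00000'};

result = regexp(C, 'RTPLAN', 'match');

size(result)          % 1 4: one cell per input name
result{4}             % {'RTPLAN'}: the matches found in the fourth name
isempty(result{3})    % true: nothing matched in the third name
```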
This is explained in the MATLAB documentation that I linked to in my earlier answer. I would recommend reviewing what comma-separated lists are, because judging by your other question they are causing you some confusion (in particular, a comma-separated list is not a single variable). You might like to start here:
Daniel Bridges on 31 Jan 2018
Edited: Daniel Bridges on 31 Jan 2018
I am still struggling to obtain the entire string elegantly. find cannot be used with cell arrays and seemingly must be used with matrices; cell2mat collapses the cell array returned by regexp, losing the information about which directory contains the matching string. Looking again at the regexp documentation, I have not yet found how to return the entire string containing 'RTPLAN'. I think I should use isempty to get an index into dir's output, but I still need to learn the syntax for working with cells.
I was trying to avoid a for loop (I seem to always use them, and I am concerned that I am not making use of MATLAB's indexing), but this works:
listing = dir(DICOMdatafolder);
result = regexp({listing.name},'RTPLAN');   % one cell per name: match index, or empty
for loop = 1:numel(listing)
    if ~isempty(result{loop})
        correctfolder = loop;               % remember the index of the matching entry
    end
end
listing(correctfolder).name
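For comparison, here is a loop-free sketch of the same idea, again with hypothetical shortened names. find does not accept the cell array itself, but it does accept the logical array that cellfun produces from it. Unlike the loop, which silently keeps the last match, find returns every matching index, so the number of matches can be checked.

```matlab
% Hypothetical, shortened names standing in for {listing.name}
names = {'.', '..', 'DOE^JOHN_CT_00000', 'DOE^JOHN_RTPLAN_00000', 'DOE^JOHN_RTst_00000'};

result   = regexp(names, 'RTPLAN');      % cell array: start index, or empty, per name
hasMatch = ~cellfun(@isempty, result);   % logical 1x5: true where the name matched
correctfolder = find(hasMatch)           % 4: numeric index of the matching name
names{correctfolder}
```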
Stephen23 on 31 Jan 2018
Edited: Stephen23 on 31 Jan 2018
Because in this case the input to regexp is a cell array of strings, the output is a cell array of the same size. Assuming exactly one filename matches, one of the cells will be non-empty (containing, for example, the matched text or its start index, depending on which output you select). You would then have to do some post-processing to get the contents of that one cell, such as checking which cells are empty to generate a logical index:
>> C = {listing.name};
>> idx = ~cellfun('isempty',regexp(C,'RTPLAN','once'));
>> C{idx}
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
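The output that fills each cell depends on which regexp output you request. A short sketch with hypothetical, shortened names: with 'once', the default output is a scalar start index (or [] when nothing matches), while adding 'match' returns the matched text as a character vector (or '' when nothing matches). Either form works with the isempty test shown above.

```matlab
% Hypothetical, shortened names standing in for {listing.name}
C = {'.', '..', 'DOE^JOHN_RTPLAN_00000'};

startIdx = regexp(C, 'RTPLAN', 'once');           % {[], [], 10}: start index or []
matched  = regexp(C, 'RTPLAN', 'match', 'once');  % {'', '', 'RTPLAN'}: matched text or ''

idx = ~cellfun('isempty', startIdx)               % [false false true]
isequal(idx, ~cellfun('isempty', matched))        % true: both outputs give the same index
```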
However, for matching such a simple substring regexp is overkill. Here are two ways to match that filename using the faster strfind:
From cell array:
>> C = {listing.name};
>> idx = ~cellfun('isempty',strfind(C,'RTPLAN' ));
>> C{idx}
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
From structure:
>> idx = ~cellfun('isempty',strfind({listing.name},'RTPLAN' ));
>> listing(idx).name
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
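On MATLAB R2016b or later, the contains function (a swapped-in alternative, not part of the original answer) returns the logical index directly, without cellfun. A minimal sketch, assuming a hypothetical struct array in place of the output of dir:

```matlab
% Hypothetical struct array standing in for listing = dir(DICOMdatafolder)
listing = struct('name', {'.', '..', 'DOE^JOHN_CT_00000', 'DOE^JOHN_RTPLAN_00000'});

idx = contains({listing.name}, 'RTPLAN')   % logical 1x4: [false false false true]
listing(idx).name
```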
You should also add some error checking (otherwise the last step could return zero or several values as a comma-separated list). Whichever approach you choose, put this immediately after idx is defined:
assert(nnz(idx)==1,'less than or more than one file found')
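Another option, not used in this thread, is to let dir do the filtering with a wildcard pattern, so that only matching entries are returned. The sketch below creates a throwaway folder (via tempname) holding hypothetical empty files that stand in for the DICOM export, applies the same count check, and builds the full path with fullfile. The dicominfo call from the question is left as a comment because the stand-in files are not DICOM.

```matlab
% Throwaway folder with hypothetical, shortened names standing in for the DICOM export
DICOMdatafolder = tempname;
mkdir(DICOMdatafolder);
names = {'DOE^JOHN_CT_00000', 'DOE^JOHN_RTDOSE_00000', 'DOE^JOHN_RTPLAN_00000'};
for k = 1:numel(names)
    fclose(fopen(fullfile(DICOMdatafolder, names{k}), 'w'));   % create an empty file
end

% The wildcard makes dir return only entries whose names contain RTPLAN
listing = dir(fullfile(DICOMdatafolder, '*RTPLAN*'));
assert(numel(listing) == 1, 'less than or more than one file found')

planfile = fullfile(DICOMdatafolder, listing.name)
% info = dicominfo(planfile);   % next step from the question (needs a real DICOM file)

rmdir(DICOMdatafolder, 's');   % remove the throwaway folder
```

The wildcard matches folders as well as files, so this works the same way if the RTPLAN entry is a directory.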
