How do I find a folder with a specified string?
50 views (last 30 days)
Show older comments
I think I need your help using regexp: My goal is to find the RTPLAN DICOM file and read particular metadata from it. Trying to get the full folder name to use in fullfile to use in dicominfo, I tried the following which failed with an error I don't understand:
>> result = regexp(listing.name,'RTPLAN','match')
Error using regexp
Invalid option for regexp:
doe^john_anon53250_ct_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n132__00000.
The folder containing the string 'RTPLAN' clearly exists as the penultimate entry in the following directory listing: Exporting anonymized patient data from MIM Maestro we get
>> DICOMdatafolder = '/home/sony/Documents/research/data/DICOMfiles/5';
listing = dir(DICOMdatafolder);
listing.name
ans =
'.'
ans =
'..'
ans =
'DOE^JOHN_ANON53250_CT_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n132__00000'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00001'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00002'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00003'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00004'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00005'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00006'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00007'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00008'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00009'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__0000A'
ans =
'DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
ans =
'DOE^JOHN_ANON53250_RTst_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
So the task I'm trying to accomplish is: Given a list of folder names like this, grab the one that contains 'RTPLAN' so it can be used in fullfile. What was wrong with my use of regexp?
0 Comments
Accepted Answer
Stephen23
on 30 Jan 2018
Edited: Stephen23
on 30 Jan 2018
regexp(listing.name,'RTPLAN','match')
is exactly equivalent to
regexp(listing(1).name, listing(2).name, listing(3).name, listing(4).name, listing(5).name, listing(6).name, ... , 'RTPLAN','match')
where each element of the structure listing supplies one name field as an input argument to regexp: this clearly produces far too many inputs for regexp, and those inputs are supplied in meaningless positions as well, thus the error.
Comma-separated lists were introduced in my answer to your earlier question:
The solution is to put all of those elements of that list into one cell array, e.g.:
result = regexp({listing.name},'RTPLAN','match')
where
{listing.name}
is of course equivalent to
{listing(1).name, listing(2).name, listing(3).name, ...}
This is explained in the MATLAB documentation that I linked to in my earlier answer. I would recommend reviewing what comma-separated lists are, because judging by your other question they are causing you some confusion (in particular comma-separated lists are not one variable). You might like to start here:
4 Comments
Stephen23
on 31 Jan 2018
Edited: Stephen23
on 31 Jan 2018
Because in this case the input to regexp is a cell array of strings the output is a cell array of the same size: one of the cells would be non-empty (containing either the matching string, the substring, or its index, depending on what output you select, and assuming one matched filename). You would then have to do some post-processing to get the contents of that one cell, such as checking which cell is empty to generate a logical index:
>> C = {listing.name};
>> idx = ~cellfun('isempty',regexp(C,'RTPLAN','once'));
>> C{idx}
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
However for matching such a simple substring regexp is overkill: here are two ways to match that filename, based on faster strfind:
From cell array:
>> C = {listing.name};
>> idx = ~cellfun('isempty',strfind(C,'RTPLAN' ));
>> C{idx}
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
From structure:
>> idx = ~cellfun('isempty',strfind({listing.name},'RTPLAN' ));
>> listing(idx).name
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
To which you should also add some error checking (otherwise the last step could produce multiple variables in a comma-separated list), so whichever one you choose put this immediately after idx is defined:
assert(nnz(idx)==1,'less than or more than one file found')
More Answers (0)
See Also
Categories
Find more on String in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!