How do I find a folder with a specified string?

Question

Daniel Bridges on 30 Jan 2018

https://au.mathworks.com/matlabcentral/answers/379746-how-do-i-find-a-folder-with-a-specified-string

Edited: Stephen23 on 31 Jan 2018

I think I need your help using regexp. My goal is to find the RTPLAN DICOM file and read particular metadata from it. To get the full folder name to pass to fullfile, and then to dicominfo, I tried the following, which failed with an error I don't understand:

>> result = regexp(listing.name,'RTPLAN','match')
Error using regexp
Invalid option for regexp:
doe^john_anon53250_ct_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n132__00000.

The folder containing the string 'RTPLAN' clearly exists as the penultimate entry in the following directory listing, obtained after exporting anonymized patient data from MIM Maestro:

>> DICOMdatafolder = '/home/sony/Documents/research/data/DICOMfiles/5'; 
 listing = dir(DICOMdatafolder);
 listing.name
ans =
      '.'
ans =
      '..'
ans =
      'DOE^JOHN_ANON53250_CT_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n132__00000'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00001'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00002'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00003'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00004'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00005'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00006'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00007'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00008'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00009'
ans =
      'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__0000A'
ans =
      'DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
ans =
      'DOE^JOHN_ANON53250_RTst_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'

So the task I'm trying to accomplish is: Given a list of folder names like this, grab the one that contains 'RTPLAN' so it can be used in fullfile. What was wrong with my use of regexp?


Answer 1

Stephen23 on 30 Jan 2018

https://au.mathworks.com/matlabcentral/answers/379746-how-do-i-find-a-folder-with-a-specified-string#answer_302499

Edited: Stephen23 on 30 Jan 2018

The problem is that listing.name expands to a comma-separated list, so your code

regexp(listing.name,'RTPLAN','match')

is exactly equivalent to

regexp(listing(1).name, listing(2).name, listing(3).name, listing(4).name, listing(5).name, listing(6).name, ... , 'RTPLAN','match')

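where each element of the structure listing supplies one name field as an input argument to regexp. This produces far too many inputs for regexp, and those inputs are supplied in meaningless positions as well: the first name ('.') is taken as the text to search, the second ('..') as the regular expression, and every following name as an option. The third name is not a valid option, which is why the error message quotes the CT folder name.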
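To see the expansion on a small scale, here is a minimal sketch that uses a hypothetical three-element structure array s in place of the real dir output (s and its names are made up for illustration):

```matlab
% Hypothetical stand-in for the output of dir: a 1x3 structure array with a name field.
s = struct('name', {'.', '..', 'P_RTPLAN_0'});

% s.name expands to three separate values, so three outputs can be requested from it.
[a, b, c] = s.name;

% Curly braces collect the same comma-separated list into one 1x3 cell array.
names = {s.name};
n = numel(names);
```

Passing s.name directly to a function therefore hands it three separate arguments, whereas {s.name} hands it a single cell array.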

Comma-separated lists were introduced in my answer to your earlier question:

https://www.mathworks.com/matlabcentral/answers/379087-why-does-dir-name-output-have-two-answers-but-numel-reports-only-one-element

The solution is to put all of those elements of that list into one cell array, e.g.:

result = regexp({listing.name},'RTPLAN','match')

where

{listing.name}

is of course equivalent to

{listing(1).name, listing(2).name, listing(3).name, ...}

This is explained in the MATLAB documentation that I linked to in my earlier answer. I would recommend reviewing what comma-separated lists are, because judging by your other question they are causing you some confusion (in particular comma-separated lists are not one variable). You might like to start here:

https://www.mathworks.com/matlabcentral/answers/320713-how-to-operate-on-comma-separated-lists#answer_250868

Daniel Bridges on 31 Jan 2018

Edited: Daniel Bridges on 31 Jan 2018

I am still struggling to obtain the entire string elegantly. find cannot be used with cell arrays and seemingly must be used with matrices; cell2mat collapses the cell array returned by regexp, losing the information about which directory contains the matching string. Looking again at the regexp documentation, I have not yet found how to pull out the entire string containing 'RTPLAN'. I think I should use isempty to get a result that indexes into dir's output, but I must first learn the syntax for dealing with cells.

I was trying to avoid a for loop (I seem to always use them and I am concerned it isn't making use of MATLAB's indexing), but this works:

 listing = dir(DICOMdatafolder);
 result = regexp({listing.name},'RTPLAN');
 for loop = 1:numel(listing)
     if ~isempty(result{loop})
         correctfolder = loop;
     end
 end
 listing(correctfolder).name

Stephen23 on 31 Jan 2018

Edited: Stephen23 on 31 Jan 2018

Because the input to regexp in this case is a cell array of strings, the output is a cell array of the same size. Assuming exactly one filename matches, one of the cells will be non-empty (containing either the matching string, the substring, or its index, depending on which output you select). You then have to do some post-processing to get the contents of that one cell, such as checking which cells are non-empty to generate a logical index:

>> C = {listing.name};
>> idx = ~cellfun('isempty',regexp(C,'RTPLAN','once'));
>> C{idx}
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
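As a small illustration of how the selected output changes what the non-empty cell holds, here is a sketch on a hypothetical two-name cell array (the names are made up; the pattern '.*RTPLAN.*' is one possible way to return the whole name rather than just the matched substring):

```matlab
% Hypothetical names standing in for {listing.name}.
C = {'P_CT_0', 'P_RTPLAN_0'};

% Default output with 'once': the start index of the match, or empty where there is none.
startIdx = regexp(C, 'RTPLAN', 'once');

% 'match' output: the matched text itself, or empty where there is none.
matched = regexp(C, 'RTPLAN', 'match', 'once');

% Widening the pattern makes the match cover the entire name.
whole = regexp(C, '.*RTPLAN.*', 'match', 'once');

% In every case the non-empty cells mark the matching names.
idx = ~cellfun('isempty', whole);
```

Whichever output is chosen, the emptiness test gives the same logical index, so the choice only matters if the cell contents themselves are needed.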

However, for matching such a simple substring, regexp is overkill. Here are two ways to match that filename, based on the faster strfind:

From cell array:

>> C = {listing.name};
>> idx = ~cellfun('isempty',strfind(C,'RTPLAN' ));
>> C{idx}
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000

From structure:

>> idx = ~cellfun('isempty',strfind({listing.name},'RTPLAN' ));
>> listing(idx).name
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000

You should also add some error checking to this, because otherwise the last step could return several values in a comma-separated list, or none at all. Whichever approach you choose, put this immediately after idx is defined:

assert(nnz(idx)==1,'less than or more than one file found')
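Putting the pieces together for the original goal of building a path with fullfile, a minimal sketch might look like the following. The structure array here is a hypothetical stand-in for dir(DICOMdatafolder) with shortened, made-up names, and the extra isdir check is an assumption about wanting folders only; it is not part of the code above.

```matlab
% Hypothetical stand-in for listing = dir(DICOMdatafolder); only name and isdir are used.
DICOMdatafolder = '/home/sony/Documents/research/data/DICOMfiles/5';
listing = struct( ...
    'name',  {'.', '..', 'P_CT_n132__00000', 'P_RTPLAN_n1__00000', 'P_RTst_n1__00000'}, ...
    'isdir', {true, true, true, true, true});

% Logical index of the names that contain 'RTPLAN'.
idx = ~cellfun('isempty', strfind({listing.name}, 'RTPLAN'));

% Assumption: keep only entries that dir reports as folders.
idx = idx & [listing.isdir];

% Stop early unless exactly one folder matches.
assert(nnz(idx)==1, 'less than or more than one folder found')

% Full path to the matching folder, ready to be listed with dir again.
planfolder = fullfile(DICOMdatafolder, listing(idx).name);
```

With real data, the files inside planfolder could then be listed with dir and passed one at a time to dicominfo.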
