MATLAB Answers

How can I use a specified order of strings to index from a cell array?

Asked by JLC
on 16 Jun 2019
Latest activity Commented on by Jan
on 19 Jun 2019
I am trying to index into a cell array of potential reference files to use for a comparison. The comparison files have distinct parts in their file names that I'd like to use to select a single reference file.
However, I'm only able to return reference files that contain the three distinct parts, in any order. How can I enforce the order?
Example:
I use dir to create a list of all of the reference files in the given directory, which gives me a struct
ref_files =
15×1 struct array with fields:
name
folder
date
bytes
isdir
datenum
I'm interested in the name field, so I convert it to a string array:
ref_files = string(extractfield(ref_files, 'name'))';
Which gives me:
ref_files =
15×1 string array
"OpenEar_female_44k_00dBA_babble7ch_1sp_20k_62dBA_48k.wav"
"OpenEar_female_44k_00dBA_babble7ch_1sp_20k_66dBA_48k.wav"
"OpenEar_female_44k_00dBA_babble7ch_1sp_20k_70dBA_48k.wav"
"OpenEar_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_female_44k_65dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_female_44k_70dBA_babble7ch_1sp_20k_62dBA_48k.wav"
"OpenEar_female_44k_70dBA_babble7ch_1sp_20k_66dBA_48k.wav"
"OpenEar_female_44k_70dBA_babble7ch_1sp_20k_70dBA_48k.wav"
"OpenEar_short_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_short_44k_65dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_short_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_short_44k_70dBA_babble7ch_1sp_20k_62dBA_48k.wav"
"OpenEar_short_44k_70dBA_babble7ch_1sp_20k_66dBA_48k.wav"
"OpenEar_2short_44k_70dBA_babble7ch_1sp_20k_70dBA_48k.wav"
I have been converting to a cell array with:
ref_files = arrayfun(@(x)char(ref_files(x)),1:numel(ref_files),'uni',false)';
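The two conversions above can probably be shortened. The following is a minimal sketch, assuming a struct array shaped like the output of dir; the variable listing and its two file names are hypothetical stand-ins, not files from this thread.

```matlab
% Hypothetical stand-in for the struct array returned by dir(...)
listing = struct('name', {'b_ref.wav'; 'a_ref.wav'});

% Comma-separated list expansion collects the name field directly
names_cell = {listing.name}';      % 2x1 cell array of char vectors

% Convert between the two text containers as needed
names_str  = string(names_cell);   % 2x1 string array
names_back = cellstr(names_str);   % back to a cell array of char vectors
```

This avoids both extractfield and the arrayfun call, and makes it explicit which container each variable holds.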
The comparison files (deg_files) are also a struct.
deg_files.name
ans =
'Deg1_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav'
ans =
'Deg2_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav'
ans =
'Deg3_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav'
What I'm trying to do is loop through the comparison files, find the corresponding reference file and then pass those on to another function as arguments:
for i = 1:length(deg_files)
[deg_p, deg_baseFileName] = fileparts(deg_files(i).name); %Remove file extension
deg_fullFileName = fullfile(deg_input_directory,[deg_baseFileName '.wav']); %Create full path to file
%[HA, deg_Date, Test, Talker, Sp_Fs, Sp_Lvl, Ns_Type, Ns_Sp, Ns_Fs, Ns_Lvl, Fs] = strsplit(deg_baseFileName, "_");
deg_parts = strsplit(deg_baseFileName, "_");
[deg, fs_deg] = audioread(deg_fullFileName);
% Set current hearing aid name from file name
HA = char(deg_parts(1));
% Set talker from file name
Talker = char(deg_parts(2));
% Set speech level from file name
Sp_Lvl = char(deg_parts(4));
Sp_Lvl = str2num(Sp_Lvl(1:end-3)); % Drop "dBA" and convert to number
% Set noise level from file name
Ns_Lvl = char(deg_parts(8));
Ns_Lvl = str2num(Ns_Lvl(1:end-3)); % Drop "dBA" and convert to number
% Calculate SNR for current comparison
SNR = Sp_Lvl-Ns_Lvl;
The distinguishing parts are:
deg_parts(2), deg_parts(4), deg_parts(8)
In this case: "female", "70dBA", "00dBA" - in that order
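As a hedged aside, the level parsing in the loop could also be written without str2num, which evaluates its input as code. This sketch assumes the same underscore-separated naming scheme and uses one of the file names from the question.

```matlab
% Base name taken from the question's example (extension already removed)
deg_baseFileName = "Deg3_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k";
deg_parts = strsplit(deg_baseFileName, "_");

% Strip the unit suffix, then convert the remaining digits to a number
Sp_Lvl = str2double(erase(deg_parts(4), "dBA"));   % speech level
Ns_Lvl = str2double(erase(deg_parts(8), "dBA"));   % noise level

% Signal-to-noise ratio for the current comparison
SNR = Sp_Lvl - Ns_Lvl;
```

str2double returns NaN for text it cannot parse, so a malformed file name shows up as NaN rather than as an error or an unintended evaluation.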
I am attempting to find the corresponding reference file and read it in:
% Find and read in corresponding Reference recording for current HA
% recording
strToFind = {string(deg_parts(2)),string(deg_parts(4)),string(deg_parts(8))}'; % Strings to match
fun = @(s)~cellfun('isempty',strfind(ref_files,s));
out = cellfun(fun,strToFind,'UniformOutput',false);
idx = all(horzcat(out{:}),2);
ref_file_idx = string(ref_files(idx));
ref_fullFileName = fullfile(ref_input_directory,[ref_file_idx(1)]);
% Read in Reference recording
[ref_p, ref_baseFileName] = fileparts(ref_fullFileName);
ref_parts = strsplit(ref_baseFileName, "_");
% Read in Reference recording
[ref, fs_ref] = audioread(ref_fullFileName);
However, the index returns two values from my reference file cell array:
Ref_female_44k_00dBA_babble7ch_1sp_20k_70dBA_48k.wav
Ref_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav
Both contain the distinguishing parts, but only the second in the correct order.
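The behaviour described here can be reproduced in isolation. The sketch below is an illustration rather than the original code: it swaps strfind for contains, but applies the same idea of testing each part independently and combining the results with all.

```matlab
% The two names returned by the index in the question
ref_names = ["Ref_female_44k_00dBA_babble7ch_1sp_20k_70dBA_48k.wav"
             "Ref_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav"];
parts = ["female", "70dBA", "00dBA"];

% One logical column per part: does the name contain it anywhere?
hits = false(numel(ref_names), numel(parts));
for k = 1:numel(parts)
    hits(:, k) = contains(ref_names, parts(k));
end

% Position is never consulted, so both names pass
unordered = all(hits, 2);
```

Each test only asks whether a part occurs somewhere in the name, so swapping the speech and noise levels cannot change the result.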
Is there a way I can enforce the order in my out call?
Thanks!
Edited to provide reproducible example

  10 Comments

I'd prefer staying with cell strings (which are not strings, but "cells of char vectors") here. This would reduce the complexity of the code substantially.
@per isakson: Does this mean that you'd prefer to use strings consistently, or would you mix strings, cell strings and char vectors as in the original code?


Release: R2018a

1 Answer

Answer by per isakson
on 16 Jun 2019
Edited by per isakson
on 17 Jun 2019

Run section by section
%%
deg_baseFileName = "Test1_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k";
%%
deg_parts = strsplit(deg_baseFileName, "_");
%%
deg_parts(2), deg_parts(4), deg_parts(8)
%%
xpr = sprintf( '_%s_.+_%s_.+_%s_', deg_parts([2,4,8]) );
%%
Ref = ["Ref_female_44k_00dBA_babble7ch_1sp_20k_70dBA_48k.wav"
"Ref_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav" ];
%%
cac = regexp( Ref, xpr, 'once' );
has = not( cellfun( @isempty, cac ) );
and peek at the result
>> has
has =
2×1 logical array
0
1
>> Ref(has)
ans =
"Ref_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav"
>>
Only the one with the "order" of deg_baseFileName is picked.
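An alternative to the regular expression, sketched here under the assumption that every file name uses the same underscore-separated field layout, is to compare the fields by position. This is a different technique from the one in the answer, and the variable names key_fields and is_match are illustrative.

```matlab
deg_name  = "Test1_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k";
ref_names = ["Ref_female_44k_00dBA_babble7ch_1sp_20k_70dBA_48k.wav"
             "Ref_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav"];
key_fields = [2 4 8];   % talker, speech level, noise level

deg_parts = strsplit(deg_name, "_");
is_match  = false(numel(ref_names), 1);
for k = 1:numel(ref_names)
    ref_parts = strsplit(ref_names(k), "_");
    % Require enough fields, then compare the same positions in both names
    is_match(k) = numel(ref_parts) >= max(key_fields) && ...
        all(ref_parts(key_fields) == deg_parts(key_fields));
end
```

Because each field is compared against the field in the same position, the order is enforced by construction, and no part of a file name is interpreted as a regular expression.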
In response to comment and "Edited to provide reproducible example"
I assume that "However, the index returns two values from my reference file cell array:[...]" is the key sentence describing the problem.
This script matches comparison files with the correct reference files
%%
ref_files = [
"OpenEar_female_44k_00dBA_babble7ch_1sp_20k_62dBA_48k.wav"
"OpenEar_female_44k_00dBA_babble7ch_1sp_20k_66dBA_48k.wav"
"OpenEar_female_44k_00dBA_babble7ch_1sp_20k_70dBA_48k.wav"
"OpenEar_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_female_44k_65dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_female_44k_70dBA_babble7ch_1sp_20k_62dBA_48k.wav"
"OpenEar_female_44k_70dBA_babble7ch_1sp_20k_66dBA_48k.wav"
"OpenEar_female_44k_70dBA_babble7ch_1sp_20k_70dBA_48k.wav"
"OpenEar_short_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_short_44k_65dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_short_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav"
"OpenEar_short_44k_70dBA_babble7ch_1sp_20k_62dBA_48k.wav"
"OpenEar_short_44k_70dBA_babble7ch_1sp_20k_66dBA_48k.wav"
"OpenEar_2short_44k_70dBA_babble7ch_1sp_20k_70dBA_48k.wav" ];
%%
deg_files(1).name = "Deg1_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav";
deg_files(2).name = "Deg2_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav";
deg_files(3).name = "Deg3_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav";
%%
for jj = 1 : length( deg_files )
deg_parts = strsplit( deg_files(jj).name, "_" );
xpr = sprintf( "_%s_.+_%s_.+_%s_", deg_parts([2,4,8]) );
cac = regexp( ref_files, xpr );
has = not( cellfun( @isempty, cac ) );
fprintf( '\ndeg_file: %s\n', deg_files(jj).name );
fprintf( 'ref_file: %s\n', ref_files( has ) );
end
output
deg_file: Deg1_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav
ref_file: OpenEar_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav
deg_file: Deg2_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav
ref_file: OpenEar_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav
deg_file: Deg3_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav
ref_file: OpenEar_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav
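Before passing the match on to audioread, it may be worth checking that exactly one reference file was found. The following is a sketch only: the folder name "refs" is a hypothetical placeholder for ref_input_directory, and the two reference names are a subset of the list above.

```matlab
ref_files = ["OpenEar_female_44k_55dBA_babble7ch_1sp_20k_00dBA_48k.wav"
             "OpenEar_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav"];
deg_name = "Deg3_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav";
ref_input_directory = "refs";   % hypothetical folder name

% Build the ordered pattern from fields 2, 4 and 8, as in the answer
deg_parts = strsplit(deg_name, "_");
xpr = sprintf("_%s_.+_%s_.+_%s_", deg_parts([2,4,8]));
has = ~cellfun(@isempty, regexp(ref_files, xpr, 'once'));

% Fail loudly on zero or several matches instead of silently taking the first
if nnz(has) ~= 1
    error('Expected exactly one reference for %s, found %d.', deg_name, nnz(has));
end
ref_fullFileName = fullfile(ref_input_directory, ref_files(has));
```

With this guard, an ambiguous or missing reference stops the loop with a message naming the comparison file, rather than producing a comparison against the wrong recording.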
>>

  5 Comments

The two scripts in my answer run without problems on my Win10, R2018b.
Furthermore, there is no cell array created or used in my scripts. Correction: cac in my first script is a cell array; however, it's not used together with fprintf.
I also tried this:
for jj = 1 : length( deg_files )
deg_parts = strsplit( deg_files(jj).name, "_" );
xpr = compose( "_%s_.+_%s_.+_%s_", deg_parts([4,6,10]) );
cac = regexp( ref_files, xpr );
has = not( cellfun( @isempty, cac ) );
refFile = compose('%s\n', ref_files( has ));
ref_fullFileName = fullfile(ref_input_directory,refFile);
end
But got another error:
Error using compose
Function is not defined for 'cell' inputs.
I ran your for-loop script:
  • put the script in a %%-section
  • replaced deg_parts([4,6,10]) with deg_parts([4,6,8])
  • ran the section
  • received an error message
Undefined function or variable 'ref_input_directory'.
Error in Untitled2 (line 24)
ref_fullFileName = fullfile(ref_input_directory,refFile);
which shouldn't be a surprise, since I have not defined ref_input_directory. However, compose() didn't throw any error.
Comments
The MathWorks is in the process (over many releases) of introducing the string data type and (I guess) making cell arrays of character vectors obsolete. This makes our communication difficult. The thread as a whole contains code fragments that are not always compatible, so one cannot combine them at will. You encounter an error because in your case the value of ref_files is a cell array of character vectors. I don't, because in my case the value of ref_files is a string array, as declared in my answer.
To me the key sentence in your question is "Is there a way I can enforce the order in my out call?". I tried to provide clean code that does that.
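One way to sidestep the string versus cell-array mismatch described in this comment is to convert to a single container once, at the point where the file list is created. A minimal sketch, assuming the list arrives as a cell array of character vectors; ref_cell and picked are illustrative names.

```matlab
% A cell array of char vectors, as in the asker's workspace
ref_cell = {'OpenEar_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav'
            'OpenEar_short_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav'};

% Normalize once so the rest of the code only ever sees a string array
ref_files = string(ref_cell);

xpr = "_female_.+_70dBA_.+_00dBA_";
has = ~cellfun(@isempty, regexp(ref_files, xpr, 'once'));

% Parentheses indexing on a string array yields a string, not a cell
picked = ref_files(has);
msg = compose("%s\n", picked);
```

With a cell array, ref_files(has) would itself be a cell, which is the kind of input that compose rejected in the error message quoted above.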
