Not sure if I set up this neural network correctly

Question

Saketh Medicherla on 25 Dec 2020

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/702077-not-sure-if-i-set-up-this-neural-network-correctly

Edited: Brian Hemmat on 6 Jan 2021

Below is my code, along with information about the variables, for a basic audio classification problem: reading an audio file and determining whether the signal is a car horn or a dog barking. I followed the same format as this tutorial: https://www.mathworks.com/help/audio/gs/classify-sound-using-deep-learning.html

I'm not sure where I went wrong, but during training the program did not plot the loss value, and when I tried to classify a sample file, the result was "<undefined>". I would appreciate any help with this.

% --------------------------------------------------------------
% Loading Training and Evaluation Sets for Car Horn and Dog Bark
% --------------------------------------------------------------
carDataStore = UrbanSound8K(UrbanSound8K.class == "car_horn",:);
carDataStore = carDataStore(carDataStore.salience == 1,:);
dogDataStore = UrbanSound8K(UrbanSound8K.class == "dog_bark",:);
dogDataStore = dogDataStore(dogDataStore.salience == 1,:);
carData = [];
dogData = [];
% Add first 2 seconds of each audiofile to their respective matrices and
% produce labels
for i = 1:height(carDataStore)
    thisfile = "UrbanSound8K\audio\fold" + string(carDataStore(i,:).fold) + "\" + string(carDataStore(i,:).slice_file_name);
    if audioinfo(thisfile).Duration >= 2 && audioinfo(thisfile).SampleRate == 44100
        [y,fs] = audioread(thisfile);
        samples = [1,2*fs];
        clear y fs;
        [y,fs] = audioread(thisfile, samples);
        carData = [carData,y(:,1)];
    end
end
carLabels = repelem(categorical("car horn"),width(carData),1);
for i = 1:height(dogDataStore)
    thisfile = "UrbanSound8K\audio\fold" + string(dogDataStore(i,:).fold) + "\" + string(dogDataStore(i,:).slice_file_name);
    if audioinfo(thisfile).Duration >= 2 && audioinfo(thisfile).SampleRate == 44100
        [y,fs] = audioread(thisfile);
        samples = [1,2*fs];
        clear y fs;
        [y,fs] = audioread(thisfile, samples);
        dogData = [dogData,y(:,1)];
    end
end
dogLabels = repelem(categorical("dog barking"),width(dogData),1);
dogVals = round(0.8*width(dogData));
carVals = round(0.8*width(carData));
audioTrain = [dogData(:,1:dogVals),carData(:,1:carVals)];
labelsTrain = [dogLabels(1:dogVals);carLabels(1:carVals)];
audioValidation = [dogData(:,(dogVals + 1):end),carData(:,(carVals + 1):end)];
labelsValidation = [dogLabels((dogVals + 1):end);carLabels((carVals + 1):end)];
% ---------------------------------------------------------
% Audio Feature Extractor to reduce dimensionality of audio,
% Extracting slope and centroid of mel spectrum over time
% ---------------------------------------------------------
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","melSpectrum", ...
    "spectralCentroid",true, ...
    "spectralSlope",true);
featuresTrain = extract(aFE,audioTrain);
[numHopsPerSequence,numFeatures,numSignals] = size(featuresTrain);
featuresTrain = permute(featuresTrain,[2,1,3]);
featuresTrain = squeeze(num2cell(featuresTrain,[1,2]));
numSignals = numel(featuresTrain);
[numFeatures,numHopsPerSequence] = size(featuresTrain{1});
featuresValidation = extract(aFE,audioValidation);
featuresValidation = permute(featuresValidation,[2,1,3]);
featuresValidation = squeeze(num2cell(featuresValidation,[1,2]));
% ----------------------------------------
% Defining the Neural Network Architecture
% ----------------------------------------
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(50,"OutputMode","last")
    fullyConnectedLayer(numel(unique(labelsTrain)))
    softmaxLayer
    classificationLayer];
options = trainingOptions("adam", ...
    "Shuffle","every-epoch", ...
    "ValidationData",{featuresValidation,labelsValidation}, ...
    "Plots","training-progress", ...
    "Verbose",false);
net = trainNetwork(featuresTrain,labelsTrain,layers,options);

Answer 1

Brian Hemmat on 28 Dec 2020

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/702077-not-sure-if-i-set-up-this-neural-network-correctly#answer_585832

Edited: Brian Hemmat on 28 Dec 2020

Hi Saketh,

I believe the example you're following is more of a 'hello-world' type example--your current code is trying to accomplish something more difficult. You'll probably need to extract features with more information, and depending on your end goal, also apply standardization.
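
As a rough illustration of the standardization mentioned above, here is a minimal sketch that assumes the features are stored as a cell array with one numFeatures-by-numHops matrix per signal. The data is synthetic, and the names mu, sigma, and standardize are only illustrative. The statistics come from the training set alone and are then reused on the validation set.

```matlab
rng(0)  % make the synthetic data reproducible
numFeatures = 3;

% Synthetic stand-in for extracted features (assumption, not real audio):
% one numFeatures-by-numHops matrix per signal, with varying numHops.
featuresTrain = {5 + 2*randn(numFeatures,40); 5 + 2*randn(numFeatures,55)};
featuresValidation = {5 + 2*randn(numFeatures,47)};

% Pool all training hops and compute per-feature statistics.
allFeatures = cat(2,featuresTrain{:});   % numFeatures-by-(total hops)
mu = mean(allFeatures,2);
sigma = std(allFeatures,0,2);

% Apply the training statistics to both sets (implicit expansion).
standardize = @(x)(x - mu)./sigma;
featuresTrain = cellfun(standardize,featuresTrain,'UniformOutput',false);
featuresValidation = cellfun(standardize,featuresValidation,'UniformOutput',false);
```

Reusing the training statistics on the validation data keeps the two sets on the same scale without leaking validation information into the normalization.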

Regarding your particular questions and why the network is not working, it's difficult to say without being able to walk through your code, which would require access to that dataset (I don't have it).

Below, I've written something similar to your code, but using the ESC-10 dataset, which can be downloaded from the MathWorks support files. Hopefully reading through it will help with your current problem.

I changed the extracted features to the MFCC, delta MFCC, and delta-delta MFCC. The dataset does not have car sounds, so we're classifying "dog" and "helicopter" instead. Instead of trimming the signals ourselves, we pass in cell arrays of features and tell the network how to trim the sequences if they're not the same length. The amount of training and validation data is tiny, so we'll lower the ValidationFrequency setting (validating every 20 iterations rather than the default 50) to make sure validation data is plotted. This might be a similar issue to why you're not seeing loss.

% Download dataset
url = 'https://ssd.mathworks.com/supportfiles/audio/ESC-10.zip';
outputLocation = tempdir;
unzip(url,outputLocation)
% Create audioDatastore to point to dataset. Use the folder names as the
% labels.
esc10Datastore = audioDatastore(fullfile(outputLocation,'ESC-10'), ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
% Subset to only include 'dog' and 'helicopter' labels.
ads = subset(esc10Datastore,esc10Datastore.Labels==categorical("dog") | ...
    esc10Datastore.Labels==categorical("helicopter"));
% Split the datastore into train and validation sets.
[adsTrain,adsValidation] = splitEachLabel(ads,0.8);
% Read a single signal from the train datastore and listen to it.
[audioIn,audioInfo] = read(adsTrain);
fs = audioInfo.SampleRate;
sound(audioIn,fs)
% Create an audioFeatureExtractor
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "mfcc",true, ...
    "mfccDelta",true, ...
    "mfccDeltaDelta",true);
% Get the number of features output per signal
features = extract(aFE,audioIn);
[numHops,numFeatures] = size(features);
% Read all audio data into memory
dataTrain = readall(adsTrain);
labelsTrain = removecats(adsTrain.Labels); %remove empty categories
dataValidation = readall(adsValidation);
labelsValidation = removecats(adsValidation.Labels);
% Extract features from all the data (assumes the entire dataset uses the
% same sample rate, 44.1 kHz). Each result is transposed to
% numFeatures-by-numHops, as expected by sequenceInputLayer.
featuresTrain = cellfun(@(x)(extract(aFE,x))',dataTrain,'UniformOutput',false);
featuresValidation = cellfun(@(x)(extract(aFE,x))',dataValidation,'UniformOutput',false);
% Define the architecture
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(100,"OutputMode","last") %< increased number of hidden units
    fullyConnectedLayer(numel(unique(labelsTrain)))
    softmaxLayer
    classificationLayer];
% Define the training options
options = trainingOptions("adam", ...
    "Shuffle","every-epoch", ...
    "ValidationData",{featuresValidation,labelsValidation}, ...
    "Plots","training-progress", ...
    "Verbose",false, ...
    "SequenceLength","shortest", ...%<--Specify the sequence length (try experimenting with different options)
    "ValidationFrequency",20);
% Train the network
net = trainNetwork(featuresTrain,labelsTrain,layers,options);

% Evaluate performance on the validation set
y = classify(net,featuresValidation);
accuracy = mean(y==labelsValidation);
cm = confusionchart(labelsValidation,y);
cm.Title = sprintf('Confusion Matrix for Validation Data (Accuracy = %0.2f)',accuracy);
cm.ColumnSummary = 'column-normalized';
cm.RowSummary = 'row-normalized';
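
On the ValidationFrequency setting used above: the arithmetic below is a back-of-the-envelope sketch of why a small training set can finish before the first periodic validation pass. It assumes 64 training signals (80% of the 80 dog and helicopter clips I'd expect in ESC-10), the trainingOptions defaults of 30 epochs and a mini-batch size of 128, and that a mini-batch larger than the training set yields one iteration per epoch. It only counts the passes scheduled every ValidationFrequency iterations.

```matlab
numObservations = 64;   % assumed: 80% of 80 ESC-10 clips (dog + helicopter)
miniBatchSize   = 128;  % trainingOptions default
maxEpochs       = 30;   % trainingOptions default

% Assumption: when the mini-batch is larger than the training set,
% each epoch is a single iteration.
iterationsPerEpoch = max(1,floor(numObservations/miniBatchSize));
totalIterations = iterationsPerEpoch*maxEpochs;

% Periodic validation passes for the default setting (50) and the value
% used in the answer (20).
numValidationsDefault = floor(totalIterations/50);
numValidations20      = floor(totalIterations/20);

fprintf('Total iterations: %d\n',totalIterations);
fprintf('Periodic validations with ValidationFrequency=50: %d\n',numValidationsDefault);
fprintf('Periodic validations with ValidationFrequency=20: %d\n',numValidations20);
```

If your own training set is similarly small, comparing totalIterations against ValidationFrequency is a quick way to check whether validation points should appear on the training-progress plot.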

2 Comments

Saketh Medicherla on 5 Jan 2021

Thank you for your answer! I have another quick question: Is it necessary to have the same number of audio files for each category to achieve as high an accuracy as possible? I've tested the approach you have provided above with the database I am using (UrbanSound8K), and I'm seeing results of around 75-80% accuracy. I'm assuming this is due to the discrepancy of the available files (645 dog barking, 153 car horn), but I am not completely sure and would appreciate your input.

Brian Hemmat on 5 Jan 2021

Edited: Brian Hemmat on 6 Jan 2021

Hi Saketh,

You'll generally receive the best results if you train using a balanced class distribution. But that's just one of many contributing factors to accuracy.

One approach to dealing with unbalanced class distributions is to use a weighted classification layer, as in the example Speech Command Recognition Using Deep Learning. It's a custom layer and a bit of an advanced maneuver. Also, that example uses a CNN, and I'm not positive a weighted classification layer will improve performance on an LSTM network as well.
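
To give a feel for what the weights in such a layer might look like, here is a minimal sketch that computes inverse-frequency class weights from the file counts quoted in the question (645 dog barking, 153 car horn). The weighting formula is one common choice, not necessarily the one used in the linked example, and the variable names are illustrative.

```matlab
% Synthetic labels matching the counts quoted in the question.
labels = categorical([repmat("dog_bark",645,1); repmat("car_horn",153,1)]);

classes = categories(labels);  % sorted alphabetically: car_horn, dog_bark
counts  = countcats(labels);   % number of files per class, same order

% Inverse-frequency weights, scaled so the weighted counts sum to the
% total number of files.
classWeights = numel(labels)./(numel(classes)*counts);

for k = 1:numel(classes)
    fprintf('%s: %d files, weight %.3f\n',classes{k},counts(k),classWeights(k));
end

% Assumption: in R2020b or later, these could be passed to the built-in
% layer instead of a custom one, e.g.
%   classificationLayer('Classes',classes,'ClassWeights',classWeights)
```

The rarer class (car horn) gets the larger weight, so each of its examples contributes more to the loss.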

Another approach would be to augment your dataset using audioDataAugmenter.

Another approach is to use a pretrained network. You could use something like classifySound off-the-shelf, or you could use the underlying YAMNet network and perform transfer learning for your specific task, as in this example: Transfer Learning Using YAMNet.

One other thing to keep in mind: in the code example I provided previously, I created the validation set as a percentage (20%) of the entire data set. This assumed that the classes are roughly balanced. Usually, if you have unbalanced classes for training, you'll still want balanced classes for validation/testing to get a fair assessment (although this depends on your final application and desired performance). You can use splitEachLabel and specify the number of files to create balanced validation or test sets: Split by Number of Files.
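
The idea of a balanced hold-out set can be sketched without a datastore. The snippet below works on an in-memory label vector with the class counts from the question and picks the same number of validation items from each class at random. It is a plain-indexing illustration of the concept, not a replacement for splitEachLabel, and numValidationPerClass is a hypothetical value.

```matlab
rng(0)  % reproducible random selection

% Synthetic labels matching the counts quoted in the question.
labels = categorical([repmat("dog_bark",645,1); repmat("car_horn",153,1)]);
numValidationPerClass = 30;  % hypothetical; choose to suit your data

classes = categories(labels);
isValidation = false(size(labels));
for k = 1:numel(classes)
    idx = find(labels == classes{k});   % all items of this class
    idx = idx(randperm(numel(idx)));    % shuffle them
    isValidation(idx(1:numValidationPerClass)) = true;
end

labelsValidation = labels(isValidation);   % balanced: same count per class
labelsTrain      = labels(~isValidation);  % remaining, still unbalanced

summary(labelsValidation)
```

The same logical index (isValidation) can be used to split the corresponding feature cell arrays so that features and labels stay aligned.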

Good luck!

Answer 2

Anshika Chaurasia on 29 Dec 2020

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/702077-not-sure-if-i-set-up-this-neural-network-correctly#answer_586142

Hi Saketh,

You can also refer to the File Exchange submission Classify Urban Sound using Machine Learning & Deep Learning, which contains a script that classifies the UrbanSound8K dataset using wavelet analysis and deep learning.

Note: Classify Urban Sound using Machine Learning & Deep Learning is one of several submissions to the MATLAB File Exchange on MATLAB Central, a forum where our product users interact and exchange information and knowledge without MathWorks' involvement. Feel free to contact the author of this submission directly with specific questions about the implementation.
