Extract Features from Audio Data Sets

Feature extraction is an important part of machine learning and deep learning workflows for audio signals. For these workflows, you often need to train your model using features extracted from a large data set of audio files. Datastores are useful for working with large collections of data, and the audioDatastore object allows you to manage collections of audio files.

This example shows different approaches to extracting features from an audio data set. It also shows how to use parallel computing to accelerate file reading and feature extraction. Parallel file reading and feature extraction requires Parallel Computing Toolbox™.

Create Datastore

Set the useFSDD flag to true to download the Free Spoken Digit Dataset (FSDD) [1], which contains recordings of spoken digits, and create an audioDatastore object that points to the data. Otherwise, create a datastore from the small set of spoken-digit recordings in the current folder.

Set the OutputDataType property to "single" to read the audio data into single-precision arrays. Deep learning workflows often require single-precision data, and using it can help speed up feature extraction. You can also set the OutputEnvironment property to "gpu" to return the data on the GPU, which can speed up feature extraction further.

useFSDD = false;
fs = 8000; % sample rate of audio data
if useFSDD
    downloadFolder = matlab.internal.examples.downloadSupportFile("audio","FSDD.zip");
    dataFolder = tempdir;
    unzip(downloadFolder,dataFolder)
    dataset = fullfile(dataFolder,"FSDD","recordings");
    ads = audioDatastore(dataset,OutputDataType="single");
else
    ads = audioDatastore(".",OutputDataType="single");
end
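
Before extracting features, it can help to confirm that the datastore returns data in the expected form. The following is a minimal sketch, not part of the original example: it builds a small stand-in datastore from two synthetic noise clips written to a temporary folder (the folder and the clip1.wav and clip2.wav file names are hypothetical), reads one file, and checks the sample rate and data type.

```matlab
% Hypothetical stand-in data: two short noise clips in a temporary folder.
fs = 8000;
folder = tempname;
mkdir(folder)
audiowrite(fullfile(folder,"clip1.wav"),0.1*randn(fs,1),fs);
audiowrite(fullfile(folder,"clip2.wav"),0.1*randn(fs/2,1),fs);

checkDs = audioDatastore(folder,OutputDataType="single");

% read returns the samples and a structure that describes the file.
[x,info] = read(checkDs);
numSamples = size(x,1);      % number of samples in the first file
fileRate = info.SampleRate;  % sample rate stored in the file
dataClass = class(x);        % "single" because of OutputDataType

% Rewind the datastore so that later reads start from the first file.
reset(checkDs)
```

If fileRate differs from the fs value you pass to the feature extractor, the extracted features will not correspond to the intended frequencies, so this check is worth running once on a new data set.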

To read files and extract features in parallel in this example, call gcp (Parallel Computing Toolbox) to get the current parallel pool of workers. If no parallel pool exists, gcp starts a new pool.

pool = gcp;
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 4 workers.

Use audioFeatureExtractor Object to Extract Features

The simplest way to extract audio features from a data set is to use the audioFeatureExtractor object. Create an audioFeatureExtractor to extract mel-frequency cepstral coefficients (MFCCs) from each audio file. Call extract to extract the MFCCs from each audio file in the datastore. Calling extract on the datastore requires that all the features from the datastore fit in memory.

afe = audioFeatureExtractor(SampleRate=fs,mfcc=true);
mfccs = extract(afe,ads);

To read the files and extract features in parallel, call extract with UseParallel set to true.

mfccs = extract(afe,ads,UseParallel=true);
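
To see what the extractor returns for one signal, you can call extract on an array directly and use info to map feature names to columns. This is a sketch under assumptions: the noise signal stands in for a real recording, and the extractor uses its default window and overlap settings.

```matlab
fs = 8000;
afe = audioFeatureExtractor(SampleRate=fs,mfcc=true);

% One second of noise standing in for a spoken-digit recording.
x = randn(fs,1,"single");

% Rows of feats correspond to analysis windows, columns to features.
feats = extract(afe,x);

% info returns a structure that maps each enabled feature to its columns.
idx = info(afe);
mfccColumns = feats(:,idx.mfcc);
```

When you call extract on the datastore instead, each cell of the output holds a matrix like feats for the corresponding file.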

Extract Features in Loop

Another approach to extracting all the features is to loop through the files in the datastore. You might choose this method if you have a custom feature extraction algorithm that you cannot implement with audioFeatureExtractor. In this example, you use stft, which computes the short-time Fourier transform (STFT) of the audio signal.

Create a cell array to contain the features for each file. In a loop, read in the audio file from the datastore and extract the features using stft. Store the features in the cell array. This approach requires the features from the whole data set to fit in memory.

reset(ads) % start reading from the first file
numFiles = numel(ads.Files);
features = cell(1,numFiles);
for index = 1:numFiles
    x = read(ads);
    features{index} = stft(x);
end
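
The loop above calls stft with its default settings. If your custom algorithm needs specific analysis parameters, you can pass them explicitly. The following sketch uses a synthetic signal, and the window length, overlap, and FFT length are illustrative assumptions, not values from this example.

```matlab
fs = 8000;
x = randn(fs,1,"single");   % one second of noise standing in for a recording

% Illustrative analysis settings (assumed, not prescribed by the example).
win = hann(256,"periodic");
[s,f,t] = stft(x,fs,Window=win,OverlapLength=128,FFTLength=256, ...
    FrequencyRange="onesided");

% A one-sided STFT with a 256-point FFT has 256/2 + 1 = 129 frequency bins.
magnitude = abs(s);
```

Passing fs makes stft return the frequency vector f in hertz and the time vector t in seconds, which is convenient when you later plot or band-limit the features.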

Use Parallel Loop

You can partition the datastore and run the feature extraction loop on the partitions in parallel to improve performance.

Use numpartitions to get a reasonable number of partitions given the number of files and number of workers in the current parallel pool.

numPartitions = numpartitions(ads,pool);

In a parfor (Parallel Computing Toolbox) loop, partition the datastore and extract the features from the files in each partition.

reset(ads)

partitionFeatures = cell(1,numPartitions);
parfor ii = 1:numPartitions
    subds = partition(ads,numPartitions,ii);
    feats = cell(1,numel(subds.Files));
    for index = 1:numel(subds.Files)
        x = read(subds);
        feats{index} = stft(x);
    end
    partitionFeatures{ii} = feats;
end

Concatenate the features from each partition into one cell array.

features = cat(2,partitionFeatures{:});
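
When features are computed across partitions, it can be useful to record which file each feature matrix came from so that you can match features to labels later. This sketch is one possible variation of the loop above, run on a hypothetical temporary data set of four synthetic clips with a fixed partition count of 2. It stores the file name that read reports alongside each set of features.

```matlab
% Hypothetical stand-in data: four noise clips in a temporary folder.
fs = 8000;
folder = tempname;
mkdir(folder)
for k = 1:4
    audiowrite(fullfile(folder,"clip"+k+".wav"),0.1*randn(fs,1),fs);
end
demoDs = audioDatastore(folder,OutputDataType="single");

numPartitions = 2;   % fixed here for illustration
partitionFeatures = cell(1,numPartitions);
partitionFiles = cell(1,numPartitions);
parfor ii = 1:numPartitions
    subds = partition(demoDs,numPartitions,ii);
    n = numel(subds.Files);
    feats = cell(1,n);
    names = strings(1,n);
    for index = 1:n
        [x,info] = read(subds);
        feats{index} = stft(x);
        names(index) = info.FileName;   % file that produced feats{index}
    end
    partitionFeatures{ii} = feats;
    partitionFiles{ii} = names;
end

% features{k} was computed from the file named fileNames(k).
features = cat(2,partitionFeatures{:});
fileNames = cat(2,partitionFiles{:});
```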

Transform Datastore

Another approach for extracting features from the data set is to create a TransformedDatastore that applies custom feature extraction when reading in a file. This method is useful when the features from the whole data set do not fit in memory.

Create a function featureExtraction that takes the audio data from a file in the datastore and performs the feature extraction. Call transform on the audioDatastore with the featureExtraction function handle to create a new datastore that performs the feature extraction.

function feats = featureExtraction(x)
    feats = stft(x);
    feats = {feats}; % return features in cell array because they have variable size
end

tds = transform(ads,@featureExtraction);

Calling read on the new datastore reads in a file and performs feature extraction using the provided function.

fileFeatures = read(tds);

You can also use this method to read all the features from the data set into memory by calling readall with the TransformedDatastore.

features = readall(tds);

Read the files and extract features in parallel by calling readall with UseParallel set to true.

features = readall(tds,UseParallel=true);
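
When the features do not fit in memory, you can instead read the transformed datastore one file at a time and process each result before reading the next. The following is a minimal sketch on a hypothetical temporary data set of three synthetic clips. It uses an anonymous function in place of featureExtraction and accumulates only a running frame count, so no more than one file's features are held at once.

```matlab
% Hypothetical stand-in data: three noise clips in a temporary folder.
fs = 8000;
folder = tempname;
mkdir(folder)
for k = 1:3
    audiowrite(fullfile(folder,"clip"+k+".wav"),0.1*randn(fs,1),fs);
end
demoDs = audioDatastore(folder,OutputDataType="single");

% Same idea as featureExtraction: wrap the STFT in a cell.
demoTds = transform(demoDs,@(x){stft(x)});

numFrames = 0;
while hasdata(demoTds)
    fileFeatures = read(demoTds);   % 1-by-1 cell with one file's STFT
    numFrames = numFrames + size(fileFeatures{1},2);
end
```

In a real workflow, the body of the loop might write each file's features to disk or update a running statistic instead of counting frames.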

Next Steps

You can use the data set features to train a machine learning or deep learning model. Combine the features with label information to perform supervised learning. For example, the trainnet (Deep Learning Toolbox) function allows you to train deep neural networks using labeled datastores.
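
As a sketch of how label information might be derived, the code below assumes that each file name starts with the spoken digit followed by an underscore, as in 0_jackson_0.wav. That naming pattern and the file names shown are assumptions for illustration, so check them against your copy of the data set.

```matlab
% Hypothetical file paths following the assumed <digit>_<speaker>_<index>.wav pattern.
files = ["recordings/0_jackson_0.wav"; ...
         "recordings/7_theo_12.wav"; ...
         "recordings/7_jackson_3.wav"];

% Strip the folder and extension, then keep the text before the first underscore.
[~,names] = fileparts(files);
digits = extractBefore(names,"_");

% Categorical labels are a common input format for classification training.
labels = categorical(digits);
```

With a real datastore, you would pass ads.Files in place of the hypothetical list, and you could store the result in the Labels property of the audioDatastore object so that the labels stay aligned with the files.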

References

[1] Zohar Jackson, César Souza, Jason Flaks, Yuxin Pan, Hereman Nicolas, and Adhish Thite. “Jakobovski/free-spoken-digit-dataset: V1.0.8”. Zenodo, August 9, 2018. https://doi.org/10.5281/zenodo.1342401.