Main Content


Extract audio features



features = extract(aFE,audioIn) returns an array containing features of the audio input.


collapse all

Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor to extract the centroid of the Bark spectrum, the kurtosis of the Bark spectrum, and the pitch of an audio signal.

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true, ...
    "spectralKurtosis",true, ...
aFE = 
  audioFeatureExtractor with properties:

                     Window: [1024x1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'

   Enabled Features
     spectralCentroid, spectralKurtosis, pitch

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     mfccDeltaDelta, gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease
     spectralEntropy, spectralFlatness, spectralFlux, spectralRolloffPoint, spectralSkewness, spectralSlope
     spectralSpread, harmonicRatio

   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.

Call extract to extract the features from the audio signal. Normalize the features by their mean and standard deviation.

features = extract(aFE,audioIn);
features = (features - mean(features,1))./std(features,[],1);

Plot the normalized features over time.

idx = info(aFE);
duration = size(audioIn,1)/fs;

t = linspace(0,duration,size(audioIn,1));

t = linspace(0,duration,size(features,1));
plot(t,features(:,idx.spectralCentroid), ...
     t,features(:,idx.spectralKurtosis), ...
legend("Spectral Centroid","Spectral Kurtosis", "Pitch")
xlabel("Time (s)")

Figure contains 2 axes. Axes 1 contains an object of type line. Axes 2 contains 3 objects of type line. These objects represent Spectral Centroid, Spectral Kurtosis, Pitch.

Input Arguments

collapse all

Input audio, specified as a column vector or matrix of independent channels (columns).

Data Types: single | double

Output Arguments

collapse all

Extracted audio features, returned as an L-by-M-by-N array, where:

  • L –– Number of feature vectors (hops)

  • M –– Number of features extracted per analysis window

  • N –– Number of channels

Data Types: single | double

Introduced in R2019b