MFCC
Libraries: Audio Toolbox / Features
Description
The MFCC block extracts feature vectors containing the mel-frequency cepstral coefficients (MFCCs), as well as their delta and delta-delta features, from the audio input signal. MFCCs are popular features extracted from speech signals for use in classification tasks.
Examples
Keyword Spotting in Simulink
Use a pretrained deep learning model in Simulink® to identify a keyword in speech.
Ports
Input
Port_1 — Audio input
column vector | matrix
Audio input signal, specified as a column vector or a matrix. When you specify a matrix, the block treats columns as independent audio channels.
Data Types: single | double
Output
Port_1 — MFCC features
matrix | 3-D array
MFCC features, returned as a matrix or 3-D array. The features include the MFCCs themselves and optionally include the delta and delta-delta features of the MFCCs. The dimensions of the output are L-by-M-by-N, where:
L is the number of feature vectors, which is specified by the Number of feature vectors parameter.
M is the number of features in each feature vector, which is determined by the Number of cepstral coefficients, Append delta, and Append delta-delta parameters.
N is the number of channels in the input audio signal.
Trailing dimensions of size 1 are removed from the output.
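As a rough illustration of the dimension rules above, the following Python sketch computes the expected output shape from the parameter values. The function name `mfcc_output_shape` and its argument names are hypothetical, and keeping at least two dimensions (so that a single-channel result is still a matrix) is an assumption based on the "matrix | 3-D array" description, not something the block documentation states explicitly.

```python
def mfcc_output_shape(num_vectors, num_coeffs, append_delta,
                      append_delta_delta, num_channels):
    """Return the L-by-M-by-N output shape with trailing singletons removed."""
    # M: the MFCCs, plus one equally sized set per appended feature type.
    m = num_coeffs * (1 + int(append_delta) + int(append_delta_delta))
    shape = [num_vectors, m, num_channels]
    # Assumption: drop trailing dimensions of size 1 but keep a 2-D matrix.
    while len(shape) > 2 and shape[-1] == 1:
        shape.pop()
    return tuple(shape)


# Default parameters, mono input: 13 MFCCs + 13 delta + 13 delta-delta.
print(mfcc_output_shape(1, 13, True, True, 1))   # (1, 39)
# Four feature vectors, delta only, stereo input.
print(mfcc_output_shape(4, 13, True, False, 2))  # (4, 26, 2)
```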
Data Types: single
| double
Parameters
Window — Analysis window
hamming(1024,'periodic') (default) | real vector
Analysis window applied to the input signal in the time domain, specified as a real vector.
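For readers working outside MATLAB, the default window can be approximated with the short Python sketch below. It assumes the usual definition of a periodic Hamming window, 0.54 - 0.46*cos(2*pi*k/N) for k = 0..N-1; the helper name `hamming_periodic` is hypothetical.

```python
import math


def hamming_periodic(n):
    """Periodic Hamming window of length n (assumed definition)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


window = hamming_periodic(1024)
print(len(window))            # 1024
print(round(window[0], 2))    # 0.08
print(round(window[512], 2))  # 1.0
```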
Overlap length — Number of overlapping samples between adjacent windows
512 (default) | integer in the range [0, windowLength)
Number of overlapping samples between adjacent windows, specified as an integer in the range [0, windowLength), where windowLength is the length of the analysis window and is specified by the Window parameter.
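The relationship between window length, overlap length, and the hop between successive analysis windows can be sketched as follows. The function names are hypothetical, and the example treats a single finite signal; how the block carries samples over between simulation steps is not described here and is not modeled.

```python
def hop_length(window_length, overlap_length):
    """Samples between the starts of adjacent windows."""
    if not 0 <= overlap_length < window_length:
        raise ValueError("overlap must be in the range [0, windowLength)")
    return window_length - overlap_length


def frame_starts(num_samples, window_length, overlap_length):
    """Start index of every full window that fits in the signal."""
    hop = hop_length(window_length, overlap_length)
    return list(range(0, num_samples - window_length + 1, hop))


print(hop_length(1024, 512))          # 512
print(frame_starts(2560, 1024, 512))  # [0, 512, 1024, 1536]
```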
Number of cepstral coefficients — Number of cepstral coefficients in each feature vector
13 (default) | positive integer greater than 1
Number of cepstral coefficients in each feature vector, specified as a positive integer greater than 1.
Rectification — Type of nonlinear rectification
Logarithm (default) | Cubic root
Type of nonlinear rectification applied to the spectrum prior to the discrete cosine transform, specified as Logarithm or Cubic root.
Append delta — Append delta of MFCCs to feature vectors
on (default) | off
When you select this parameter, the block appends the delta of the MFCCs to the coefficients in each feature vector. The delta is an approximation of the first derivative of the MFCCs with respect to time. The number of delta features is equal to the number of MFCCs, which is specified by Number of cepstral coefficients.
Append delta-delta — Append delta-delta of MFCCs to feature vectors
on (default) | off
When you select this parameter, the block appends the delta-delta of the MFCCs to each output feature vector. The delta-delta is an approximation of the second derivative of the MFCCs with respect to time. The number of delta-delta features is equal to the number of MFCCs, which is specified by Number of cepstral coefficients.
The block appends the delta-delta after the delta in the feature vectors if you also select the Append delta parameter.
Delta window length — Number of coefficients for calculating delta and delta-delta
9 (default) | odd integer greater than 2
Number of coefficients for calculating delta and delta-delta, specified as an odd integer greater than 2.
Number of feature vectors — Number of MFCC feature vectors in output
1 (default) | positive integer
Number of MFCC feature vectors in output, specified as a positive integer. The block buffers the output to return the specified number of feature vectors.
Number of overlapped feature vectors — Number of feature vectors overlapped in output
0 (default) | nonnegative integer
Number of feature vectors the block overlaps in the output, specified as a nonnegative integer less than Number of feature vectors.
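One way to picture the two buffering parameters is as a sliding window over the stream of feature vectors. The sketch below is only an interpretation under that assumption: `buffer_feature_vectors` is a hypothetical helper, and the block's behavior before the first buffer is full is not covered.

```python
def buffer_feature_vectors(vectors, num_vectors, num_overlap):
    """Group a stream of feature vectors into overlapping output blocks."""
    if not 0 <= num_overlap < num_vectors:
        raise ValueError("overlap must be less than the number of vectors")
    step = num_vectors - num_overlap
    last_start = len(vectors) - num_vectors
    return [vectors[i:i + num_vectors] for i in range(0, last_start + 1, step)]


# Seven feature vectors, 3 per output, 1 shared between adjacent outputs.
stream = ["v0", "v1", "v2", "v3", "v4", "v5", "v6"]
for block in buffer_feature_vectors(stream, 3, 1):
    print(block)
# ['v0', 'v1', 'v2']
# ['v2', 'v3', 'v4']
# ['v4', 'v5', 'v6']
```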
Inherit sample rate from input — Specify source of input sample rate
off (default) | on
When you select this parameter, the block inherits its sample rate from the input signal. When you clear this parameter, you specify the sample rate in the Input sample rate (Hz) parameter.
Input sample rate (Hz) — Sample rate of input
44.1e3 (default) | positive scalar
Input sample rate in Hz, specified as a positive scalar.
Dependencies
To enable this parameter, clear the Inherit sample rate from input parameter.
Number of bands — Number of bands in mel filter bank
32 (default) | positive integer
Number of bands in mel filter bank, specified as a positive integer.
Auto-determine frequency range — Automatically determine frequency range
on (default) | off
When you select this parameter, the block sets the Frequency range to [0,fs/2], where fs is the sample rate. The block determines the sample rate using the Inherit sample rate from input and Input sample rate (Hz) parameters.
Frequency range (Hz) — Frequency range of mel filter bank
[0,22050] (default) | two-element row vector
Frequency range in Hz of mel filter bank, specified as a two-element row vector.
Dependencies
To enable this parameter, clear the Auto-determine frequency range parameter.
Filter bank design domain — Design domain of mel filter bank
linear (default) | warped
Design domain of mel filter bank, specified as linear or warped.
Filter bank normalization — Normalization technique for filter bank
bandwidth (default) | area | none
Normalization technique that the block uses for the filter bank weights, specified as bandwidth, area, or none.
bandwidth –– Normalize the weights of each bandpass filter by the corresponding bandwidth of the filter.
area –– Normalize the weights of each bandpass filter by the corresponding area of the bandpass filter.
none –– The block does not normalize the weights of the filters.
Mel style — Mel style
oshaughnessy (default) | slaney
Style of the mel scale, specified as oshaughnessy or slaney.
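For orientation, the two mel styles are commonly associated with the Hz-to-mel conversions sketched below. These are the widely cited formulas for each style, written as hypothetical helper functions; that the block uses exactly these constants is an assumption. Note that the two styles produce values on different numeric scales.

```python
import math


def hz_to_mel_oshaughnessy(f_hz):
    """O'Shaughnessy-style mel scale (commonly cited form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_mel_slaney(f_hz):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    if f_hz < 1000.0:
        return 3.0 * f_hz / 200.0
    return 15.0 + 27.0 * math.log(f_hz / 1000.0) / math.log(6.4)


for f in (200.0, 1000.0, 6400.0):
    print(f, hz_to_mel_oshaughnessy(f), hz_to_mel_slaney(f))
```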
Normalize window — Normalize analysis window
on (default) | off
When you select this parameter, the block applies window normalization.
Spectrum type — Type of spectrum
power (default) | magnitude
Type of spectrum, specified as power or magnitude.
Auto-determine FFT length — Automatically determine FFT length
on (default) | off
When you select this parameter, the block automatically sets the FFT length to the window length. The window length is determined by the Window parameter.
FFT length — Number of DFT points
1024 (default) | positive integer
Number of points used to calculate the DFT, specified as a positive integer.
Dependencies
To enable this parameter, clear the Auto-determine FFT length parameter.
Algorithms
MFCC
Mel-frequency cepstral coefficients (MFCCs) are popular features extracted from speech signals for use in recognition tasks. In the source-filter model of speech, cepstral coefficients are understood to represent the filter (vocal tract). The vocal tract frequency response is relatively smooth, whereas the source of voiced speech can be modeled as an impulse train. As a result, the vocal tract can be estimated by the spectral envelope of a speech segment.
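The motivating idea of mel-frequency cepstral coefficients is to compress information about the vocal tract (smoothed spectrum) into a small number of coefficients based on an understanding of the cochlea. Although there is no hard standard for calculating the coefficients, the basic steps, which correspond to the parameters of this block, are: apply the analysis window, compute the DFT, form the power or magnitude spectrum, apply the mel filter bank, apply the nonlinear rectification, and take the discrete cosine transform.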
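The usual sequence of steps (windowing, DFT, mel filter bank, rectification, DCT) can be sketched for a single frame in plain Python. This is a minimal illustration under several assumptions, not the block's implementation: it uses O'Shaughnessy-style mel spacing, unnormalized triangular filters, a base-10 logarithm, no window normalization, and a naive DFT. The name `mfcc_frame` and its defaults are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_frame(frame, fs, num_bands=8, num_coeffs=5):
    n = len(frame)
    # 1. Apply a periodic Hamming analysis window.
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * k / n))
                for k, x in enumerate(frame)]
    # 2. One-sided power spectrum from a naive DFT.
    half = n // 2 + 1
    power = []
    for b in range(half):
        acc = sum(windowed[k] * cmath.exp(-2j * math.pi * b * k / n)
                  for k in range(n))
        power.append(abs(acc) ** 2)
    # 3. Triangular filters equally spaced in mel over [0, fs/2].
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (num_bands + 1)) for i in range(num_bands + 2)]
    energies = []
    for j in range(1, num_bands + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        total = 0.0
        for b in range(half):
            f = b * fs / n
            if lo < f <= mid:
                total += power[b] * (f - lo) / (mid - lo)
            elif mid < f < hi:
                total += power[b] * (hi - f) / (hi - mid)
        energies.append(total)
    # 4. Logarithmic rectification (floored to avoid log of zero).
    log_e = [math.log10(max(e, 1e-12)) for e in energies]
    # 5. DCT-II; keep the first num_coeffs coefficients.
    return [sum(log_e[j] * math.cos(math.pi * c * (j + 0.5) / num_bands)
                for j in range(num_bands))
            for c in range(num_coeffs)]


fs = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * k / fs) for k in range(64)]
coeffs = mfcc_frame(frame, fs)
print(len(coeffs))  # 5
```

The short frame and small filter bank keep the naive DFT fast; a practical implementation would use an FFT and the block's default sizes.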
Delta
The delta of an audio feature x is a least-squares approximation of the local slope of a region centered on sample x(k), which includes M samples before the current sample and M samples after the current sample.
The delta window length defines the length of the region from –M to M.
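The least-squares slope described above reduces to a weighted sum over the window, delta(k) = sum(m * x(k+m)) / sum(m^2) for m from -M to M. The Python sketch below illustrates this for one feature over time. The function name `delta` is hypothetical, and computing values only where the full window fits is a simplifying assumption; the block's handling of the first and last frames is not described here.

```python
def delta(x, window_length=9):
    """Least-squares slope of x over a centered window of odd length."""
    if window_length < 3 or window_length % 2 == 0:
        raise ValueError("window length must be an odd integer greater than 2")
    m = window_length // 2
    denom = sum(k * k for k in range(-m, m + 1))
    out = []
    # Assumption: only positions with M samples on each side are computed.
    for i in range(m, len(x) - m):
        out.append(sum(k * x[i + k] for k in range(-m, m + 1)) / denom)
    return out


# A ramp with slope 2 per frame yields a constant delta of 2.
ramp = [2.0 * t for t in range(12)]
print(delta(ramp))  # [2.0, 2.0, 2.0, 2.0]
```

Applying the same function to the delta sequence gives a corresponding sketch of the delta-delta.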
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
The MFCC block supports optimized code generation using single instruction, multiple data (SIMD) instructions. For more information about SIMD code generation, see Generate SIMD Code from Simulink Blocks for Intel Platforms (Simulink Coder).
Version History
Introduced in R2022b
R2023b: Support for Slaney-style mel scale
Set the Mel style parameter to slaney to use the Slaney-style mel scale.
R2023a: Generate optimized C/C++ code for computing MFCCs
The MFCC block supports optimized C/C++ code generation using single instruction, multiple data (SIMD) instructions.