Machine Learning and Deep Learning for Audio

Dataset management, labeling, and augmentation; segmentation and feature extraction for audio, speech, and acoustic applications

Audio Toolbox™ provides functionality for developing machine learning and deep learning solutions for audio, speech, and acoustic applications, including speaker identification, speech command recognition, and acoustic scene recognition.

  • Use audioDatastore to ingest large audio data sets and process files in parallel.

  • Use Signal Labeler to build audio data sets by annotating audio recordings manually and automatically.

  • Use audioDataAugmenter to create randomized pipelines of built-in or custom signal processing methods for augmenting and synthesizing audio data sets.

  • Use audioFeatureExtractor to extract combinations of different features while sharing intermediate computations.
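
As a rough illustration of how these pieces can fit together, the following sketch ingests a small data set with audioDatastore, creates randomized variants with audioDataAugmenter, and extracts features with audioFeatureExtractor. The folder layout, file names, synthetic signals, and all parameter values are assumptions chosen so that the example is self-contained; they are not recommended settings.

```matlab
% Build a tiny synthetic data set so the sketch is self-contained.
% The folder layout, file names, and signals are illustrative assumptions.
fs = 16000;                          % sample rate in Hz
t = (0:fs-1)'/fs;                    % one second of sample times
dataDir = fullfile(tempdir, "audioSketch");
toneDir = fullfile(dataDir, "tone");
noiseDir = fullfile(dataDir, "noise");
if ~isfolder(toneDir), mkdir(toneDir); end
if ~isfolder(noiseDir), mkdir(noiseDir); end
audiowrite(fullfile(toneDir, "tone1.wav"), 0.5*sin(2*pi*440*t), fs);
audiowrite(fullfile(noiseDir, "noise1.wav"), 0.1*randn(fs,1), fs);

% Ingest the files and derive labels from the subfolder names.
ads = audioDatastore(dataDir, "IncludeSubfolders", true, ...
    "LabelSource", "foldernames");

% Randomized augmentation: always apply a gain change, sometimes add noise.
% Time stretching, pitch shifting, and time shifting are switched off here.
augmenter = audioDataAugmenter( ...
    "AugmentationMode", "sequential", ...
    "NumAugmentations", 2, ...
    "TimeStretchProbability", 0, ...
    "PitchShiftProbability", 0, ...
    "TimeShiftProbability", 0, ...
    "VolumeControlProbability", 1, "VolumeGainRange", [-3 3], ...
    "AddNoiseProbability", 0.5, "SNRRange", [10 20]);

% Extract MFCCs and the spectral centroid from the same short-time analysis.
afe = audioFeatureExtractor("SampleRate", fs, ...
    "Window", hamming(512, "periodic"), "OverlapLength", 256, ...
    "mfcc", true, "spectralCentroid", true);

% Read each file, augment it, and extract features from every variant.
numFiles = numel(ads.Files);
features = cell(numFiles, augmenter.NumAugmentations);
for k = 1:numFiles
    [audioIn, fileInfo] = read(ads);
    augmented = augment(augmenter, audioIn, fileInfo.SampleRate);
    for a = 1:height(augmented)
        % Each cell holds a matrix with one row per analysis window.
        features{k, a} = extract(afe, augmented.Audio{a});
    end
end
labels = ads.Labels;                 % one label per file (row k of features)
```

Each row of `features` corresponds to one file and each column to one augmented variant, so `labels(k)` applies to every cell in row `k`. For larger data sets, the loop over files is the natural place to introduce parallel processing.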

Audio Toolbox also provides access to third-party APIs for text-to-speech and speech-to-text, and it includes pretrained VGGish and YAMNet models so that you can perform transfer learning, classify sounds, and extract feature embeddings. Using pretrained networks requires Deep Learning Toolbox™.
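
A minimal sketch of using the pretrained models might look like the following. It assumes a release in which the classifySound and vggishEmbeddings functions are available (function names for the pretrained models have changed between releases), that Deep Learning Toolbox is installed, and that the pretrained model files have been downloaded if your release requires it. The synthetic tone is only a placeholder input.

```matlab
% Two seconds of a synthetic 440 Hz tone (illustrative input only).
fs = 16000;                          % sample rate in Hz
t = (0:2*fs-1)'/fs;
audioIn = 0.5*sin(2*pi*440*t);

% Label the sounds detected in the signal using the pretrained YAMNet model.
sounds = classifySound(audioIn, fs);

% Extract VGGish feature embeddings: one 128-element row per analysis frame.
embeddings = vggishEmbeddings(audioIn, fs);
```

The embeddings can then serve as input features for a separate classifier, which is one common way to approach transfer learning with a small labeled data set.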