Audio Toolbox™ provides the pretrained VGGish and YAMNet networks. Use the vggish and yamnet functions to interact directly with the pretrained networks. The classifySound function performs required preprocessing and postprocessing for YAMNet so that you can locate and classify sounds into one of 521 categories. You can explore the YAMNet ontology using the yamnetGraph function. The vggishFeatures function performs the necessary preprocessing and postprocessing for VGGish so that you can extract feature embeddings to input to machine learning and deep learning systems.

This functionality requires Deep Learning Toolbox™.


vggishFeatures    Extract VGGish features
vggish            VGGish neural network
classifySound     Classify sounds in audio signal
yamnet            YAMNet neural network
yamnetGraph       Graph of YAMNet AudioSet ontology
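
As a rough illustration of how these functions fit together, the following sketch runs classifySound, vggishFeatures, and yamnetGraph on a synthetic signal. The pink noise input, its 5-second duration, the 16 kHz sample rate, and all variable names are assumptions chosen for this example. Depending on the release, the pretrained models may need to be downloaded before these calls succeed.

```matlab
% Sketch only: assumes Audio Toolbox and Deep Learning Toolbox are installed
% and that the pretrained YAMNet and VGGish models are available.
fs = 16e3;                    % sample rate in Hz (assumed for this example)
audioIn = pinknoise(5*fs);    % 5 seconds of synthetic pink noise (column vector)

% Locate and classify sounds with YAMNet; preprocessing and
% postprocessing are handled inside classifySound.
[sounds, timeStamps] = classifySound(audioIn, fs);

% Extract VGGish feature embeddings, one row per analysis frame.
embeddings = vggishFeatures(audioIn, fs);

% Retrieve the YAMNet AudioSet ontology as a graph, plus the class names.
[ygraph, classes] = yamnetGraph;
```

The embeddings matrix can then be passed to a downstream machine learning or deep learning model, while sounds and timeStamps describe which sound classes were detected and where they occur in the signal.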