
Quantization and Pruning

Compress a deep neural network by performing quantization or pruning

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:

  • Quantizing the weights, biases, and activations of layers to reduced-precision scaled integer data types. You can then generate C/C++, CUDA®, or HDL code from this quantized network.

    For C/C++ and CUDA code generation, the software quantizes the weights, biases, and activations of the convolution layers of a convolutional deep neural network to 8-bit scaled integer data types. To apply the quantization, pass the calibration result file produced by the calibrate function to the codegen (MATLAB Coder) command.

    Code generation does not support quantized deep neural networks produced by the quantize function.

  • Pruning filters from convolution layers by using first-order Taylor approximation. You can then generate C/C++ or CUDA code from this pruned network.
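As a rough illustration of what quantizing to an 8-bit scaled integer type involves, the sketch below quantizes a handful of example values using base MATLAB only. It assumes a symmetric range and a power-of-two scale; the variable names and data are hypothetical, and this is not necessarily the scheme the support package applies internally.

```matlab
% Illustrative only: symmetric 8-bit scaled-integer quantization with a
% power-of-two scale. All names and values here are hypothetical.
weights = single([-0.82 0.10 0.37 1.45 -1.96]);

% "Calibration" step: record the dynamic range of the values.
maxAbs = max(abs(weights));

% Pick the smallest exponent for which maxAbs still fits in int8,
% so that: real value ~= storedInt * 2^exponent.
exponent = ceil(log2(double(maxAbs) / 127));
scale = 2^exponent;

% Quantize: divide by the scale, round, and saturate to [-128, 127].
storedInt = int8(min(max(round(double(weights) / scale), -128), 127));

% Dequantize to inspect the rounding error introduced by quantization.
approx = single(storedInt) * single(scale);
maxError = max(abs(approx - weights));

fprintf('scale = 2^%d, max abs error = %.4f\n', exponent, maxError);
```

When no value saturates, the rounding error of each element is at most half the scale. That is why the range collected during calibration matters: a wider range forces a coarser scale.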



dlquantizer - Quantize a deep neural network to 8-bit scaled integer data types
dlquantizationOptions - Options for quantizing a trained deep neural network
calibrate - Simulate and collect ranges of a deep neural network
quantize - Quantize deep neural network
validate - Quantize and validate a deep neural network
quantizationDetails - Display quantization details for a neural network
estimateNetworkMetrics - Estimate network metrics for specific layers of a neural network
equalizeLayers - Equalize layer parameters of deep neural network
taylorPrunableNetwork - Network that can be pruned by using first-order Taylor approximation
forward - Compute deep learning network output for training
predict - Compute deep learning network output for inference
updatePrunables - Remove filters from prunable layers based on importance scores
updateScore - Compute and accumulate Taylor-based importance scores for pruning
dlnetwork - Deep learning network for custom training loops
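
The pruning functions listed above accumulate importance scores and then remove the lowest-scoring filters. The idea behind a first-order Taylor score can be sketched in base MATLAB as follows. This uses one common formulation (the absolute value of the spatially averaged product of activation and gradient) on random, hypothetical data; it does not call the toolbox functions and may differ from what they compute.

```matlab
% Illustrative only: first-order Taylor importance scores for the filters
% of one convolution layer. Names and random data are hypothetical.
rng(0);
numFilters = 4;
activations = randn(8, 8, numFilters, 16);   % H x W x C x N layer output
gradients   = randn(8, 8, numFilters, 16);   % dLoss/dActivation, same size

% Make filter 3 contribute almost nothing to the loss.
gradients(:, :, 3, :) = 1e-6 * gradients(:, :, 3, :);

% First-order Taylor estimate of the loss change when a filter's output
% is zeroed: activation .* gradient, averaged over spatial positions.
perSample = abs(mean(activations .* gradients, [1 2]));   % 1 x 1 x C x N

% Accumulate over the mini-batch to get one score per filter.
scores = squeeze(sum(perSample, 4));   % C x 1

% The filter with the smallest score is the first candidate for removal.
[~, pruneIdx] = min(scores);
keepMask = true(numFilters, 1);
keepMask(pruneIdx) = false;

% Dropping the channel here stands in for removing the filter's weights.
prunedActivations = activations(:, :, keepMask, :);
```

In an actual pruning loop, scores would be accumulated over many mini-batches before any filter is removed, and the network would typically be fine-tuned between removal steps.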


Deep Network Quantizer - Quantize a deep neural network to 8-bit scaled integer data types


Deep Learning Quantization

Quantization for GPU Target

Quantization for FPGA Target

Quantization for CPU Target