Quantization and Pruning

Compress a deep neural network by performing quantization or pruning

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:

  • Quantizing the weights, biases, and activations of layers to reduced precision scaled integer data types. You can then generate C/C++, CUDA®, or HDL code from this quantized network.

  • Pruning filters from convolution layers by using first-order Taylor approximation. You can then generate C/C++ or CUDA code from this pruned network.
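For the quantization path, the two bullets above correspond to a short sequence of calls. A minimal sketch, assuming a trained network `net` and calibration/validation datastores `calData` and `valData` already exist (those names are illustrative, not part of the library):

```matlab
% Sketch of the int8 quantization workflow (net, calData, valData assumed to exist).
quantOpts  = dlquantizationOptions;                    % default validation options
quantObj   = dlquantizer(net, ExecutionEnvironment="GPU");
calResults = calibrate(quantObj, calData);             % collect dynamic ranges
valResults = validate(quantObj, valData, quantOpts);   % quantize and check accuracy
qNet       = quantize(quantObj);                       % quantized network object
qDetails   = quantizationDetails(qNet)                 % inspect quantized layers
```

Setting `ExecutionEnvironment` to `"FPGA"` or `"CPU"` instead targets the corresponding code generation path.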



dlquantizer - Quantize a deep neural network to 8-bit scaled integer data types
dlquantizationOptions - Options for quantizing a trained deep neural network
calibrate - Simulate and collect ranges of a deep neural network
validate - Quantize and validate a deep neural network
quantize - Create quantized deep neural network
estimateNetworkMetrics - Estimate metrics for specific layers of a neural network
quantizationDetails - Display the details for a quantized network
taylorPrunableNetwork - Network that can be pruned by using first-order Taylor approximation
forward - Compute deep learning network output for training
predict - Compute deep learning network output for inference
updatePrunables - Remove filters from prunable layers based on importance scores
updateScore - Compute and accumulate Taylor-based importance scores for pruning
dlnetwork - Deep learning network for custom training loops
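The pruning functions above fit together in an iterative loop: accumulate Taylor-based importance scores over mini-batches, then remove the least important filters, and repeat. A hedged sketch of that loop, assuming a trained `dlnetwork` object `net`, a `minibatchqueue` `mbq`, and a stopping threshold `maxPrunables` are already set up (the `modelLossPruning` helper is illustrative, not a library function):

```matlab
% Sketch of a Taylor pruning loop (net, mbq, maxPrunables assumed to exist).
prunableNet = taylorPrunableNetwork(net);

while prunableNet.NumPrunables > maxPrunables
    shuffle(mbq);
    while hasdata(mbq)
        [X, T] = next(mbq);
        % Accumulate importance scores for this mini-batch.
        [~, pruningActivations, pruningGradients] = ...
            dlfeval(@modelLossPruning, prunableNet, X, T);
        prunableNet = updateScore(prunableNet, pruningActivations, pruningGradients);
    end
    % Remove the lowest-scoring filters from the prunable convolution layers.
    prunableNet = updatePrunables(prunableNet);
end

prunedNet = dlnetwork(prunableNet);   % convert back to a regular dlnetwork

function [loss, pruningActivations, pruningGradients] = modelLossPruning(net, X, T)
    % Illustrative helper: forward pass, loss, and gradients of the loss
    % with respect to the pruning activations.
    [Y, ~, pruningActivations] = forward(net, X);
    loss = crossentropy(Y, T);
    pruningGradients = dlgradient(loss, pruningActivations);
end
```

In practice the pruned network is typically fine-tuned (retrained) between or after pruning iterations to recover accuracy.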


Deep Network Quantizer - Quantize a deep neural network to 8-bit scaled integer data types


Deep Learning Quantization

Quantization for GPU Target

Quantization for FPGA Target

Quantization for CPU Target