Estimate Performance of Deep Learning Network
To reduce the time required to design a custom deep learning network that meets performance requirements, analyze layer-level latencies before you deploy the network. Compare the performance of deep learning networks on custom bitstream processor configurations to their performance on reference (shipping) bitstream processor configurations.
To learn how to use the table data returned by the estimatePerformance function to calculate your network performance, see Profile Inference Run.
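The latency columns in the estimatePerformance table are tied together by the clock frequency of the deep learning processor. In the outputs in this topic, LastFrameLatency(seconds) equals the cycle count divided by the clock frequency, and with a single frame (FramesNum = 1), Frames/s is the reciprocal of that latency. This relationship is inferred from the reported numbers, not quoted from the estimatePerformance reference page. The following plain MATLAB sketch (no toolbox calls) checks it with the Network values from the LogoNet estimate later in this topic.

```matlab
% Convert a cycle count reported by estimatePerformance into seconds and Frames/s.
% Values are copied from the LogoNet estimate in this topic (FramesNum = 1).
clockHz       = 200e6;      % DL processor clock frequency: 200 MHz
latencyCycles = 39853460;   % Network LastFrameLatency(cycles)

latencySeconds = latencyCycles / clockHz;   % cycles / (cycles per second)
framesPerSec   = 1 / latencySeconds;        % one frame per latency period

fprintf('Latency: %.5f s, throughput: %.1f Frames/s\n', latencySeconds, framesPerSec);
```

The same division applies to every per-layer row, so you can convert any layer latency to seconds by using the clock frequency printed at the bottom of the table.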
Estimate Performance of Custom Deep Learning Network for Custom Processor Configuration
This example shows how to calculate the performance of a deep learning network for a custom processor configuration.
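Create a file in your current working folder called getLogoNetwork.m. In the file, enter:
function net = getLogoNetwork()
    % Download the pretrained LogoNet MAT-file once, if it is not already in the current folder.
    if ~isfile('LogoNet.mat')
        url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat';
        websave('LogoNet.mat',url);
    end
    % Load the MAT-file and return the network stored in its convnet variable.
    data = load('LogoNet.mat');
    net = data.convnet;
end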
Call the function and save the result in snet.
snet = getLogoNetwork;
Create a dlhdl.ProcessorConfig object.
hPC = dlhdl.ProcessorConfig;
Call estimatePerformance with snet to retrieve the layer-level latencies and performance of the LogoNet network.
hPC.estimatePerformance(snet)
3 Memory Regions created.

                        Deep Learning Processor Estimator Performance Results

                        LastFrameLatency(cycles)  LastFrameLatency(seconds)  FramesNum  Total Latency  Frames/s
                        ------------------------  -------------------------  ---------  -------------  --------
Network                                 39853460                    0.19927          1       39853460       5.0
    conv_1                               6825287                    0.03413
    maxpool_1                            3755088                    0.01878
    conv_2                              10440701                    0.05220
    maxpool_2                            1447840                    0.00724
    conv_3                               9393397                    0.04697
    maxpool_3                            1765856                    0.00883
    conv_4                               1770484                    0.00885
    maxpool_4                              28098                    0.00014
    fc_1                                 2644884                    0.01322
    fc_2                                 1692532                    0.00846
    fc_3                                   89293                    0.00045
 * The clock frequency of the DL processor is: 200MHz
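To find the layers that dominate the LogoNet latency, you can post-process the per-layer rows. In this estimate, the per-layer cycle counts add up exactly to the Network latency. The following plain MATLAB sketch copies the values from the table above, checks that sum, and ranks the layers by their share of the total. The ranking is an illustrative analysis, not a feature of estimatePerformance.

```matlab
% Per-layer LastFrameLatency(cycles) values copied from the LogoNet estimate above.
layerNames  = {'conv_1','maxpool_1','conv_2','maxpool_2','conv_3','maxpool_3', ...
               'conv_4','maxpool_4','fc_1','fc_2','fc_3'};
layerCycles = [6825287 3755088 10440701 1447840 9393397 1765856 ...
               1770484 28098 2644884 1692532 89293];
networkCycles = 39853460;   % reported Network latency

% In this estimate, the layer latencies add up to the network latency.
assert(sum(layerCycles) == networkCycles);

% Fraction of the network latency spent in each layer, largest first.
share = layerCycles / networkCycles;
[sortedShare, order] = sort(share, 'descend');
for k = 1:3
    fprintf('%-10s %5.1f%%\n', layerNames{order(k)}, 100 * sortedShare(k));
end
```

For this network, conv_2 alone accounts for about a quarter of the estimated latency, so the convolution layers are the natural place to look first when tuning the processor configuration.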
Evaluate Performance of Deep Learning Network on Custom Processor Configuration
Benchmark the performance of a deep learning network on a custom bitstream configuration by comparing it to the performance on a reference (shipping) bitstream configuration. Use the comparison results to adjust your custom deep learning processor parameters to achieve optimum performance.
In this example, compare the performance of the ResNet-18 network on the zcu102_single bitstream configuration to its performance on the default custom bitstream configuration.
Prerequisites
Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Deep Learning Toolbox Model for ResNet-18 Network
Load Pretrained Network
Load the pretrained network.
snet = resnet18;
Retrieve zcu102_single Bitstream Configuration
To retrieve the zcu102_single bitstream configuration, use the dlhdl.ProcessorConfig object. For more information, see dlhdl.ProcessorConfig. To learn about modifiable parameters of the processor configuration, see getModuleProperty and setModuleProperty.
hPC_shipping = dlhdl.ProcessorConfig('Bitstream',"zcu102_single")
hPC_shipping = 

    Processing Module "conv"
                    ModuleGeneration: 'on'
                  LRNBlockGeneration: 'on'
                    ConvThreadNumber: 16
                     InputMemorySize: [227 227 3]
                    OutputMemorySize: [227 227 3]
                    FeatureSizeLimit: 2048

    Processing Module "fc"
                    ModuleGeneration: 'on'
              SoftmaxBlockGeneration: 'off'
                      FCThreadNumber: 4
                     InputMemorySize: 25088
                    OutputMemorySize: 4096

    Processing Module "adder"
                    ModuleGeneration: 'on'
                     InputMemorySize: 40
                    OutputMemorySize: 40

    Processor Top Level Properties
                      RunTimeControl: 'register'
                  InputDataInterface: 'External Memory'
                 OutputDataInterface: 'External Memory'
                   ProcessorDataType: 'single'

    System Level Properties
                      TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
                     TargetFrequency: 220
                       SynthesisTool: 'Xilinx Vivado'
                     ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
             SynthesisToolChipFamily: 'Zynq UltraScale+'
             SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
            SynthesisToolPackageName: ''
             SynthesisToolSpeedValue: ''
Estimate ResNet-18 Performance for zcu102_single Bitstream Configuration
To estimate the performance of the ResNet-18 DAG network, use the estimatePerformance function of the dlhdl.ProcessorConfig object. The function returns the estimated layer latency, network latency, and network performance in frames per second (Frames/s).
hPC_shipping.estimatePerformance(snet)
### Optimizing series network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'ClassificationLayer_predictions' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.

                        Deep Learning Processor Estimator Performance Results

                        LastFrameLatency(cycles)  LastFrameLatency(seconds)  FramesNum  Total Latency  Frames/s
                        ------------------------  -------------------------  ---------  -------------  --------
Network                                 23634966                    0.10743          1       23634966       9.3
    ____conv1                            2165372                    0.00984
    ____pool1                             646226                    0.00294
    ____res2a_branch2a                    966221                    0.00439
    ____res2a_branch2b                    966221                    0.00439
    ____res2a                             210750                    0.00096
    ____res2b_branch2a                    966221                    0.00439
    ____res2b_branch2b                    966221                    0.00439
    ____res2b                             210750                    0.00096
    ____res3a_branch1                     540749                    0.00246
    ____res3a_branch2a                    763860                    0.00347
    ____res3a_branch2b                    919117                    0.00418
    ____res3a                             105404                    0.00048
    ____res3b_branch2a                    919117                    0.00418
    ____res3b_branch2b                    919117                    0.00418
    ____res3b                             105404                    0.00048
    ____res4a_branch1                     509261                    0.00231
    ____res4a_branch2a                    509261                    0.00231
    ____res4a_branch2b                    905421                    0.00412
    ____res4a                              52724                    0.00024
    ____res4b_branch2a                    905421                    0.00412
    ____res4b_branch2b                    905421                    0.00412
    ____res4b                              52724                    0.00024
    ____res5a_branch1                    1046605                    0.00476
    ____res5a_branch2a                   1046605                    0.00476
    ____res5a_branch2b                   2005197                    0.00911
    ____res5a                              26368                    0.00012
    ____res5b_branch2a                   2005197                    0.00911
    ____res5b_branch2b                   2005197                    0.00911
    ____res5b                              26368                    0.00012
    ____pool5                              54594                    0.00025
    ____fc1000                            207852                    0.00094
 * The clock frequency of the DL processor is: 220MHz
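For a deeper network such as ResNet-18, it can help to aggregate the per-layer rows by stage. The following plain MATLAB sketch copies the cycle counts from the zcu102_single estimate above (with the "____" prefixes dropped), checks that they add up to the Network latency, and groups them by residual stage by using the layer-name prefix. The stage grouping is an illustrative choice, not an estimatePerformance feature.

```matlab
% LastFrameLatency(cycles) per layer, copied from the zcu102_single estimate above.
names = {'conv1','pool1', ...
    'res2a_branch2a','res2a_branch2b','res2a','res2b_branch2a','res2b_branch2b','res2b', ...
    'res3a_branch1','res3a_branch2a','res3a_branch2b','res3a','res3b_branch2a','res3b_branch2b','res3b', ...
    'res4a_branch1','res4a_branch2a','res4a_branch2b','res4a','res4b_branch2a','res4b_branch2b','res4b', ...
    'res5a_branch1','res5a_branch2a','res5a_branch2b','res5a','res5b_branch2a','res5b_branch2b','res5b', ...
    'pool5','fc1000'};
cycles = [2165372 646226 ...
    966221 966221 210750 966221 966221 210750 ...
    540749 763860 919117 105404 919117 919117 105404 ...
    509261 509261 905421 52724 905421 905421 52724 ...
    1046605 1046605 2005197 26368 2005197 2005197 26368 ...
    54594 207852];
networkCycles = 23634966;   % reported Network latency
assert(numel(names) == numel(cycles) && sum(cycles) == networkCycles);

% Group layers by residual stage, using the first four characters of each name.
stages = {'res2','res3','res4','res5'};
for k = 1:numel(stages)
    inStage = strncmp(names, stages{k}, 4);   % logical mask over all layers
    stageCycles = sum(cycles(inStage));
    fprintf('%s: %8d cycles (%4.1f%% of network)\n', ...
        stages{k}, stageCycles, 100 * stageCycles / networkCycles);
end
```

With these numbers, the res5 stage accounts for roughly a third of the estimated cycles, which is where the res5 convolution layers show up as the largest single rows in the table.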
Create Custom Processor Configuration
To create a custom processor configuration, use the dlhdl.ProcessorConfig object. For more information, see dlhdl.ProcessorConfig. To learn about modifiable parameters of the processor configuration, see getModuleProperty and setModuleProperty.
hPC_custom = dlhdl.ProcessorConfig
hPC_custom = 

    Processing Module "conv"
                    ModuleGeneration: 'on'
                  LRNBlockGeneration: 'on'
                    ConvThreadNumber: 16
                     InputMemorySize: [227 227 3]
                    OutputMemorySize: [227 227 3]
                    FeatureSizeLimit: 2048

    Processing Module "fc"
                    ModuleGeneration: 'on'
              SoftmaxBlockGeneration: 'off'
                      FCThreadNumber: 4
                     InputMemorySize: 25088
                    OutputMemorySize: 4096

    Processing Module "adder"
                    ModuleGeneration: 'on'
                     InputMemorySize: 40
                    OutputMemorySize: 40

    Processor Top Level Properties
                      RunTimeControl: 'register'
                  InputDataInterface: 'External Memory'
                 OutputDataInterface: 'External Memory'
                   ProcessorDataType: 'single'

    System Level Properties
                      TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
                     TargetFrequency: 200
                       SynthesisTool: 'Xilinx Vivado'
                     ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
             SynthesisToolChipFamily: 'Zynq UltraScale+'
             SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
            SynthesisToolPackageName: ''
             SynthesisToolSpeedValue: ''
Estimate ResNet-18 Performance for Custom Bitstream Configuration
To estimate the performance of the ResNet-18 DAG network, use the estimatePerformance function of the dlhdl.ProcessorConfig object. The function returns the estimated layer latency, network latency, and network performance in frames per second (Frames/s).
hPC_custom.estimatePerformance(snet)
### Optimizing series network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'ClassificationLayer_predictions' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.

                        Deep Learning Processor Estimator Performance Results

                        LastFrameLatency(cycles)  LastFrameLatency(seconds)  FramesNum  Total Latency  Frames/s
                        ------------------------  -------------------------  ---------  -------------  --------
Network                                 21219873                    0.10610          1       21219873       9.4
    ____conv1                            2165372                    0.01083
    ____pool1                             646226                    0.00323
    ____res2a_branch2a                    966221                    0.00483
    ____res2a_branch2b                    966221                    0.00483
    ____res2a                             210750                    0.00105
    ____res2b_branch2a                    966221                    0.00483
    ____res2b_branch2b                    966221                    0.00483
    ____res2b                             210750                    0.00105
    ____res3a_branch1                     540749                    0.00270
    ____res3a_branch2a                    708564                    0.00354
    ____res3a_branch2b                    919117                    0.00460
    ____res3a                             105404                    0.00053
    ____res3b_branch2a                    919117                    0.00460
    ____res3b_branch2b                    919117                    0.00460
    ____res3b                             105404                    0.00053
    ____res4a_branch1                     509261                    0.00255
    ____res4a_branch2a                    509261                    0.00255
    ____res4a_branch2b                    905421                    0.00453
    ____res4a                              52724                    0.00026
    ____res4b_branch2a                    905421                    0.00453
    ____res4b_branch2b                    905421                    0.00453
    ____res4b                              52724                    0.00026
    ____res5a_branch1                     751693                    0.00376
    ____res5a_branch2a                    751693                    0.00376
    ____res5a_branch2b                   1415373                    0.00708
    ____res5a                              26368                    0.00013
    ____res5b_branch2a                   1415373                    0.00708
    ____res5b_branch2b                   1415373                    0.00708
    ____res5b                              26368                    0.00013
    ____pool5                              54594                    0.00027
    ____fc1000                            207351                    0.00104
 * The clock frequency of the DL processor is: 200MHz
Compare the two estimates. The custom bitstream configuration reports 21219873 cycles at 200 MHz (0.10610 s, 9.4 Frames/s), and the zcu102_single bitstream configuration reports 23634966 cycles at 220 MHz (0.10743 s, 9.3 Frames/s). The only difference between the two configurations is the target frequency: 200 MHz for the custom configuration and 220 MHz for zcu102_single.
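Because the two configurations run at different clock frequencies, their cycle counts are not directly comparable; compare latencies in seconds instead. The following plain MATLAB sketch converts both Network latencies by using the clock frequencies printed below each table. The values are copied from the outputs above, and FramesNum is 1 in both runs.

```matlab
% Network latency and clock frequency reported for each configuration.
shipCycles   = 23634966;  shipHz   = 220e6;   % zcu102_single
customCycles = 21219873;  customHz = 200e6;   % default custom configuration

% Convert cycles to seconds with each configuration's own clock.
shipSec   = shipCycles / shipHz;
customSec = customCycles / customHz;

fprintf('zcu102_single: %.5f s (%.1f Frames/s)\n', shipSec, 1 / shipSec);
fprintf('custom:        %.5f s (%.1f Frames/s)\n', customSec, 1 / customSec);
fprintf('Cycle-count ratio (custom/shipping): %.3f\n', customCycles / shipCycles);
```

The cycle-count ratio is well below 200/220, which is why the lower-frequency custom configuration ends up with a slightly shorter latency in seconds in these estimates.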
Modify Custom Processor Configuration
Modify the custom processor configuration to increase the target frequency to 220 MHz, the value used by the zcu102_single configuration. To learn about modifiable parameters of the processor configuration, see dlhdl.ProcessorConfig.
hPC_custom.TargetFrequency = 220;
hPC_custom
hPC_custom = 

    Processing Module "conv"
                    ModuleGeneration: 'on'
                  LRNBlockGeneration: 'on'
                    ConvThreadNumber: 16
                     InputMemorySize: [227 227 3]
                    OutputMemorySize: [227 227 3]
                    FeatureSizeLimit: 2048

    Processing Module "fc"
                    ModuleGeneration: 'on'
              SoftmaxBlockGeneration: 'off'
                      FCThreadNumber: 4
                     InputMemorySize: 25088
                    OutputMemorySize: 4096

    Processing Module "adder"
                    ModuleGeneration: 'on'
                     InputMemorySize: 40
                    OutputMemorySize: 40

    Processor Top Level Properties
                      RunTimeControl: 'register'
                  InputDataInterface: 'External Memory'
                 OutputDataInterface: 'External Memory'
                   ProcessorDataType: 'single'

    System Level Properties
                      TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
                     TargetFrequency: 220
                       SynthesisTool: 'Xilinx Vivado'
                     ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
             SynthesisToolChipFamily: 'Zynq UltraScale+'
             SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
            SynthesisToolPackageName: ''
             SynthesisToolSpeedValue: ''
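Before re-estimating, it can be useful to write down what a naive frequency scaling would predict. The following plain MATLAB sketch makes a hypothetical first-order assumption, that the 200 MHz cycle count of the custom configuration stays the same at 220 MHz, so only the clock period shrinks. This assumption is not a statement about how the estimator models the processor; the re-estimate in the next step shows whether it holds.

```matlab
% Hypothetical first-order estimate: ASSUME the cycle count measured at
% 200 MHz does not change when TargetFrequency is raised to 220 MHz.
cycles200 = 21219873;   % custom configuration, Network cycles at 200 MHz
oldHz = 200e6;
newHz = 220e6;

predictedSec = cycles200 / newHz;   % same cycles, shorter clock period
fprintf('Predicted latency at 220 MHz: %.5f s (%.1f Frames/s)\n', ...
    predictedSec, 1 / predictedSec);
fprintf('Predicted speedup: %.3fx\n', newHz / oldHz);
```

If the re-estimated Network cycle count differs from 21219873, the latency in seconds will not scale by 200/220 as this sketch assumes.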
Re-estimate ResNet-18 Performance for Modified Custom Bitstream Configuration
Estimate the performance of the ResNet-18 DAG network on the modified custom bitstream configuration.
hPC_custom.estimatePerformance(snet)
### Optimizing series network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'ClassificationLayer_predictions' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.

                        Deep Learning Processor Estimator Performance Results

                        LastFrameLatency(cycles)  LastFrameLatency(seconds)  FramesNum  Total Latency  Frames/s
                        ------------------------  -------------------------  ---------  -------------  --------
Network                                 23634966                    0.10743          1       23634966       9.3
    ____conv1                            2165372                    0.00984
    ____pool1                             646226                    0.00294
    ____res2a_branch2a                    966221                    0.00439
    ____res2a_branch2b                    966221                    0.00439
    ____res2a                             210750                    0.00096
    ____res2b_branch2a                    966221                    0.00439
    ____res2b_branch2b                    966221                    0.00439
    ____res2b                             210750                    0.00096
    ____res3a_branch1                     540749                    0.00246
    ____res3a_branch2a                    763860                    0.00347
    ____res3a_branch2b                    919117                    0.00418
    ____res3a                             105404                    0.00048
    ____res3b_branch2a                    919117                    0.00418
    ____res3b_branch2b                    919117                    0.00418
    ____res3b                             105404                    0.00048
    ____res4a_branch1                     509261                    0.00231
    ____res4a_branch2a                    509261                    0.00231
    ____res4a_branch2b                    905421                    0.00412
    ____res4a                              52724                    0.00024
    ____res4b_branch2a                    905421                    0.00412
    ____res4b_branch2b                    905421                    0.00412
    ____res4b                              52724                    0.00024
    ____res5a_branch1                    1046605                    0.00476
    ____res5a_branch2a                   1046605                    0.00476
    ____res5a_branch2b                   2005197                    0.00911
    ____res5a                              26368                    0.00012
    ____res5b_branch2a                   2005197                    0.00911
    ____res5b_branch2b                   2005197                    0.00911
    ____res5b                              26368                    0.00012
    ____pool5                              54594                    0.00025
    ____fc1000                            207852                    0.00094
 * The clock frequency of the DL processor is: 220MHz
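With TargetFrequency set to 220, the custom configuration reproduces the zcu102_single estimate exactly. Its latency in seconds (0.10743 s) is slightly higher than the 200 MHz custom estimate (0.10610 s), because some layers now report more cycles. The following plain MATLAB sketch lists the layers whose cycle counts differ between the 200 MHz and 220 MHz estimates, with values copied from the outputs above, and checks that these layers account for the entire change in Network cycles. The sketch only compares reported numbers; it does not explain why the estimator assigns more cycles to these layers.

```matlab
% Network cycles reported at each target frequency.
total200 = 21219873;   % custom configuration, TargetFrequency = 200
total220 = 23634966;   % modified custom configuration, TargetFrequency = 220

% Layers whose LastFrameLatency(cycles) differs between the two estimates;
% every other layer reports the same cycle count in both runs.
names = {'res3a_branch2a','res5a_branch1','res5a_branch2a','res5a_branch2b', ...
         'res5b_branch2a','res5b_branch2b','fc1000'};
c200 = [708564  751693  751693 1415373 1415373 1415373 207351];
c220 = [763860 1046605 1046605 2005197 2005197 2005197 207852];

delta = c220 - c200;   % extra cycles per layer at 220 MHz
for k = 1:numel(names)
    fprintf('%-15s +%7d cycles\n', names{k}, delta(k));
end

% These layers account for the entire change in network cycles.
assert(sum(delta) == total220 - total200);
fprintf('Latency: %.5f s at 200 MHz vs %.5f s at 220 MHz\n', ...
    total200 / 200e6, total220 / 220e6);
```

Because the cycle counts of these layers grow with the target frequency, raising the frequency alone does not improve the estimated frame rate for this network. Use per-layer comparisons like this one to decide which processor parameters to adjust next.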
See Also
dlhdl.ProcessorConfig | getModuleProperty | setModuleProperty | estimatePerformance | estimateResources