predict
Class: dlhdl.Workflow
Package: dlhdl
Predict responses by using deployed network
Syntax

Y = predict(workflowObject,images)
Y = predict(workflowObject,X1,...,XN)
[Y1,...,YM] = predict(___)
[Y,performance] = predict(___,Name,Value)

Description

Y = predict(workflowObject,images) predicts responses for the image data, images, by using the deep learning network specified in the dlhdl.Workflow object, workflowObject.

Y = predict(workflowObject,X1,...,XN) predicts the responses for the data in the numeric or cell arrays X1, …, XN for the multi-input network specified in the Network argument of the workflowObject. The input XN corresponds to workflowObject.Network.InputNames(N).

[Y1,...,YM] = predict(___) predicts responses for the M outputs of a multi-output network using any of the previous input arguments. The output YM corresponds to the output of the network specified in workflowObject.Network.OutputNames(M).

[Y,performance] = predict(___,Name,Value) predicts the responses with one or more arguments specified by optional name-value arguments.
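For instance, a minimal sketch of the single-input and multi-input forms (hW, img, in1, and in2 are placeholder names for an existing dlhdl.Workflow object and its input data, not names from this page):

% Single-input network: one numeric image in, one response array out
Y = predict(hW, img);

% Multi-input network: in1 and in2 correspond to
% hW.Network.InputNames(1) and hW.Network.InputNames(2)
Y = predict(hW, in1, in2);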
Input Arguments
workflowObject
— Workflow
dlhdl.Workflow object
Workflow, specified as a dlhdl.Workflow object.
images
— Input image or input data
numeric array | cell array | formatted dlarray object
Input image, specified as a numeric array, cell array, or formatted dlarray object. The numeric arrays can be 3-D or 4-D arrays. For 4-D arrays, the fourth dimension is the number of input images. If one of the members of the numeric array has four dimensions, then the other members of the numeric arrays must have four dimensions as well, with the value of the fourth dimension being the same for all members.
If the network specified in the dlhdl.Workflow object is a dlnetwork object, then the input image must be a formatted dlarray object. For more information about dlarray formats, see the fmt input argument of dlarray.
Data Types: single | int8
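If the workflow wraps a dlnetwork object, a minimal sketch of preparing a formatted input might look like this (hW is a placeholder; peppers.png is a stand-in image that ships with MATLAB; the 224-by-224 size is an assumption, so use your network's actual input size):

img = imread('peppers.png');                  % stand-in input image
img = imresize(img, [224 224]);               % assumed input size; match your network
dlX = dlarray(single(rescale(img)), 'SSC');   % label dimensions spatial-spatial-channel
Y = predict(hW, dlX);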
X1,...,XN
— Numeric or cell arrays for networks with multiple inputs
numeric array | cell array
Numeric or cell arrays for networks with multiple inputs, specified as a numeric array or cell array. For multiple inputs to image prediction networks, the format of the predictors must match the formats described in the images argument description.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Profile
— Flag that returns profiling results
"off" (default) | "on"
Flag to return profiling results for the deep learning network deployed to the target board, specified as "off" or "on".
Example: Profile = "on"
Output Arguments
Y
— Predicted responses
numeric array
Predicted responses, returned as a numeric array. The format of Y depends on the type of task.
Task | Format |
---|---|
2-D image regression | h-by-w-by-c-by-N numeric array, where h, w, and c are the height, width, and number of channels of the images, respectively, and N is the number of images |
3-D image regression | h-by-w-by-d-by-c-by-N numeric array, where h, w, d, and c are the height, width, depth, and number of channels of the images, respectively, and N is the number of images |
Sequence-to-one regression | N-by-R matrix, where N is the number of sequences and R is the number of responses |
Sequence-to-sequence regression | N-by-R matrix, where N is the number of sequences and R is the number of responses |
Feature regression | N-by-R matrix, where N is the number of observations and R is the number of responses |
For sequence-to-sequence regression problems with one observation, images can be a matrix. In this case, Y is a matrix of responses.
If the output layer of the network is a classification layer, then Y is the predicted classification scores. This table describes the format of the scores for classification tasks.
Task | Format |
---|---|
Image classification | N-by-K matrix, where N is the number of observations and K is the number of classes |
Sequence-to-label classification | N-by-K matrix, where N is the number of sequences and K is the number of classes |
Feature classification | N-by-K matrix, where N is the number of observations and K is the number of classes |
Y1,...,YM
— Predicted scores or responses of networks with multiple outputs
numeric array
Predicted scores or responses of networks with multiple outputs, returned as numeric arrays. Each output Yj corresponds to the network output workflowObject.Network.OutputNames(j) and has a format as described in the Y output argument.
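A minimal sketch of capturing multiple outputs (hW and dlX are placeholders; compare the YOLO v3 example below, which uses the cell-array form):

% One output argument per entry in hW.Network.OutputNames
[Y1, Y2] = predict(hW, dlX);

% Equivalent cell-array form for a network with M outputs
outputs = cell(size(hW.Network.OutputNames'));
[outputs{:}] = predict(hW, dlX);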
performance
— Deployed network performance data
table
Deployed network performance data, returned as an N-by-5 table, where N is the number of layers in the network. This method returns performance data only when the Profile name-value argument is set to "on". To learn about the data in the performance table, see Profile Inference Run.
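For instance, a minimal sketch of collecting the profiling table (hW and img are placeholders for a deployed workflow object and its input):

% Request per-layer profiling along with the predictions
[Y, performance] = predict(hW, img, Profile = "on");

% performance is an N-by-5 table of deployed network performance data
disp(performance)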
Examples
Bicyclist and Pedestrian Classification by Using FPGA
This example shows how to deploy a custom-trained series network to detect pedestrians and bicyclists based on their micro-Doppler signatures. This network is taken from the Pedestrian and Bicyclist Classification Using Deep Learning example from the Phased Array System Toolbox™. For more details on network training and input data, see Pedestrian and Bicyclist Classification Using Deep Learning.
Prerequisites
Xilinx™ Vivado™ Design Suite 2020.2
Zynq® UltraScale+™ MPSoC ZCU102 Evaluation Kit
HDL Verifier™ Support Package for Xilinx FPGA Boards
MATLAB® Coder™ Interface for Deep Learning Libraries
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
The data files used in this example are:
The MAT-file trainedNetBicPed.mat contains a model trained on the training data set trainDataNoCar and its label set trainLabelNoCar.
The MAT-file testDataBicPed.mat contains the test data set testDataNoCar and its label set testLabelNoCar.
Load Data and Network
Load a pretrained network. Load test data and its labels.
load('trainedNetBicPed.mat','trainedNetNoCar')
load('testDataBicPed.mat')
View the layers of the pre-trained series network.
analyzeNetwork(trainedNetNoCar);
Set up HDL Toolpath
Set up the path to your installed Xilinx™ Vivado™ Design Suite 2020.2 executable if it is not already set up. For example, to set the toolpath, enter:
% hdlsetuptoolpath('ToolName', 'Xilinx Vivado','ToolPath', 'C:\Vivado\2020.2\bin');
Create Target Object
Create a target object for your target device with a vendor name and an interface to connect your target device to the host computer. Interface options are JTAG (default) and Ethernet. Vendor options are Intel or Xilinx. Use the installed Xilinx Vivado Design Suite over an Ethernet connection to program the device.
hT = dlhdl.Target('Xilinx', 'Interface', 'Ethernet');
Create Workflow Object
Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify the saved pre-trained series network, trainedNetNoCar, as the network. Make sure the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Zynq UltraScale+ MPSoC ZCU102 board. The bitstream uses a single data type.
hW = dlhdl.Workflow('Network', trainedNetNoCar, 'Bitstream', 'zcu102_single', 'Target', hT);
Compile trainedNetNoCar Series Network
To compile the trainedNetNoCar series network, run the compile function of the dlhdl.Workflow object.
dn = hW.compile;
### Optimizing series network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'

          offset_name          offset_address    allocated_space
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "28.0 MB"
    "OutputResultOffset"        "0x01c00000"     "4.0 MB"
    "SystemBufferOffset"        "0x02000000"     "28.0 MB"
    "InstructionDataOffset"     "0x03c00000"     "4.0 MB"
    "ConvWeightDataOffset"      "0x04000000"     "4.0 MB"
    "FCWeightDataOffset"        "0x04400000"     "4.0 MB"
    "EndOffset"                 "0x04800000"     "Total: 72.0 MB"
Program the Bitstream onto FPGA and Download Network Weights
To deploy the network on the Zynq® UltraScale+™ MPSoC ZCU102 hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. The function also downloads the network weights and biases. The deploy function checks for the Xilinx Vivado tool and the supported tool version. It then starts programming the FPGA device by using the bitstream, displays progress messages, and displays the time it takes to deploy the network.
hW.deploy;
### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.
Run Predictions on Micro-Doppler Signatures
Classify one input from the sample test data set by using the predict function of the dlhdl.Workflow
object and display the label. The inputs to the network correspond to the sonograms of the micro-Doppler signatures for a pedestrian or a bicyclist or a combination of both.
testImg = single(testDataNoCar(:, :, :, 1));
testLabel = testLabelNoCar(1);
classnames = trainedNetNoCar.Layers(end).Classes;

% Get predictions from network on single test input
score = hW.predict(testImg, 'Profile', 'On')
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------              -------------          ---------    ---------      ---------
Network                    9430692                  0.04287                    1         9430707           23.3
    conv_module            9411355                  0.04278
        conv_1             4178753                  0.01899
        maxpool_1          1394883                  0.00634
        conv_2             1975197                  0.00898
        maxpool_2           706156                  0.00321
        conv_3             813598                   0.00370
        maxpool_3          121790                   0.00055
        conv_4             148165                   0.00067
        maxpool_4          22255                    0.00010
        conv_5             41999                    0.00019
        avgpool2d          8674                     0.00004
    fc_module              19337                    0.00009
        fc                 19337                    0.00009
 * The clock frequency of the DL processor is: 220MHz
score = 1×5 single row vector
0.9956 0.0000 0.0000 0.0044 0.0000
[~, idx1] = max(score);
predTestLabel = classnames(idx1)
predTestLabel = categorical
ped
Load five random images from the sample test data set and execute the predict function of the dlhdl.Workflow object to display the labels alongside the signatures. The predictions happen at once because the inputs are concatenated along the fourth dimension.
numTestFrames = size(testDataNoCar, 4);
numView = 5;
listIndex = randperm(numTestFrames, numView);
testImgBatch = single(testDataNoCar(:, :, :, listIndex));
testLabelBatch = testLabelNoCar(listIndex);

% Get predictions from network using DL HDL Toolbox on FPGA
[scores, speed] = hW.predict(testImgBatch, 'Profile', 'On');
### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------              -------------          ---------    ---------      ---------
Network                    9446929                  0.04294                    5        47138869           23.3
    conv_module            9427488                  0.04285
        conv_1             4195175                  0.01907
        maxpool_1          1394705                  0.00634
        conv_2             1975204                  0.00898
        maxpool_2          706332                   0.00321
        conv_3             813499                   0.00370
        maxpool_3          121869                   0.00055
        conv_4             148063                   0.00067
        maxpool_4          22019                    0.00010
        conv_5             42053                    0.00019
        avgpool2d          8684                     0.00004
    fc_module              19441                    0.00009
        fc                 19441                    0.00009
 * The clock frequency of the DL processor is: 220MHz
[~, idx2] = max(scores, [], 2);
predTestLabelBatch = classnames(idx2);

% Display the micro-Doppler signatures along with the ground truth and
% predictions.
for k = 1:numView
    index = listIndex(k);
    imagesc(testDataNoCar(:, :, :, index));
    axis xy
    xlabel('Time (s)')
    ylabel('Frequency (Hz)')
    title('Ground Truth: '+string(testLabelNoCar(index))+', Prediction FPGA: '+string(predTestLabelBatch(k)))
    drawnow;
    pause(3);
end
The image shows the micro-Doppler signatures of two bicyclists (bic+bic), which is the ground truth. The ground truth is the classification of the image against which the network prediction is compared. The network prediction retrieved from the FPGA correctly predicts that the image has two bicyclists.
Classify Images on an FPGA Using a Quantized DAG Network
This example uses:
- Deep Learning HDL Toolbox
- Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices
- Deep Learning Toolbox Model Quantization Library
- Deep Learning Toolbox
- Computer Vision Toolbox
- Deep Learning Toolbox Model for ResNet-18 Network
In this example, you use Deep Learning HDL Toolbox™ to deploy a quantized deep convolutional neural network and classify an image. The example uses the pretrained ResNet-18 convolutional neural network to demonstrate transfer learning, quantization, and deployment for the quantized network. Use MATLAB® to retrieve the prediction results.
ResNet-18 has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee mug, pencil, and many animals). The network has learned rich feature representations for a wide range of images. The network takes an image as input and outputs a label for the object in the image together with the probabilities for each of the object categories.
Required Products
For this example, you need:
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Deep Learning Toolbox Model for ResNet-18 Network
Deep Learning HDL Toolbox™ Support Package for Xilinx® FPGA and SoC Devices
Image Processing Toolbox™
Deep Learning Toolbox Model Quantization Library
MATLAB® Coder™ Interface for Deep Learning Libraries
Transfer Learning Using ResNet-18
To perform classification on a new set of images, you fine-tune a pretrained ResNet-18 convolutional neural network by transfer learning. In transfer learning, you can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network with randomly initialized weights from scratch. You can quickly transfer learned features to a new task using a smaller number of training images.
Load Pretrained ResNet-18 Network
To load the pretrained network ResNet-18, enter:
net = resnet18;
To view the layers of the pretrained network, enter:
analyzeNetwork(net);
The first layer, the image input layer, requires input images of size 224-by-224-by-3, where 3 is the number of color channels.
inputSize = net.Layers(1).InputSize;
Define Training and Validation Data Sets
This example uses the MathWorks
MerchData data set. This is a small data set containing 75 images of MathWorks merchandise, belonging to five different classes (cap, cube, playing cards, screwdriver, and torch).
curDir = pwd;
unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');
validationData_FPGA = imdsValidation.subset(1:1);
Replace Final Layers
The fully connected layer and classification layer of the pretrained network net are configured for 1000 classes. These two layers, fc1000 and ClassificationLayer_predictions in ResNet-18, contain information on how to combine the features that the network extracts into class probabilities and predicted labels. These two layers must be fine-tuned for the new classification problem. Extract all the layers, except the last two layers, from the pretrained network.
lgraph = layerGraph(net)
lgraph =
  LayerGraph with properties:

         Layers: [71×1 nnet.cnn.layer.Layer]
    Connections: [78×2 table]
     InputNames: {'data'}
    OutputNames: {'ClassificationLayer_predictions'}
numClasses = numel(categories(imdsTrain.Labels))
numClasses = 5
newLearnableLayer = fullyConnectedLayer(numClasses, ...
    'Name','new_fc', ...
    'WeightLearnRateFactor',10, ...
    'BiasLearnRateFactor',10);
lgraph = replaceLayer(lgraph,'fc1000',newLearnableLayer);

newClassLayer = classificationLayer('Name','new_classoutput');
lgraph = replaceLayer(lgraph,'ClassificationLayer_predictions',newClassLayer);
Train Network
The network requires input images of size 224-by-224-by-3, but the images in the image datastores have different sizes. Use an augmented image datastore to automatically resize the training images. Specify additional augmentation operations to perform on the training images, such as randomly flipping the training images along the vertical axis and randomly translating them up to 30 pixels horizontally and vertically. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.
pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...
    'RandXTranslation',pixelRange, ...
    'RandYTranslation',pixelRange);
To automatically resize the validation images without performing further data augmentation, use an augmented image datastore without specifying any additional preprocessing operations.
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
    'DataAugmentation',imageAugmenter);
augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);
Specify the training options. For transfer learning, keep the features from the early layers of the pretrained network (the transferred layer weights). To slow down learning in the transferred layers, set the initial learning rate to a small value. Specify the mini-batch size and validation data. The software validates the network every ValidationFrequency
iterations during training.
options = trainingOptions('sgdm', ...
    'MiniBatchSize',10, ...
    'MaxEpochs',6, ...
    'InitialLearnRate',1e-4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',3, ...
    'Verbose',false, ...
    'Plots','training-progress');
Train the network that consists of the transferred and new layers. By default, trainNetwork uses a GPU if one is available (requires Parallel Computing Toolbox™ and a supported GPU device; for more information, see GPU Computing Requirements (Parallel Computing Toolbox)). Otherwise, the network uses a CPU (requires MATLAB® Coder™ Interface for Deep Learning Libraries). You can also specify the execution environment by using the 'ExecutionEnvironment' name-value argument of trainingOptions.
netTransfer = trainNetwork(augimdsTrain,lgraph,options);
Quantize the Network
Create a dlquantizer
object and specify the network to quantize.
dlquantObj = dlquantizer(netTransfer,'ExecutionEnvironment','FPGA');
Calibrate the Quantized Network Object
Use the calibrate
function to exercise the network with sample inputs and collect the range information. The calibrate
function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The calibrate
function returns a table. Each row of the table contains range information for a learnable parameter of the quantized network.
dlquantObj.calibrate(augimdsTrain)
ans=95×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
__________________________ __________________ ________________________ ________ ________
{'conv1_Weights' } {'conv1' } "Weights" -0.56885 0.65166
{'conv1_Bias' } {'conv1' } "Bias" -0.66869 0.67504
{'res2a_branch2a_Weights'} {'res2a_branch2a'} "Weights" -0.46037 0.34327
{'res2a_branch2a_Bias' } {'res2a_branch2a'} "Bias" -0.82446 1.3337
{'res2a_branch2b_Weights'} {'res2a_branch2b'} "Weights" -0.8002 0.60524
{'res2a_branch2b_Bias' } {'res2a_branch2b'} "Bias" -1.3954 1.7536
{'res2b_branch2a_Weights'} {'res2b_branch2a'} "Weights" -0.33991 0.3503
{'res2b_branch2a_Bias' } {'res2b_branch2a'} "Bias" -1.1367 1.5317
{'res2b_branch2b_Weights'} {'res2b_branch2b'} "Weights" -1.2616 0.93491
{'res2b_branch2b_Bias' } {'res2b_branch2b'} "Bias" -0.86662 1.2352
{'res3a_branch2a_Weights'} {'res3a_branch2a'} "Weights" -0.19675 0.23903
{'res3a_branch2a_Bias' } {'res3a_branch2a'} "Bias" -0.5063 0.69182
{'res3a_branch2b_Weights'} {'res3a_branch2b'} "Weights" -0.5385 0.74078
{'res3a_branch2b_Bias' } {'res3a_branch2b'} "Bias" -0.66884 1.2152
{'res3a_branch1_Weights' } {'res3a_branch1' } "Weights" -0.66715 0.98369
{'res3a_branch1_Bias' } {'res3a_branch1' } "Bias" -0.97269 0.83073
⋮
Create Target Object
Use the dlhdl.Target class to create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx™ Vivado™ Design Suite 2020.2. To set the Xilinx Vivado toolpath, enter:
% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2020.2\bin\vivado.bat');
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
Create Workflow Object
Use the dlhdl.Workflow class to create an object. When you create the object, specify the network and the bitstream name. Specify the quantized network object, dlquantObj, as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board. The bitstream uses an int8 data type.
hW = dlhdl.Workflow('Network', dlquantObj, 'Bitstream', 'zcu102_int8','Target',hTarget);
Compile the netTransfer DAG Network
To compile the netTransfer DAG network, run the compile method of the dlhdl.Workflow object. You can optionally specify the maximum number of input frames.
dn = hW.compile('InputFrameNumberLimit',15)
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream zcu102_int8. ### The network includes the following layers: 1 'data' Image Input 224×224×3 images with 'zscore' normalization (SW Layer) 2 'conv1' Convolution 64 7×7×3 convolutions with stride [2 2] and padding [3 3 3 3] (HW Layer) 3 'bn_conv1' Batch Normalization Batch normalization with 64 channels (HW Layer) 4 'conv1_relu' ReLU ReLU (HW Layer) 5 'pool1' Max Pooling 3×3 max pooling with stride [2 2] and padding [1 1 1 1] (HW Layer) 6 'res2a_branch2a' Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 7 'bn2a_branch2a' Batch Normalization Batch normalization with 64 channels (HW Layer) 8 'res2a_branch2a_relu' ReLU ReLU (HW Layer) 9 'res2a_branch2b' Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 10 'bn2a_branch2b' Batch Normalization Batch normalization with 64 channels (HW Layer) 11 'res2a' Addition Element-wise addition of 2 inputs (HW Layer) 12 'res2a_relu' ReLU ReLU (HW Layer) 13 'res2b_branch2a' Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 14 'bn2b_branch2a' Batch Normalization Batch normalization with 64 channels (HW Layer) 15 'res2b_branch2a_relu' ReLU ReLU (HW Layer) 16 'res2b_branch2b' Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 17 'bn2b_branch2b' Batch Normalization Batch normalization with 64 channels (HW Layer) 18 'res2b' Addition Element-wise addition of 2 inputs (HW Layer) 19 'res2b_relu' ReLU ReLU (HW Layer) 20 'res3a_branch2a' Convolution 128 3×3×64 convolutions with stride [2 2] and padding [1 1 1 1] (HW Layer) 21 'bn3a_branch2a' Batch Normalization Batch normalization with 128 channels (HW Layer) 22 'res3a_branch2a_relu' ReLU ReLU (HW Layer) 23 'res3a_branch2b' Convolution 128 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 24 'bn3a_branch2b' Batch Normalization Batch normalization with 128 channels (HW Layer) 25 'res3a' Addition Element-wise addition of 2 inputs (HW Layer) 26 'res3a_relu' ReLU ReLU (HW Layer) 27 'res3a_branch1' Convolution 128 1×1×64 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 28 'bn3a_branch1' Batch Normalization Batch normalization with 128 channels (HW Layer) 29 'res3b_branch2a' Convolution 128 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 30 'bn3b_branch2a' Batch Normalization Batch normalization with 128 channels (HW Layer) 31 'res3b_branch2a_relu' ReLU ReLU (HW Layer) 32 'res3b_branch2b' Convolution 128 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 33 'bn3b_branch2b' Batch Normalization Batch normalization with 128 channels (HW Layer) 34 'res3b' Addition Element-wise addition of 2 inputs (HW Layer) 35 'res3b_relu' ReLU ReLU (HW Layer) 36 'res4a_branch2a' Convolution 256 3×3×128 convolutions with stride [2 2] and padding [1 1 1 1] (HW Layer) 37 'bn4a_branch2a' Batch Normalization Batch normalization with 256 channels (HW Layer) 38 'res4a_branch2a_relu' ReLU ReLU (HW Layer) 39 'res4a_branch2b' Convolution 256 3×3×256 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 40 'bn4a_branch2b' Batch Normalization Batch normalization with 256 channels (HW Layer) 41 'res4a' Addition Element-wise addition of 2 inputs (HW Layer) 42 'res4a_relu' ReLU ReLU (HW Layer) 43 'res4a_branch1' Convolution 256 1×1×128 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 44 'bn4a_branch1' Batch Normalization Batch 
normalization with 256 channels (HW Layer) 45 'res4b_branch2a' Convolution 256 3×3×256 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 46 'bn4b_branch2a' Batch Normalization Batch normalization with 256 channels (HW Layer) 47 'res4b_branch2a_relu' ReLU ReLU (HW Layer) 48 'res4b_branch2b' Convolution 256 3×3×256 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 49 'bn4b_branch2b' Batch Normalization Batch normalization with 256 channels (HW Layer) 50 'res4b' Addition Element-wise addition of 2 inputs (HW Layer) 51 'res4b_relu' ReLU ReLU (HW Layer) 52 'res5a_branch2a' Convolution 512 3×3×256 convolutions with stride [2 2] and padding [1 1 1 1] (HW Layer) 53 'bn5a_branch2a' Batch Normalization Batch normalization with 512 channels (HW Layer) 54 'res5a_branch2a_relu' ReLU ReLU (HW Layer) 55 'res5a_branch2b' Convolution 512 3×3×512 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 56 'bn5a_branch2b' Batch Normalization Batch normalization with 512 channels (HW Layer) 57 'res5a' Addition Element-wise addition of 2 inputs (HW Layer) 58 'res5a_relu' ReLU ReLU (HW Layer) 59 'res5a_branch1' Convolution 512 1×1×256 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 60 'bn5a_branch1' Batch Normalization Batch normalization with 512 channels (HW Layer) 61 'res5b_branch2a' Convolution 512 3×3×512 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 62 'bn5b_branch2a' Batch Normalization Batch normalization with 512 channels (HW Layer) 63 'res5b_branch2a_relu' ReLU ReLU (HW Layer) 64 'res5b_branch2b' Convolution 512 3×3×512 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 65 'bn5b_branch2b' Batch Normalization Batch normalization with 512 channels (HW Layer) 66 'res5b' Addition Element-wise addition of 2 inputs (HW Layer) 67 'res5b_relu' ReLU ReLU (HW Layer) 68 'pool5' 2-D Global Average Pooling 2-D global average pooling (HW Layer) 69 'new_fc' Fully Connected 5 fully connected layer (HW Layer) 70 'prob' Softmax softmax (HW Layer) 71 'new_classoutput' Classification Output crossentropyex with 'MathWorks Cap' and 4 other classes (SW Layer) ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'new_classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. ### Compiling layer group: conv1>>pool1 ... ### Compiling layer group: conv1>>pool1 ... complete. ### Compiling layer group: res2a_branch2a>>res2a_branch2b ... ### Compiling layer group: res2a_branch2a>>res2a_branch2b ... complete. ### Compiling layer group: res2b_branch2a>>res2b_branch2b ... ### Compiling layer group: res2b_branch2a>>res2b_branch2b ... complete. ### Compiling layer group: res3a_branch1 ... ### Compiling layer group: res3a_branch1 ... complete. ### Compiling layer group: res3a_branch2a>>res3a_branch2b ... ### Compiling layer group: res3a_branch2a>>res3a_branch2b ... complete. ### Compiling layer group: res3b_branch2a>>res3b_branch2b ... ### Compiling layer group: res3b_branch2a>>res3b_branch2b ... complete. ### Compiling layer group: res4a_branch1 ... ### Compiling layer group: res4a_branch1 ... complete. ### Compiling layer group: res4a_branch2a>>res4a_branch2b ... ### Compiling layer group: res4a_branch2a>>res4a_branch2b ... complete. 
### Compiling layer group: res4b_branch2a>>res4b_branch2b ... ### Compiling layer group: res4b_branch2a>>res4b_branch2b ... complete. ### Compiling layer group: res5a_branch1 ... ### Compiling layer group: res5a_branch1 ... complete. ### Compiling layer group: res5a_branch2a>>res5a_branch2b ... ### Compiling layer group: res5a_branch2a>>res5a_branch2b ... complete. ### Compiling layer group: res5b_branch2a>>res5b_branch2b ... ### Compiling layer group: res5b_branch2a>>res5b_branch2b ... complete. ### Compiling layer group: pool5 ... ### Compiling layer group: pool5 ... complete. ### Compiling layer group: new_fc ... ### Compiling layer group: new_fc ... complete. ### Allocating external memory buffers: offset_name offset_address allocated_space _______________________ ______________ ________________ "InputDataOffset" "0x00000000" "8.0 MB" "OutputResultOffset" "0x00800000" "4.0 MB" "SchedulerDataOffset" "0x00c00000" "4.0 MB" "SystemBufferOffset" "0x01000000" "28.0 MB" "InstructionDataOffset" "0x02c00000" "4.0 MB" "ConvWeightDataOffset" "0x03000000" "16.0 MB" "FCWeightDataOffset" "0x04000000" "4.0 MB" "EndOffset" "0x04400000" "Total: 68.0 MB" ### Network compilation complete.
dn = struct with fields:
weights: [1×1 struct]
instructions: [1×1 struct]
registers: [1×1 struct]
syncInstructions: [1×1 struct]
constantData: {}
Program Bitstream onto FPGA and Download Network Weights
To deploy the network on the Xilinx ZCU102 hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device, displays progress messages, and displays the time it takes to deploy the network.
hW.deploy
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 20-Jan-2022 08:45:22
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 20-Jan-2022 08:45:22
Load Image for Prediction
Load the example image.
imgFile = fullfile(pwd,'MerchData','MathWorks Cube','Mathworks cube_0.jpg');
inputImg = imresize(imread(imgFile),[224 224]);
imshow(inputImg)
Run Prediction for One Image
Execute the predict method on the dlhdl.Workflow
object and then show the label in the MATLAB command window.
[prediction, speed] = hW.predict(single(inputImg),'Profile','on');
### Finished writing input activations.
### Running single input activation.

              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------              -------------          ---------    ---------      ---------
Network                    7389695                  0.02956                    1         7392277           33.8
    conv1                  1115359                  0.00446
    pool1                  237742                   0.00095
    res2a_branch2a         269669                   0.00108
    res2a_branch2b         270019                   0.00108
    res2a                  103095                   0.00041
    res2b_branch2a         269716                   0.00108
    res2b_branch2b         269895                   0.00108
    res2b                  102385                   0.00041
    res3a_branch1          156246                   0.00062
    res3a_branch2a         227373                   0.00091
    res3a_branch2b         245201                   0.00098
    res3a                  52543                    0.00021
    res3b_branch2a         244793                   0.00098
    res3b_branch2b         244952                   0.00098
    res3b                  51176                    0.00020
    res4a_branch1          135788                   0.00054
    res4a_branch2a         135745                   0.00054
    res4a_branch2b         237464                   0.00095
    res4a                  25612                    0.00010
    res4b_branch2a         237244                   0.00095
    res4b_branch2b         237242                   0.00095
    res4b                  25952                    0.00010
    res5a_branch1          311610                   0.00125
    res5a_branch2a         311719                   0.00125
    res5a_branch2b         596194                   0.00238
    res5a                  13191                    0.00005
    res5b_branch2a         595890                   0.00238
    res5b_branch2b         596795                   0.00239
    res5b                  14141                    0.00006
    pool5                  36932                    0.00015
    new_fc                 17825                    0.00007
 * The clock frequency of the DL processor is: 250MHz
[val, idx] = max(prediction);
dlquantObj.NetworkObject.Layers(end).ClassNames{idx}
ans = 'MathWorks Cube'
Performance Comparison
Compare the performance of the quantized network to the single data type network.
options_FPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget);
prediction_FPGA = dlquantObj.validate(imdsValidation,options_FPGA)
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream zcu102_int8. ### The network includes the following layers: 1 'data' Image Input 224×224×3 images with 'zscore' normalization (SW Layer) 2 'conv1' Convolution 64 7×7×3 convolutions with stride [2 2] and padding [3 3 3 3] (HW Layer) 3 'bn_conv1' Batch Normalization Batch normalization with 64 channels (HW Layer) 4 'conv1_relu' ReLU ReLU (HW Layer) 5 'pool1' Max Pooling 3×3 max pooling with stride [2 2] and padding [1 1 1 1] (HW Layer) 6 'res2a_branch2a' Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 7 'bn2a_branch2a' Batch Normalization Batch normalization with 64 channels (HW Layer) 8 'res2a_branch2a_relu' ReLU ReLU (HW Layer) 9 'res2a_branch2b' Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 10 'bn2a_branch2b' Batch Normalization Batch normalization with 64 channels (HW Layer) 11 'res2a' Addition Element-wise addition of 2 inputs (HW Layer) 12 'res2a_relu' ReLU ReLU (HW Layer) 13 'res2b_branch2a' Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 14 'bn2b_branch2a' Batch Normalization Batch normalization with 64 channels (HW Layer) 15 'res2b_branch2a_relu' ReLU ReLU (HW Layer) 16 'res2b_branch2b' Convolution 64 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 17 'bn2b_branch2b' Batch Normalization Batch normalization with 64 channels (HW Layer) 18 'res2b' Addition Element-wise addition of 2 inputs (HW Layer) 19 'res2b_relu' ReLU ReLU (HW Layer) 20 'res3a_branch2a' Convolution 128 3×3×64 convolutions with stride [2 2] and padding [1 1 1 1] (HW Layer) 21 'bn3a_branch2a' Batch Normalization Batch normalization with 128 channels (HW Layer) 22 'res3a_branch2a_relu' ReLU ReLU (HW Layer) 23 'res3a_branch2b' Convolution 128 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 24 'bn3a_branch2b' Batch Normalization Batch normalization with 128 channels (HW Layer) 25 'res3a' Addition Element-wise addition of 2 inputs (HW Layer) 26 'res3a_relu' ReLU ReLU (HW Layer) 27 'res3a_branch1' Convolution 128 1×1×64 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 28 'bn3a_branch1' Batch Normalization Batch normalization with 128 channels (HW Layer) 29 'res3b_branch2a' Convolution 128 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 30 'bn3b_branch2a' Batch Normalization Batch normalization with 128 channels (HW Layer) 31 'res3b_branch2a_relu' ReLU ReLU (HW Layer) 32 'res3b_branch2b' Convolution 128 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 33 'bn3b_branch2b' Batch Normalization Batch normalization with 128 channels (HW Layer) 34 'res3b' Addition Element-wise addition of 2 inputs (HW Layer) 35 'res3b_relu' ReLU ReLU (HW Layer) 36 'res4a_branch2a' Convolution 256 3×3×128 convolutions with stride [2 2] and padding [1 1 1 1] (HW Layer) 37 'bn4a_branch2a' Batch Normalization Batch normalization with 256 channels (HW Layer) 38 'res4a_branch2a_relu' ReLU ReLU (HW Layer) 39 'res4a_branch2b' Convolution 256 3×3×256 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 40 'bn4a_branch2b' Batch Normalization Batch normalization with 256 channels (HW Layer) 41 'res4a' Addition Element-wise addition of 2 inputs (HW Layer) 42 'res4a_relu' ReLU ReLU (HW Layer) 43 'res4a_branch1' Convolution 256 1×1×128 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 44 'bn4a_branch1' Batch Normalization Batch 
normalization with 256 channels (HW Layer) 45 'res4b_branch2a' Convolution 256 3×3×256 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 46 'bn4b_branch2a' Batch Normalization Batch normalization with 256 channels (HW Layer) 47 'res4b_branch2a_relu' ReLU ReLU (HW Layer) 48 'res4b_branch2b' Convolution 256 3×3×256 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 49 'bn4b_branch2b' Batch Normalization Batch normalization with 256 channels (HW Layer) 50 'res4b' Addition Element-wise addition of 2 inputs (HW Layer) 51 'res4b_relu' ReLU ReLU (HW Layer) 52 'res5a_branch2a' Convolution 512 3×3×256 convolutions with stride [2 2] and padding [1 1 1 1] (HW Layer) 53 'bn5a_branch2a' Batch Normalization Batch normalization with 512 channels (HW Layer) 54 'res5a_branch2a_relu' ReLU ReLU (HW Layer) 55 'res5a_branch2b' Convolution 512 3×3×512 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 56 'bn5a_branch2b' Batch Normalization Batch normalization with 512 channels (HW Layer) 57 'res5a' Addition Element-wise addition of 2 inputs (HW Layer) 58 'res5a_relu' ReLU ReLU (HW Layer) 59 'res5a_branch1' Convolution 512 1×1×256 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 60 'bn5a_branch1' Batch Normalization Batch normalization with 512 channels (HW Layer) 61 'res5b_branch2a' Convolution 512 3×3×512 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 62 'bn5b_branch2a' Batch Normalization Batch normalization with 512 channels (HW Layer) 63 'res5b_branch2a_relu' ReLU ReLU (HW Layer) 64 'res5b_branch2b' Convolution 512 3×3×512 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 65 'bn5b_branch2b' Batch Normalization Batch normalization with 512 channels (HW Layer) 66 'res5b' Addition Element-wise addition of 2 inputs (HW Layer) 67 'res5b_relu' ReLU ReLU (HW Layer) 68 'pool5' 2-D Global Average Pooling 2-D global average pooling (HW Layer) 69 'new_fc' Fully Connected 5 fully connected layer (HW Layer) 70 'prob' Softmax softmax (HW Layer) 71 'new_classoutput' Classification Output crossentropyex with 'MathWorks Cap' and 4 other classes (SW Layer) ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'new_classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. ### Compiling layer group: conv1>>pool1 ... ### Compiling layer group: conv1>>pool1 ... complete. ### Compiling layer group: res2a_branch2a>>res2a_branch2b ... ### Compiling layer group: res2a_branch2a>>res2a_branch2b ... complete. ### Compiling layer group: res2b_branch2a>>res2b_branch2b ... ### Compiling layer group: res2b_branch2a>>res2b_branch2b ... complete. ### Compiling layer group: res3a_branch1 ... ### Compiling layer group: res3a_branch1 ... complete. ### Compiling layer group: res3a_branch2a>>res3a_branch2b ... ### Compiling layer group: res3a_branch2a>>res3a_branch2b ... complete. ### Compiling layer group: res3b_branch2a>>res3b_branch2b ... ### Compiling layer group: res3b_branch2a>>res3b_branch2b ... complete. ### Compiling layer group: res4a_branch1 ... ### Compiling layer group: res4a_branch1 ... complete. ### Compiling layer group: res4a_branch2a>>res4a_branch2b ... ### Compiling layer group: res4a_branch2a>>res4a_branch2b ... complete. 
### Compiling layer group: res4b_branch2a>>res4b_branch2b ... ### Compiling layer group: res4b_branch2a>>res4b_branch2b ... complete. ### Compiling layer group: res5a_branch1 ... ### Compiling layer group: res5a_branch1 ... complete. ### Compiling layer group: res5a_branch2a>>res5a_branch2b ... ### Compiling layer group: res5a_branch2a>>res5a_branch2b ... complete. ### Compiling layer group: res5b_branch2a>>res5b_branch2b ... ### Compiling layer group: res5b_branch2a>>res5b_branch2b ... complete. ### Compiling layer group: pool5 ... ### Compiling layer group: pool5 ... complete. ### Compiling layer group: new_fc ... ### Compiling layer group: new_fc ... complete. ### Allocating external memory buffers: offset_name offset_address allocated_space _______________________ ______________ ________________ "InputDataOffset" "0x00000000" "12.0 MB" "OutputResultOffset" "0x00c00000" "4.0 MB" "SchedulerDataOffset" "0x01000000" "4.0 MB" "SystemBufferOffset" "0x01400000" "28.0 MB" "InstructionDataOffset" "0x03000000" "4.0 MB" "ConvWeightDataOffset" "0x03400000" "16.0 MB" "FCWeightDataOffset" "0x04400000" "4.0 MB" "EndOffset" "0x04800000" "Total: 72.0 MB" ### Network compilation complete. ### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA. ### Loading weights to Conv Processor. ### Conv Weights loaded. Current time is 20-Jan-2022 08:46:40 ### Loading weights to FC Processor. ### FC Weights loaded. Current time is 20-Jan-2022 08:46:40 ### Finished writing input activations. ### Running in multi-frame mode with 20 inputs. Deep Learning Processor Bitstream Build Info Resource Utilized Total Percentage ------------------ ---------- ------------ ------------ LUTs (CLB/ALM)* 248190 274080 90.55 DSPs 384 2520 15.24 Block RAM 581 912 63.71 * LUT count represents Configurable Logic Block(CLB) utilization in Xilinx devices and Adaptive Logic Module (ALM) utilization in Intel devices. ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### Notice: The layer 'data' of type 'ImageInputLayer' is split into an image input layer 'data', an addition layer 'data_norm_add', and a multiplication layer 'data_norm' for hardware normalization. ### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'new_classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. 
Deep Learning Processor Estimator Performance Results LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s ------------- ------------- --------- --------- --------- Network 23871185 0.10851 1 23871185 9.2 ____data_norm_add 210750 0.00096 ____data_norm 210750 0.00096 ____conv1 2165372 0.00984 ____pool1 646226 0.00294 ____res2a_branch2a 966221 0.00439 ____res2a_branch2b 966221 0.00439 ____res2a 210750 0.00096 ____res2b_branch2a 966221 0.00439 ____res2b_branch2b 966221 0.00439 ____res2b 210750 0.00096 ____res3a_branch1 540749 0.00246 ____res3a_branch2a 763860 0.00347 ____res3a_branch2b 919117 0.00418 ____res3a 105404 0.00048 ____res3b_branch2a 919117 0.00418 ____res3b_branch2b 919117 0.00418 ____res3b 105404 0.00048 ____res4a_branch1 509261 0.00231 ____res4a_branch2a 509261 0.00231 ____res4a_branch2b 905421 0.00412 ____res4a 52724 0.00024 ____res4b_branch2a 905421 0.00412 ____res4b_branch2b 905421 0.00412 ____res4b 52724 0.00024 ____res5a_branch1 1046605 0.00476 ____res5a_branch2a 1046605 0.00476 ____res5a_branch2b 2005197 0.00911 ____res5a 26368 0.00012 ____res5b_branch2a 2005197 0.00911 ____res5b_branch2b 2005197 0.00911 ____res5b 26368 0.00012 ____pool5 54594 0.00025 ____new_fc 22571 0.00010 * The clock frequency of the DL processor is: 220MHz Deep Learning Processor Bitstream Build Info Resource Utilized Total Percentage ------------------ ---------- ------------ ------------ LUTs (CLB/ALM)* 168836 274080 61.60 DSPs 800 2520 31.75 Block RAM 453 912 49.67 * LUT count represents Configurable Logic Block(CLB) utilization in Xilinx devices and Adaptive Logic Module (ALM) utilization in Intel devices. ### Finished writing input activations. ### Running single input activation.
prediction_FPGA = struct with fields:
NumSamples: 20
MetricResults: [1×1 struct]
Statistics: [2×7 table]
prediction_FPGA.Statistics.FramesPerSecond
ans = 2×1
9.2161
33.8157
The first number is the frames per second performance for the single data type network and the second number is the frames per second performance for the quantized network.
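As a quick check, this sketch computes the quantized network's speedup from the statistics returned in the previous step (prediction_FPGA is the structure returned by the validate call above):

% fps(1) is the single data type result, fps(2) the quantized result
fps = prediction_FPGA.Statistics.FramesPerSecond;
speedup = fps(2)/fps(1)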
Detect Objects Using YOLO v3 Network Deployed to FPGA
This example uses:
- Deep Learning HDL Toolbox
- Deep Learning Toolbox
- Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices
- Computer Vision Toolbox
This example shows how to deploy a trained you only look once (YOLO) v3 object detector to a target FPGA board. You then use MATLAB to retrieve the object classification from the FPGA board.
Compared to YOLO v2 networks, YOLO v3 networks have additional detection heads that help detect smaller objects.
Create YOLO v3 Detector Object
In this example, you use a pretrained YOLO v3 object detector. To construct and train a custom YOLO v3 detector, see Object Detection Using YOLO v3 Deep Learning (Computer Vision Toolbox).
Use the downloadPretrainedYOLOv3Detector
function to generate a dlnetwork
object. To get the code for this function, see the downloadPretrainedYOLOv3Detector Function section.
preTrainedDetector = downloadPretrainedYOLOv3Detector;
Downloaded pretrained detector
The generated network uses training data to estimate the anchor boxes, which help the detector learn to predict the boxes. For more information about anchor boxes, see Anchor Boxes for Object Detection (Computer Vision Toolbox). The downloadPretrainedYOLOv3Detector function creates the YOLO v3 network used in this example.
Load the Pretrained Network
Extract the network from the pretrained YOLO v3 detector object.
yolov3Detector = preTrainedDetector;
net = yolov3Detector.Network;
Extract the attributes of the network as variables.
anchorBoxes = yolov3Detector.AnchorBoxes;
outputNames = yolov3Detector.Network.OutputNames;
inputSize = yolov3Detector.InputSize;
classNames = yolov3Detector.ClassNames;
Use the analyzeNetwork function to obtain information about the network layers. The function returns a graphical representation of the network that contains detailed parameter information for every layer in the network.
analyzeNetwork(net);
Define FPGA Board Interface
Define the target FPGA board programming interface by using the dlhdl.Target object. Create a programming interface with a custom name for your target device and an Ethernet interface to connect the target device to the host computer.
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
Prepare Network for Deployment
Prepare the network for deployment by creating a dlhdl.Workflow
object. Specify the network and bitstream name. Ensure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx® Zynq® UltraScale+™ MPSoC ZCU102 board and the bitstream uses the single data type.
hW = dlhdl.Workflow('Network',net,'Bitstream','zcu102_single','Target',hTarget);
Compile Network
Run the compile
method of the dlhdl.Workflow
object to compile the network and generate the instructions, weights, and biases for deployment.
dn = compile(hW);
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream zcu102_single. ### The network includes the following layers: 1 'data' Image Input 227×227×3 images (SW Layer) 2 'conv1' 2-D Convolution 64 3×3×3 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 3 'relu_conv1' ReLU ReLU (HW Layer) 4 'pool1' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 5 'fire2-squeeze1x1' 2-D Convolution 16 1×1×64 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 6 'fire2-relu_squeeze1x1' ReLU ReLU (HW Layer) 7 'fire2-expand1x1' 2-D Convolution 64 1×1×16 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 8 'fire2-relu_expand1x1' ReLU ReLU (HW Layer) 9 'fire2-expand3x3' 2-D Convolution 64 3×3×16 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 10 'fire2-relu_expand3x3' ReLU ReLU (HW Layer) 11 'fire2-concat' Depth concatenation Depth concatenation of 2 inputs (HW Layer) 12 'fire3-squeeze1x1' 2-D Convolution 16 1×1×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 13 'fire3-relu_squeeze1x1' ReLU ReLU (HW Layer) 14 'fire3-expand1x1' 2-D Convolution 64 1×1×16 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 15 'fire3-relu_expand1x1' ReLU ReLU (HW Layer) 16 'fire3-expand3x3' 2-D Convolution 64 3×3×16 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 17 'fire3-relu_expand3x3' ReLU ReLU (HW Layer) 18 'fire3-concat' Depth concatenation Depth concatenation of 2 inputs (HW Layer) 19 'pool3' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 1 0 1] (HW Layer) 20 'fire4-squeeze1x1' 2-D Convolution 32 1×1×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 21 'fire4-relu_squeeze1x1' ReLU ReLU (HW Layer) 22 'fire4-expand1x1' 2-D Convolution 128 1×1×32 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 23 'fire4-relu_expand1x1' ReLU ReLU (HW Layer) 24 'fire4-expand3x3' 2-D Convolution 128 3×3×32 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 25 'fire4-relu_expand3x3' ReLU ReLU (HW Layer) 26 'fire4-concat' Depth concatenation Depth concatenation of 2 inputs (HW Layer) 27 'fire5-squeeze1x1' 2-D Convolution 32 1×1×256 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 28 'fire5-relu_squeeze1x1' ReLU ReLU (HW Layer) 29 'fire5-expand1x1' 2-D Convolution 128 1×1×32 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 30 'fire5-relu_expand1x1' ReLU ReLU (HW Layer) 31 'fire5-expand3x3' 2-D Convolution 128 3×3×32 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 32 'fire5-relu_expand3x3' ReLU ReLU (HW Layer) 33 'fire5-concat' Depth concatenation Depth concatenation of 2 inputs (HW Layer) 34 'pool5' 2-D Max Pooling 3×3 max pooling with stride [2 2] and padding [0 1 0 1] (HW Layer) 35 'fire6-squeeze1x1' 2-D Convolution 48 1×1×256 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 36 'fire6-relu_squeeze1x1' ReLU ReLU (HW Layer) 37 'fire6-expand1x1' 2-D Convolution 192 1×1×48 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 38 'fire6-relu_expand1x1' ReLU ReLU (HW Layer) 39 'fire6-expand3x3' 2-D Convolution 192 3×3×48 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 40 'fire6-relu_expand3x3' ReLU ReLU (HW Layer) 41 'fire6-concat' Depth concatenation Depth concatenation of 2 inputs (HW Layer) 42 'fire7-squeeze1x1' 2-D Convolution 48 1×1×384 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 43 'fire7-relu_squeeze1x1' ReLU 
ReLU (HW Layer) 44 'fire7-expand1x1' 2-D Convolution 192 1×1×48 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 45 'fire7-relu_expand1x1' ReLU ReLU (HW Layer) 46 'fire7-expand3x3' 2-D Convolution 192 3×3×48 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 47 'fire7-relu_expand3x3' ReLU ReLU (HW Layer) 48 'fire7-concat' Depth concatenation Depth concatenation of 2 inputs (HW Layer) 49 'fire8-squeeze1x1' 2-D Convolution 64 1×1×384 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 50 'fire8-relu_squeeze1x1' ReLU ReLU (HW Layer) 51 'fire8-expand1x1' 2-D Convolution 256 1×1×64 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 52 'fire8-relu_expand1x1' ReLU ReLU (HW Layer) 53 'fire8-expand3x3' 2-D Convolution 256 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 54 'fire8-relu_expand3x3' ReLU ReLU (HW Layer) 55 'fire8-concat' Depth concatenation Depth concatenation of 2 inputs (HW Layer) 56 'fire9-squeeze1x1' 2-D Convolution 64 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 57 'fire9-relu_squeeze1x1' ReLU ReLU (HW Layer) 58 'fire9-expand1x1' 2-D Convolution 256 1×1×64 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 59 'fire9-relu_expand1x1' ReLU ReLU (HW Layer) 60 'fire9-expand3x3' 2-D Convolution 256 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer) 61 'fire9-relu_expand3x3' ReLU ReLU (HW Layer) 62 'fire9-concat' Depth concatenation Depth concatenation of 2 inputs (HW Layer) 63 'customConv1' 2-D Convolution 1024 3×3×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 64 'customBatchNorm1' Batch Normalization Batch normalization with 1024 channels (HW Layer) 65 'customRelu1' ReLU ReLU (HW Layer) 66 'customOutputConv1' 2-D Convolution 18 1×1×1024 convolutions with stride [1 1] and padding 'same' (HW Layer) 67 'featureConv2' 2-D Convolution 128 1×1×512 convolutions with stride [1 1] and padding 'same' (HW Layer) 68 'featureBatchNorm2' Batch Normalization Batch normalization with 128 channels (HW Layer) 69 'featureRelu2' ReLU ReLU (HW Layer) 70 'featureResize2' Resize nnet.cnn.layer.Resize2DLayer (HW Layer) 71 'depthConcat2' Depth concatenation Depth concatenation of 2 inputs (HW Layer) 72 'customConv2' 2-D Convolution 256 3×3×384 convolutions with stride [1 1] and padding 'same' (HW Layer) 73 'customBatchNorm2' Batch Normalization Batch normalization with 256 channels (HW Layer) 74 'customRelu2' ReLU ReLU (HW Layer) 75 'customOutputConv2' 2-D Convolution 18 1×1×256 convolutions with stride [1 1] and padding 'same' (HW Layer) ### An output layer called 'Output1_customOutputConv1' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network. ### An output layer called 'Output2_customOutputConv2' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network. ### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer' ### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'Output1_customOutputConv1' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software. 
### Notice: The layer 'Output2_customOutputConv2' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software. ### Compiling layer group: conv1>>fire2-relu_squeeze1x1 ... ### Compiling layer group: conv1>>fire2-relu_squeeze1x1 ... complete. ### Compiling layer group: fire2-expand1x1>>fire2-relu_expand1x1 ... ### Compiling layer group: fire2-expand1x1>>fire2-relu_expand1x1 ... complete. ### Compiling layer group: fire2-expand3x3>>fire2-relu_expand3x3 ... ### Compiling layer group: fire2-expand3x3>>fire2-relu_expand3x3 ... complete. ### Compiling layer group: fire3-squeeze1x1>>fire3-relu_squeeze1x1 ... ### Compiling layer group: fire3-squeeze1x1>>fire3-relu_squeeze1x1 ... complete. ### Compiling layer group: fire3-expand1x1>>fire3-relu_expand1x1 ... ### Compiling layer group: fire3-expand1x1>>fire3-relu_expand1x1 ... complete. ### Compiling layer group: fire3-expand3x3>>fire3-relu_expand3x3 ... ### Compiling layer group: fire3-expand3x3>>fire3-relu_expand3x3 ... complete. ### Compiling layer group: pool3>>fire4-relu_squeeze1x1 ... ### Compiling layer group: pool3>>fire4-relu_squeeze1x1 ... complete. ### Compiling layer group: fire4-expand1x1>>fire4-relu_expand1x1 ... ### Compiling layer group: fire4-expand1x1>>fire4-relu_expand1x1 ... complete. ### Compiling layer group: fire4-expand3x3>>fire4-relu_expand3x3 ... ### Compiling layer group: fire4-expand3x3>>fire4-relu_expand3x3 ... complete. ### Compiling layer group: fire5-squeeze1x1>>fire5-relu_squeeze1x1 ... ### Compiling layer group: fire5-squeeze1x1>>fire5-relu_squeeze1x1 ... complete. ### Compiling layer group: fire5-expand1x1>>fire5-relu_expand1x1 ... ### Compiling layer group: fire5-expand1x1>>fire5-relu_expand1x1 ... complete. ### Compiling layer group: fire5-expand3x3>>fire5-relu_expand3x3 ... ### Compiling layer group: fire5-expand3x3>>fire5-relu_expand3x3 ... complete. ### Compiling layer group: pool5>>fire6-relu_squeeze1x1 ... ### Compiling layer group: pool5>>fire6-relu_squeeze1x1 ... complete. ### Compiling layer group: fire6-expand1x1>>fire6-relu_expand1x1 ... ### Compiling layer group: fire6-expand1x1>>fire6-relu_expand1x1 ... complete. ### Compiling layer group: fire6-expand3x3>>fire6-relu_expand3x3 ... ### Compiling layer group: fire6-expand3x3>>fire6-relu_expand3x3 ... complete. ### Compiling layer group: fire7-squeeze1x1>>fire7-relu_squeeze1x1 ... ### Compiling layer group: fire7-squeeze1x1>>fire7-relu_squeeze1x1 ... complete. ### Compiling layer group: fire7-expand1x1>>fire7-relu_expand1x1 ... ### Compiling layer group: fire7-expand1x1>>fire7-relu_expand1x1 ... complete. ### Compiling layer group: fire7-expand3x3>>fire7-relu_expand3x3 ... ### Compiling layer group: fire7-expand3x3>>fire7-relu_expand3x3 ... complete. ### Compiling layer group: fire8-squeeze1x1>>fire8-relu_squeeze1x1 ... ### Compiling layer group: fire8-squeeze1x1>>fire8-relu_squeeze1x1 ... complete. ### Compiling layer group: fire8-expand1x1>>fire8-relu_expand1x1 ... ### Compiling layer group: fire8-expand1x1>>fire8-relu_expand1x1 ... complete. ### Compiling layer group: fire8-expand3x3>>fire8-relu_expand3x3 ... ### Compiling layer group: fire8-expand3x3>>fire8-relu_expand3x3 ... complete. ### Compiling layer group: fire9-squeeze1x1>>fire9-relu_squeeze1x1 ... ### Compiling layer group: fire9-squeeze1x1>>fire9-relu_squeeze1x1 ... complete. ### Compiling layer group: fire9-expand1x1>>fire9-relu_expand1x1 ... ### Compiling layer group: fire9-expand1x1>>fire9-relu_expand1x1 ... complete. 
### Compiling layer group: fire9-expand3x3>>fire9-relu_expand3x3 ...
### Compiling layer group: fire9-expand3x3>>fire9-relu_expand3x3 ... complete.
### Compiling layer group: customConv1>>customOutputConv1 ...
### Compiling layer group: customConv1>>customOutputConv1 ... complete.
### Compiling layer group: featureConv2>>featureRelu2 ...
### Compiling layer group: featureConv2>>featureRelu2 ... complete.
### Compiling layer group: customConv2>>customOutputConv2 ...
### Compiling layer group: customConv2>>customOutputConv2 ... complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space  
    _______________________    ______________    __________________

    "InputDataOffset"           "0x00000000"     "24.0 MB"         
    "OutputResultOffset"        "0x01800000"     "4.0 MB"          
    "SchedulerDataOffset"       "0x01c00000"     "4.0 MB"          
    "SystemBufferOffset"        "0x02000000"     "28.0 MB"         
    "InstructionDataOffset"     "0x03c00000"     "8.0 MB"          
    "ConvWeightDataOffset"      "0x04400000"     "104.0 MB"        
    "EndOffset"                 "0x0ac00000"     "Total: 172.0 MB" 

### Network compilation complete.
Program Bitstream onto FPGA and Download Network Weights
To deploy the network on the Xilinx® Zynq® UltraScale+ MPSoC ZCU102 hardware, run the deploy method of the dlhdl.Workflow object. This method programs the FPGA board by using the output of the compile method and the programming file, downloads the network weights and biases, and displays progress messages along with the time it takes to deploy the network.
deploy(hW);
### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
# Copying Bitstream zcu102_single.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/zcu102_single.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'
### Rebooting Xilinx SoC at 192.168.1.101...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 21-Jun-2022 20:35:11
Test Network
Load the example image and convert it into a dlarray object. Then run the prediction on the FPGA by using the predict method of the dlhdl.Workflow object and display the results.
img = imread('vehicle_image.jpg');
I = single(rescale(img));
I = imresize(I, yolov3Detector.InputSize(1:2));
dlX = dlarray(I,'SSC');
Store the output of each detection head of the network in the features variable. Pass features to the post-processing function processYOLOv3Output to combine the multiple outputs and compute the final results. To get the code for this function, see the processYOLOv3Output Function section.
features = cell(size(net.OutputNames'));
[features{:}] = hW.predict(dlX);
### Finished writing input activations.
### Running single input activation.
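The predict method can also return per-layer profiling metrics for the deployed network through the Profile name-value argument. A hedged sketch of such a call; the exact arrangement of output arguments for a profiled multi-output prediction may differ:
% Sketch: request profiling results while running the prediction.
[features{:}] = predict(hW, dlX, Profile = "on");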
[bboxes, scores, labels] = processYOLOv3Output(anchorBoxes, inputSize, classNames, features, I);
resultImage = insertObjectAnnotation(I,'rectangle',bboxes,scores);
imshow(resultImage)
The FPGA returns a score prediction of 0.89605 with a bounding box drawn around the object in the image, and returns the label vehicle in the labels variable.
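To inspect the detections programmatically rather than visually, you can, for example, print each label with its score. A minimal sketch, assuming the bboxes, scores, and labels variables returned above:
% Sketch: list each detection with its label and confidence score.
for k = 1:numel(scores)
    fprintf('%s: %.5f\n', string(labels(k)), scores(k));
end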
downloadPretrainedYOLOv3Detector Function
The downloadPretrainedYOLOv3Detector function downloads the pretrained YOLO v3 detector network.
function detector = downloadPretrainedYOLOv3Detector
if ~exist('yolov3SqueezeNetVehicleExample_21aSPKG.mat', 'file')
    if ~exist('yolov3SqueezeNetVehicleExample_21aSPKG.zip', 'file')
        zipFile = matlab.internal.examples.downloadSupportFile('vision/data', 'yolov3SqueezeNetVehicleExample_21aSPKG.zip');
        copyfile(zipFile);
    end
    unzip('yolov3SqueezeNetVehicleExample_21aSPKG.zip');
end
pretrained = load("yolov3SqueezeNetVehicleExample_21aSPKG.mat");
detector = pretrained.detector;
disp('Downloaded pretrained detector');
end
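For reference, a hedged usage sketch of this helper; the property names assume the loaded detector is a yolov3ObjectDetector object, consistent with the variables used earlier in the example:
% Sketch: download the detector, then extract the network and anchor boxes
% (assumed property names) that the rest of the example relies on.
yolov3Detector = downloadPretrainedYOLOv3Detector;
net = yolov3Detector.Network;              % assumption: Network property
anchorBoxes = yolov3Detector.AnchorBoxes;  % assumption: AnchorBoxes property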
processYOLOv3Output Function
The processYOLOv3Output function is attached as a helper file in this example's directory. This function converts the feature maps from multiple detection heads to bounding boxes, scores, and labels. A code snippet of the function is shown below.
function [bboxes, scores, labels] = processYOLOv3Output(anchorBoxes, inputSize, classNames, features, img)
% This function converts the feature maps from multiple detection heads to
% bounding boxes, scores, and labels.
% processYOLOv3Output is C code generatable.

% Break down the raw output from the predict function into confidence score,
% X, Y, width, height, and class probabilities for each detection head output.
predictions = iYolov3Transform(features, anchorBoxes);

% Initialize parameters for post-processing.
inputSize2d = inputSize(1:2);
info.PreprocessedImageSize = inputSize2d(1:2);
info.ScaleX = size(img,1)/inputSize2d(1);
info.ScaleY = size(img,2)/inputSize2d(1);
params.MinSize = [1 1];
params.MaxSize = size(img(:,:,1));
params.Threshold = 0.5;
params.FractionDownsampling = 1;
params.DetectionInputWasBatchOfImages = false;
params.NetworkInputSize = inputSize;
params.DetectionPreprocessing = "none";
params.SelectStrongest = 1;
bboxes = [];
scores = [];
labels = [];

% Post-process the predictions to get bounding boxes, scores, and labels.
[bboxes, scores, labels] = iPostprocessMultipleDetection(anchorBoxes, inputSize, classNames, predictions, info, params);
end

function [bboxes, scores, labels] = iPostprocessMultipleDetection(anchorBoxes, inputSize, classNames, YPredData, info, params)
% Post-process the predictions to get bounding boxes, scores, and labels.
% YPredData is an (x,8) cell array, where x = number of detection heads.
% Information in each column is:
%   column 1      -> confidence scores
%   column 2 to 5 -> X offset, Y offset, width, height of anchor boxes
%   column 6      -> class probabilities
%   column 7-8    -> copy of width and height of anchor boxes

% Initialize parameters for post-processing.
classes = classNames;
predictions = YPredData;
extractPredictions = cell(size(predictions));

% Extract dlarray data.
for i = 1:size(extractPredictions,1)
    for j = 1:size(extractPredictions,2)
        extractPredictions{i,j} = extractdata(predictions{i,j});
    end
end

% Store the values of columns 2 to 5 of extractPredictions.
% Columns 2 to 5 represent the X-coordinate, Y-coordinate, width, and height
% of the predicted anchor boxes.
extractedCoordinates = cell(size(predictions,1),4);
for i = 1:size(predictions,1)
    for j = 2:5
        extractedCoordinates{i,j-1} = extractPredictions{i,j};
    end
end

% Convert predictions from grid cell coordinates to box coordinates.
boxCoordinates = anchorBoxGenerator(anchorBoxes, inputSize, classNames, extractedCoordinates, params.NetworkInputSize);

% Replace grid cell coordinates in extractPredictions with box coordinates.
for i = 1:size(YPredData,1)
    for j = 2:5
        extractPredictions{i,j} = single(boxCoordinates{i,j-1});
    end
end

% 1. Convert bboxes from spatial to pixel dimension.
% 2. Combine the predictions from different heads.
% 3. Filter detections based on threshold.

% Reshape the matrices corresponding to confidence scores and bounding boxes.
detections = cell(size(YPredData,1),6);
for i = 1:size(detections,1)
    for j = 1:5
        detections{i,j} = reshapePredictions(extractPredictions{i,j});
    end
end

% Reshape the matrices corresponding to class probabilities.
numClasses = repmat({numel(classes)},[size(detections,1),1]);
for i = 1:size(detections,1)
    detections{i,6} = reshapeClasses(extractPredictions{i,6},numClasses{i,1});
end

% cell2mat converts the cell of matrices into one matrix. This combines the
% predictions of all detection heads.
detections = cell2mat(detections);

% Get the most probable class and the corresponding index.
[classProbs, classIdx] = max(detections(:,6:end),[],2);
detections(:,1) = detections(:,1).*classProbs;
detections(:,6) = classIdx;

% Keep detections whose confidence score is greater than the threshold.
detections = detections(detections(:,1) >= params.Threshold,:);
[bboxes, scores, labels] = iPostProcessDetections(detections, classes, info, params);
end

function [bboxes, scores, labels] = iPostProcessDetections(detections, classes, info, params)
% Resize the anchor boxes, filter anchor boxes based on size, and apply NMS
% to eliminate overlapping anchor boxes.
if ~isempty(detections)
    % Obtain bounding boxes and class data for the preprocessed image.
    scorePred = detections(:,1);
    bboxesTmp = detections(:,2:5);
    classPred = detections(:,6);
    inputImageSize = ones(1,2);
    inputImageSize(2) = info.ScaleX.*info.PreprocessedImageSize(2);
    inputImageSize(1) = info.ScaleY.*info.PreprocessedImageSize(1);

    % Resize boxes to the actual image size.
    scale = [inputImageSize(2) inputImageSize(1) inputImageSize(2) inputImageSize(1)];
    bboxPred = bboxesTmp.*scale;

    % Convert x and y position of detections from centre to top-left.
    bboxPred = iConvertCenterToTopLeft(bboxPred);

    % Filter boxes based on MinSize and MaxSize.
    [bboxPred, scorePred, classPred] = filterBBoxes(params.MinSize, params.MaxSize, bboxPred, scorePred, classPred);

    % Apply NMS to eliminate boxes having significant overlap.
    if params.SelectStrongest
        [bboxes, scores, classNames] = selectStrongestBboxMulticlass(bboxPred, scorePred, classPred, ...
            'RatioType', 'Union', 'OverlapThreshold', 0.4);
    else
        bboxes = bboxPred;
        scores = scorePred;
        classNames = classPred;
    end

    % Limit width detections.
    detectionsWd = min((bboxes(:,1) + bboxes(:,3)),inputImageSize(1,2));
    bboxes(:,3) = detectionsWd(:,1) - bboxes(:,1);

    % Limit height detections.
    detectionsHt = min((bboxes(:,2) + bboxes(:,4)),inputImageSize(1,1));
    bboxes(:,4) = detectionsHt(:,1) - bboxes(:,2);

    bboxes(bboxes<1) = 1;

    % Convert classId to classNames.
    labels = categorical(classes,cellstr(classes));
    labels = labels(classNames);
else
    % If detections are empty, then bounding boxes, scores, and labels
    % should be empty.
    bboxes = zeros(0,4,'single');
    scores = zeros(0,1,'single');
    labels = categorical(classes);
end
end

function x = reshapePredictions(pred)
% Reshape the matrices corresponding to scores, X, Y, width, and height to
% make them compatible for combining the outputs of different detection heads.
[h,w,c,n] = size(pred);
x = reshape(pred,h*w*c,1,n);
end

function x = reshapeClasses(pred,numClasses)
% Reshape the matrices corresponding to the class probabilities to make them
% compatible for combining the outputs of different detection heads.
[h,w,c,n] = size(pred);
numAnchors = c/numClasses;
x = reshape(pred,h*w,numClasses,numAnchors,n);
x = permute(x,[1,3,2,4]);
[h,w,c,n] = size(x);
x = reshape(x,h*w,c,n);
end

function bboxes = iConvertCenterToTopLeft(bboxes)
% Convert x and y position of detections from centre to top-left.
bboxes(:,1) = bboxes(:,1) - bboxes(:,3)/2 + 0.5;
bboxes(:,2) = bboxes(:,2) - bboxes(:,4)/2 + 0.5;
bboxes = floor(bboxes);
bboxes(bboxes<1) = 1;
end

function tiledAnchors = anchorBoxGenerator(anchorBoxes, inputSize, classNames, YPredCell, inputImageSize)
% Convert grid cell coordinates to box coordinates.

% Generate tiled anchor offsets.
tiledAnchors = cell(size(YPredCell));
for i = 1:size(YPredCell,1)
    anchors = anchorBoxes{i,:};
    [h,w,~,n] = size(YPredCell{i,1});
    [tiledAnchors{i,2},tiledAnchors{i,1}] = ndgrid(0:h-1,0:w-1,1:size(anchors,1),1:n);
    [~,~,tiledAnchors{i,3}] = ndgrid(0:h-1,0:w-1,anchors(:,2),1:n);
    [~,~,tiledAnchors{i,4}] = ndgrid(0:h-1,0:w-1,anchors(:,1),1:n);
end

for i = 1:size(YPredCell,1)
    [h,w,~,~] = size(YPredCell{i,1});
    tiledAnchors{i,1} = double((tiledAnchors{i,1} + YPredCell{i,1})./w);
    tiledAnchors{i,2} = double((tiledAnchors{i,2} + YPredCell{i,2})./h);
    tiledAnchors{i,3} = double((tiledAnchors{i,3}.*YPredCell{i,3})./inputImageSize(2));
    tiledAnchors{i,4} = double((tiledAnchors{i,4}.*YPredCell{i,4})./inputImageSize(1));
end
end

function predictions = iYolov3Transform(YPredictions, anchorBoxes)
% Break down the raw output from the predict function into confidence score,
% X, Y, width, height, and class probabilities for each detection head output.
predictions = cell(size(YPredictions,1),size(YPredictions,2) + 2);
for idx = 1:size(YPredictions,1)
    % Get the required information on feature size.
    numChannelsPred = size(YPredictions{idx},3);   % number of channels in a feature map
    numAnchors = size(anchorBoxes{idx},1);         % number of anchor boxes per grid
    numPredElemsPerAnchors = numChannelsPred/numAnchors;
    channelsPredIdx = 1:numChannelsPred;
    predictionIdx = ones([1,numAnchors.*5]);

    % X positions.
    startIdx = 1;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,2} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Y positions.
    startIdx = 2;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,3} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Width.
    startIdx = 3;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,4} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Height.
    startIdx = 4;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,5} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Confidence scores.
    startIdx = 5;
    endIdx = numChannelsPred;
    stride = numPredElemsPerAnchors;
    predictions{idx,1} = YPredictions{idx}(:,:,startIdx:stride:endIdx,:);
    predictionIdx = [predictionIdx startIdx:stride:endIdx];

    % Class probabilities.
    classIdx = setdiff(channelsPredIdx,predictionIdx);
    predictions{idx,6} = YPredictions{idx}(:,:,classIdx,:);
end

% Copy width and height into columns 7-8.
for i = 1:size(predictions,1)
    predictions{i,7} = predictions{i,4};
    predictions{i,8} = predictions{i,5};
end

% Apply activations to the predicted cell array.
% Apply sigmoid activation to columns 1-3 (confidence score, X, Y).
for i = 1:size(predictions,1)
    for j = 1:3
        predictions{i,j} = sigmoid(predictions{i,j});
    end
end
% Apply exponentiation to columns 4-5 (width, height).
for i = 1:size(predictions,1)
    for j = 4:5
        predictions{i,j} = exp(predictions{i,j});
    end
end
% Apply sigmoid activation to column 6 (class probabilities).
for i = 1:size(predictions,1)
    predictions{i,6} = sigmoid(predictions{i,6});
end
end

function [bboxPred, scorePred, classPred] = filterBBoxes(minSize, maxSize, bboxPred, scorePred, classPred)
% Filter boxes based on MinSize and MaxSize.
[bboxPred, scorePred, classPred] = filterSmallBBoxes(minSize, bboxPred, scorePred, classPred);
[bboxPred, scorePred, classPred] = filterLargeBBoxes(maxSize, bboxPred, scorePred, classPred);
end

function varargout = filterSmallBBoxes(minSize, varargin)
% Filter boxes based on MinSize.
bboxes = varargin{1};
tooSmall = any((bboxes(:,[4 3]) < minSize),2);
for ii = 1:numel(varargin)
    varargout{ii} = varargin{ii}(~tooSmall,:);
end
end

function varargout = filterLargeBBoxes(maxSize, varargin)
% Filter boxes based on MaxSize.
bboxes = varargin{1};
tooBig = any((bboxes(:,[4 3]) > maxSize),2);
for ii = 1:numel(varargin)
    varargout{ii} = varargin{ii}(~tooBig,:);
end
end

function m = cell2mat(c)
% Convert the cell of matrices into one matrix by concatenating the output
% corresponding to each feature map.
elements = numel(c);

% If the number of elements is 0, return an empty array.
if elements == 0
    m = [];
    return
end

% If the number of elements is 1, return the element as a matrix.
if elements == 1
    if isnumeric(c{1}) || ischar(c{1}) || islogical(c{1}) || isstruct(c{1})
        m = c{1};
        return
    end
end

% Report unsupported cell content.
ciscell = iscell(c{1});
cisobj = isobject(c{1});
if cisobj || ciscell
    disp('CELL2MAT does not support cell arrays containing cell arrays or objects.');
end

% If the input is a struct, extract the field names of the structure into a cell.
if isstruct(c{1})
    cfields = cell(elements,1);
    for n = 1:elements
        cfields{n} = fieldnames(c{n});
    end
    if ~isequal(cfields{:})
        disp('The field names of each cell array element must be consistent and in consistent order.');
    end
end

% If the number of dimensions is 2:
if ndims(c) == 2
    rows = size(c,1);
    cols = size(c,2);
    if (rows < cols)
        % If rows are fewer than columns, first concatenate each column into
        % one row, then concatenate all the rows.
        m = cell(rows,1);
        for n = 1:rows
            m{n} = cat(2,c{n,:});
        end
        m = cat(1,m{:});
    else
        % If columns are fewer than rows, first concatenate each corresponding
        % row into columns, then combine all columns into one.
        m = cell(1,cols);
        for n = 1:cols
            m{n} = cat(1,c{:,n});
        end
        m = cat(2,m{:});
    end
    return
end
end
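For intuition about the decoding performed by iYolov3Transform and anchorBoxGenerator, the following minimal sketch (not part of the example; all numbers are made up) decodes one toy prediction for a single grid cell. The sigmoid squashes the raw x and y offsets into the cell, and the exponential scales the anchor width and height, mirroring the normalization in anchorBoxGenerator.
% Toy sketch of YOLO v3 box decoding for one grid cell; values are illustrative.
sig = @(z) 1./(1 + exp(-z));  % logistic function, as applied in iYolov3Transform
gridW = 13; gridH = 13;       % detection head grid size (assumed)
cx = 6; cy = 4;               % zero-based grid cell indices
tx = 0.2; ty = -0.1;          % raw x/y offset predictions
tw = 0.3; th = 0.5;           % raw width/height predictions
anchorW = 90; anchorH = 60;   % anchor box size in pixels (assumed)
netW = 227; netH = 227;       % network input size in pixels (assumed)
bx = (cx + sig(tx))/gridW;    % normalized box centre x
by = (cy + sig(ty))/gridH;    % normalized box centre y
bw = anchorW*exp(tw)/netW;    % normalized box width
bh = anchorH*exp(th)/netH;    % normalized box height
fprintf('centre (%.3f, %.3f), size (%.3f, %.3f)\n', bx, by, bw, bh);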
References
[1] Redmon, Joseph, and Ali Farhadi. “YOLOv3: An Incremental Improvement.” Preprint, submitted April 8, 2018. https://arxiv.org/abs/1804.02767.
Version History
Introduced in R2020b