Classify Images on FPGA by Using Quantized GoogLeNet Network

This example uses:

Deep Learning HDL Toolbox
Deep Learning Toolbox
Deep Learning HDL Toolbox Support Package for Intel FPGA and SoC Devices
Deep Learning Toolbox Model Quantization Library
Deep Learning Toolbox Model for GoogLeNet Network
Image Processing Toolbox
MATLAB Coder Interface for Deep Learning

This example shows how to use the Deep Learning HDL Toolbox™ to deploy a quantized GoogLeNet network to classify an image. The example uses the pretrained GoogLeNet network to demonstrate transfer learning, quantization, and deployment of the quantized network. Quantization reduces the memory requirement of a deep neural network by converting the weights, biases, and activations of the network layers to 8-bit scaled integer data types. Use MATLAB® to retrieve the prediction results.
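To make the idea of 8-bit scaled integers concrete, the following sketch quantizes a few made-up weight values with a power-of-two scale factor and then reconstructs them. The sample values, the variable names, and the way the scale exponent is chosen are illustrative assumptions; the scheme that dlquantizer applies internally may differ.

```matlab
% Hypothetical single-precision weights (illustrative values only).
w = single([-0.8 -0.25 0 0.1 0.63]);

% Choose the largest power-of-two scale for which max(abs(w)) still
% fits in the int8 range [-128, 127].
scaleExp = floor(log2(127 / max(abs(w))));   % number of fractional bits

% Quantize: scale, round to the nearest integer, and store as int8.
q = int8(round(w * 2^scaleExp));

% Dequantize to see the rounding error introduced by 8-bit storage.
wHat = single(q) * 2^(-scaleExp);
maxErr = max(abs(w - wHat));

fprintf('scale exponent: %d, max abs error: %g\n', scaleExp, maxErr);
```

Each int8 value occupies one byte instead of the four bytes of a single value, and the reconstruction error stays within half of one quantization step, 2^-(scaleExp+1).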

Deploy the quantized GoogLeNet network by creating a dlhdl.Workflow object. Use the dlhdl.Workflow object to:

Generate a list of instructions, weights and biases by using the compile method.
Generate a programming file for the FPGA by using the deploy method.
Retrieve the network prediction results and performance by using the predict method.
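The three steps map onto three method calls, sketched here as a preview. The sketch assumes that hW is an existing dlhdl.Workflow object (this example creates it in a later section), that the FPGA board is connected, and that inputImg is a hypothetical 224-by-224-by-3 image; it is not a complete script on its own.

```matlab
% Assumption: hW is a dlhdl.Workflow object and inputImg is a
% 224-by-224-by-3 image (hypothetical placeholder name).

% 1. Generate the instructions, weights, and biases for the network.
dn = hW.compile;

% 2. Deploy to the connected FPGA board.
hW.deploy;

% 3. Run inference on the board and collect profiling information.
[prediction, speed] = hW.predict(single(inputImg), 'Profile', 'on');

% Index of the class with the highest score.
[~, idx] = max(prediction);
```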

GoogLeNet has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee mug, pencil, and many animals). The network has learned rich feature representations for a wide range of images. The network takes an image as input, and then outputs a label for the object in the image together with the probabilities for each of the object categories.

Prerequisites

Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Deep Learning Toolbox Model for GoogLeNet Network
Deep Learning HDL Toolbox™ Support Package for Intel FPGA and SoC Devices
Image Processing Toolbox™
Intel Arria10 SoC development kit
Deep Learning Toolbox™ Model Quantization Library support package
MATLAB Coder Interface for Deep Learning Libraries

Transfer Learning Using GoogLeNet

To perform classification on a new set of images, you fine-tune a pretrained GoogLeNet convolutional neural network by transfer learning. In transfer learning, you can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network with randomly initialized weights from scratch. You can quickly transfer learned features to a new task using a smaller number of training images.

Load Pretrained DAG Network

Load the pretrained DAG network, GoogLeNet.

net = googlenet;

Use the analyzeNetwork function to obtain information about the network layers.

analyzeNetwork(net);

The first layer, the image input layer, requires input images of size 224-by-224-by-3, where 3 is the number of color channels.

inputSize = net.Layers(1).InputSize

inputSize = 1×3

   224   224     3

Define Training and Validation Data Sets

This example uses the MathWorks MerchData data set. This is a small data set containing 75 images of MathWorks merchandise, belonging to five different classes (cap, cube, playing cards, screwdriver, and torch).

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

Divide the data into training and validation data sets. Use 70% of the images for training and 30% for validation. The splitEachLabel function splits the image datastore into two new datastores.

[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');

This data set now contains 55 training images and 20 validation images. Display some sample images.

numTrainImages = numel(imdsTrain.Labels);
idx = randperm(numTrainImages,16);
figure
for i = 1:16
    subplot(4,4,i)
    I = readimage(imdsTrain,idx(i));
    imshow(I)
end
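The 55/20 split reported above can be checked with a little arithmetic. This sketch assumes that the 75 images are spread evenly over the five classes and that the per-class training count is rounded to the nearest integer; both assumptions are consistent with the reported numbers, and the variable names are illustrative. To see the actual per-class counts, you can also call countEachLabel on imdsTrain and imdsValidation.

```matlab
totalImages = 75;      % images in MerchData, as stated above
classCount = 5;        % cap, cube, playing cards, screwdriver, torch
trainFraction = 0.7;   % proportion passed to splitEachLabel

% Assumption: balanced classes, so each class holds 75/5 = 15 images.
imagesPerClass = totalImages / classCount;

% Assumption: the per-class training count is rounded to the nearest
% integer, so 10.5 becomes 11.
trainPerClass = round(trainFraction * imagesPerClass);

numTrain = trainPerClass * classCount;
numValidation = totalImages - numTrain;
fprintf('%d training images, %d validation images\n', numTrain, numValidation);
```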

Replace Final Layers

The fully connected layer and classification layer of the pretrained network net are configured for 1000 classes. These two layers, loss3-classifier and output in GoogLeNet, contain information on how to combine the features that the network extracts into class probabilities, a loss value, and predicted labels. To retrain a pretrained network to classify new images, replace these two layers with new layers adapted to the new data set.

Extract the layer graph from the trained network.

lgraph = layerGraph(net)

lgraph = 
  LayerGraph with properties:

         Layers: [144×1 nnet.cnn.layer.Layer]
    Connections: [170×2 table]
     InputNames: {'data'}
    OutputNames: {'output'}

Replace the fully connected layer with a new fully connected layer whose number of outputs equals the number of classes. To make learning faster in the new layers than in the transferred layers, increase the WeightLearnRateFactor and BiasLearnRateFactor values of the fully connected layer.

numClasses = numel(categories(imdsTrain.Labels))

numClasses = 5

Remove the 'loss3-classifier', 'prob', and 'output' layers from lgraph.

layers = net.SortedLayers;
for i = 0:2
    lgraph = removeLayers(lgraph,layers(end-i).Name);
end

Create three new layers and add them to lgraph. Ensure that the transferred and new layers are properly connected in lgraph.

newLayers = [
    fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20,'Name','newFC')
    softmaxLayer('Name','newProb')
    classificationLayer('Name','newClassOutput',"Classes","auto")];

lgraph = addLayers(lgraph,newLayers);
lgraph = connectLayers(lgraph,layers(end-3).Name,'newFC');

Train Network

The network requires input images of size 224-by-224-by-3, but the images in the image datastores have different sizes. Use an augmented image datastore to automatically resize the training images. Specify additional augmentation operations to perform on the training images: randomly flip the training images along the vertical axis, and randomly translate them up to 30 pixels horizontally and vertically. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.

pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...
    'RandXTranslation',pixelRange, ...
    'RandYTranslation',pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
    'DataAugmentation',imageAugmenter);

To automatically resize the validation images without performing further data augmentation, use an augmented image datastore without specifying any additional preprocessing operations.

augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);

Specify the training options. For transfer learning, keep the features from the early layers of the pretrained network (the transferred layer weights). To slow down learning in the transferred layers, set the initial learning rate to a small value. In the previous step, the learning rate factors were increased for the fully connected layer to speed up learning in the new final layers. This combination of learning rate settings results in fast learning only in the new layers and slower learning in the other layers. When performing transfer learning, you do not need to train for as many epochs. An epoch is a full training cycle on the entire training data set. Specify the mini-batch size to be 11. The software validates the network every ValidationFrequency iterations during training.

options = trainingOptions('sgdm', ...
    'MiniBatchSize',11, ...
    'MaxEpochs',5, ...
    'InitialLearnRate',2e-4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',3, ...
    'Verbose',false, ...
    'Plots','training-progress');
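With these settings, the length of the training run follows directly from the data set size. The sketch below works through the arithmetic; the variable names are illustrative, and the count covers only the validation passes triggered by the ValidationFrequency setting itself.

```matlab
% Values taken from this example.
nTrain = 55;               % training images after the 70/30 split
miniBatchSize = 11;        % 'MiniBatchSize'
maxEpochs = 5;             % 'MaxEpochs'
validationFrequency = 3;   % 'ValidationFrequency'

% Each epoch passes over the training set once, one mini-batch per iteration.
iterationsPerEpoch = floor(nTrain / miniBatchSize);
totalIterations = iterationsPerEpoch * maxEpochs;

% Iterations at which the frequency setting triggers a validation pass.
validationIterations = validationFrequency:validationFrequency:totalIterations;

fprintf('%d iterations per epoch, %d iterations in total\n', ...
    iterationsPerEpoch, totalIterations);
```

A mini-batch size of 11 divides the 55 training images evenly, so no images are left over at the end of an epoch.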

Train the network that consists of the transferred and new layers. By default, trainNetwork uses a GPU if one is available (requires Parallel Computing Toolbox™ and a supported GPU device). Otherwise, trainNetwork uses a CPU (requires MATLAB Coder Interface for Deep Learning Libraries). You can also specify the execution environment by using the 'ExecutionEnvironment' name-value argument of trainingOptions.

netTransfer = trainNetwork(augimdsTrain,lgraph,options);
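Before quantizing, it can be useful to record the accuracy of the retrained floating-point network as a baseline for later comparison. The following sketch uses the classify function from Deep Learning Toolbox on the validation datastore created earlier; YPred, scores, and accuracy are illustrative variable names, and the resulting value depends on the random split and training run.

```matlab
% Classify the validation images with the retrained network.
[YPred, scores] = classify(netTransfer, augimdsValidation);

% Fraction of validation images whose predicted label matches the true label.
accuracy = mean(YPred == imdsValidation.Labels)
```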

Create dlquantizer Object

Create a quantized network by using the dlquantizer object. Set the target execution environment to FPGA.

dlQuantObj = dlquantizer(netTransfer,'ExecutionEnvironment','FPGA');

Calibrate Quantized Network

Use the calibrate function to exercise the network by using sample inputs to collect the range information. The calibrate function exercises the network and collects the dynamic ranges for the learnable parameters of the convolution and fully connected layers of the network.

For best quantization results, the calibration data must be representative of the actual inputs that the network predicts on.

dlQuantObj.calibrate(augimdsTrain);

Set Up Intel Quartus Prime Standard

Set the synthesis tool path to point to an installed Intel® Quartus® Prime Standard Edition 20.1 executable file. You must have already installed Altera® Quartus II.

% hdlsetuptoolpath('ToolName','Altera Quartus II','ToolPath','C:\intel\20.1\quartus\bin\quartus.exe');

Create Target Object

Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet.

hTarget = dlhdl.Target('Intel','Interface','JTAG');
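If the board is connected over Ethernet rather than JTAG, the target object can be created along the following lines. This is a sketch only: the variable name hTargetEth and the IP address are placeholders, and the 'IPAddress' name-value argument is assumed to be available in your release. Substitute the address assigned to your board.

```matlab
% Alternative: connect to the board over Ethernet instead of JTAG.
% The IP address below is a placeholder, not a value from this example.
hTargetEth = dlhdl.Target('Intel', 'Interface', 'Ethernet', ...
    'IPAddress', '192.168.1.101');
```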

Generate Bitstream to Run Network

The GoogLeNet network contains multiple cross channel normalization layers. To support these layers on hardware, turn on the 'LRNBlockGeneration' property of the conv module in the bitstream that you use for FPGA inference. The shipping arria10soc_int8 bitstream does not have the 'LRNBlockGeneration' property turned on. You can generate a new bitstream by using the following lines of code, and then use the generated bitstream with a workflow object for inference.

Update the processor configuration so that the 'LRNBlockGeneration' property is turned on and the 'SegmentationBlockGeneration' property is turned off. Turning off 'SegmentationBlockGeneration' helps the deep learning IP fit on the FPGA and avoids overutilization of resources.

% hPC = dlhdl.ProcessorConfig('Bitstream', 'arria10soc_int8');
% hPC.setModuleProperty('conv', 'LRNBlockGeneration', 'on');
% hPC.setModuleProperty('conv', 'SegmentationBlockGeneration', 'off');
% dlhdl.buildProcessor(hPC)

To learn how to use the generated bitstream file, see Generate Custom Bitstream.

Create Workflow Object

Create an object of the dlhdl.Workflow class. Specify dlQuantObj as the network. Make sure to use the generated bitstream, which enables processing of cross channel normalization layers on the FPGA. In this example, the target FPGA board is the Intel Arria10 SoC board and the generated bitstream uses the int8 data type.

hW = dlhdl.Workflow('network', dlQuantObj, 'Bitstream', 'dlprocessor.sof','Target',hTarget);

Compile Workflow Object

To compile the GoogLeNet network, run the compile function of the dlhdl.Workflow object.

dn = hW.compile

### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream arria10soc_int8.
### The network includes the following layers:
     1   'data'                           Image Input                   224×224×3 images with 'zerocenter' normalization                       (SW Layer)
     2   'conv1-7x7_s2'                   Convolution                   64 7×7×3 convolutions with stride [2  2] and padding [3  3  3  3]      (HW Layer)
     3   'conv1-relu_7x7'                 ReLU                          ReLU                                                                   (HW Layer)
     4   'pool1-3x3_s2'                   Max Pooling                   3×3 max pooling with stride [2  2] and padding [0  1  0  1]            (HW Layer)
     5   'pool1-norm1'                    Cross Channel Normalization   cross channel normalization with 5 channels per element                (HW Layer)
     6   'conv2-3x3_reduce'               Convolution                   64 1×1×64 convolutions with stride [1  1] and padding [0  0  0  0]     (HW Layer)
     7   'conv2-relu_3x3_reduce'          ReLU                          ReLU                                                                   (HW Layer)
     8   'conv2-3x3'                      Convolution                   192 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
     9   'conv2-relu_3x3'                 ReLU                          ReLU                                                                   (HW Layer)
    10   'conv2-norm2'                    Cross Channel Normalization   cross channel normalization with 5 channels per element                (HW Layer)
    11   'pool2-3x3_s2'                   Max Pooling                   3×3 max pooling with stride [2  2] and padding [0  1  0  1]            (HW Layer)
    12   'inception_3a-1x1'               Convolution                   64 1×1×192 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    13   'inception_3a-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
    14   'inception_3a-3x3_reduce'        Convolution                   96 1×1×192 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    15   'inception_3a-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    16   'inception_3a-3x3'               Convolution                   128 3×3×96 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
    17   'inception_3a-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
    18   'inception_3a-5x5_reduce'        Convolution                   16 1×1×192 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    19   'inception_3a-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    20   'inception_3a-5x5'               Convolution                   32 5×5×16 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
    21   'inception_3a-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
    22   'inception_3a-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
    23   'inception_3a-pool_proj'         Convolution                   32 1×1×192 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    24   'inception_3a-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
    25   'inception_3a-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
    26   'inception_3b-1x1'               Convolution                   128 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    27   'inception_3b-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
    28   'inception_3b-3x3_reduce'        Convolution                   128 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    29   'inception_3b-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    30   'inception_3b-3x3'               Convolution                   192 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
    31   'inception_3b-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
    32   'inception_3b-5x5_reduce'        Convolution                   32 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    33   'inception_3b-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    34   'inception_3b-5x5'               Convolution                   96 5×5×32 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
    35   'inception_3b-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
    36   'inception_3b-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
    37   'inception_3b-pool_proj'         Convolution                   64 1×1×256 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    38   'inception_3b-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
    39   'inception_3b-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
    40   'pool3-3x3_s2'                   Max Pooling                   3×3 max pooling with stride [2  2] and padding [0  1  0  1]            (HW Layer)
    41   'inception_4a-1x1'               Convolution                   192 1×1×480 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    42   'inception_4a-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
    43   'inception_4a-3x3_reduce'        Convolution                   96 1×1×480 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    44   'inception_4a-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    45   'inception_4a-3x3'               Convolution                   208 3×3×96 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
    46   'inception_4a-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
    47   'inception_4a-5x5_reduce'        Convolution                   16 1×1×480 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    48   'inception_4a-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    49   'inception_4a-5x5'               Convolution                   48 5×5×16 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
    50   'inception_4a-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
    51   'inception_4a-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
    52   'inception_4a-pool_proj'         Convolution                   64 1×1×480 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    53   'inception_4a-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
    54   'inception_4a-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
    55   'inception_4b-1x1'               Convolution                   160 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    56   'inception_4b-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
    57   'inception_4b-3x3_reduce'        Convolution                   112 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    58   'inception_4b-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    59   'inception_4b-3x3'               Convolution                   224 3×3×112 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
    60   'inception_4b-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
    61   'inception_4b-5x5_reduce'        Convolution                   24 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    62   'inception_4b-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    63   'inception_4b-5x5'               Convolution                   64 5×5×24 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
    64   'inception_4b-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
    65   'inception_4b-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
    66   'inception_4b-pool_proj'         Convolution                   64 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    67   'inception_4b-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
    68   'inception_4b-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
    69   'inception_4c-1x1'               Convolution                   128 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    70   'inception_4c-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
    71   'inception_4c-3x3_reduce'        Convolution                   128 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    72   'inception_4c-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    73   'inception_4c-3x3'               Convolution                   256 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
    74   'inception_4c-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
    75   'inception_4c-5x5_reduce'        Convolution                   24 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    76   'inception_4c-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    77   'inception_4c-5x5'               Convolution                   64 5×5×24 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
    78   'inception_4c-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
    79   'inception_4c-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
    80   'inception_4c-pool_proj'         Convolution                   64 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    81   'inception_4c-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
    82   'inception_4c-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
    83   'inception_4d-1x1'               Convolution                   112 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    84   'inception_4d-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
    85   'inception_4d-3x3_reduce'        Convolution                   144 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    86   'inception_4d-relu_3x3_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    87   'inception_4d-3x3'               Convolution                   288 3×3×144 convolutions with stride [1  1] and padding [1  1  1  1]   (HW Layer)
    88   'inception_4d-relu_3x3'          ReLU                          ReLU                                                                   (HW Layer)
    89   'inception_4d-5x5_reduce'        Convolution                   32 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    90   'inception_4d-relu_5x5_reduce'   ReLU                          ReLU                                                                   (HW Layer)
    91   'inception_4d-5x5'               Convolution                   64 5×5×32 convolutions with stride [1  1] and padding [2  2  2  2]     (HW Layer)
    92   'inception_4d-relu_5x5'          ReLU                          ReLU                                                                   (HW Layer)
    93   'inception_4d-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]            (HW Layer)
    94   'inception_4d-pool_proj'         Convolution                   64 1×1×512 convolutions with stride [1  1] and padding [0  0  0  0]    (HW Layer)
    95   'inception_4d-relu_pool_proj'    ReLU                          ReLU                                                                   (HW Layer)
    96   'inception_4d-output'            Depth concatenation           Depth concatenation of 4 inputs                                        (HW Layer)
    97   'inception_4e-1x1'               Convolution                   256 1×1×528 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    98   'inception_4e-relu_1x1'          ReLU                          ReLU                                                                   (HW Layer)
    99   'inception_4e-3x3_reduce'        Convolution                   160 1×1×528 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    100   'inception_4e-relu_3x3_reduce'   ReLU                          ReLU                                                                  (HW Layer)
    101   'inception_4e-3x3'               Convolution                   320 3×3×160 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    102   'inception_4e-relu_3x3'          ReLU                          ReLU                                                                  (HW Layer)
    103   'inception_4e-5x5_reduce'        Convolution                   32 1×1×528 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    104   'inception_4e-relu_5x5_reduce'   ReLU                          ReLU                                                                  (HW Layer)
    105   'inception_4e-5x5'               Convolution                   128 5×5×32 convolutions with stride [1  1] and padding [2  2  2  2]   (HW Layer)
    106   'inception_4e-relu_5x5'          ReLU                          ReLU                                                                  (HW Layer)
    107   'inception_4e-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]           (HW Layer)
    108   'inception_4e-pool_proj'         Convolution                   128 1×1×528 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    109   'inception_4e-relu_pool_proj'    ReLU                          ReLU                                                                  (HW Layer)
    110   'inception_4e-output'            Depth concatenation           Depth concatenation of 4 inputs                                       (HW Layer)
    111   'pool4-3x3_s2'                   Max Pooling                   3×3 max pooling with stride [2  2] and padding [0  1  0  1]           (HW Layer)
    112   'inception_5a-1x1'               Convolution                   256 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    113   'inception_5a-relu_1x1'          ReLU                          ReLU                                                                  (HW Layer)
    114   'inception_5a-3x3_reduce'        Convolution                   160 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    115   'inception_5a-relu_3x3_reduce'   ReLU                          ReLU                                                                  (HW Layer)
    116   'inception_5a-3x3'               Convolution                   320 3×3×160 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    117   'inception_5a-relu_3x3'          ReLU                          ReLU                                                                  (HW Layer)
    118   'inception_5a-5x5_reduce'        Convolution                   32 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    119   'inception_5a-relu_5x5_reduce'   ReLU                          ReLU                                                                  (HW Layer)
    120   'inception_5a-5x5'               Convolution                   128 5×5×32 convolutions with stride [1  1] and padding [2  2  2  2]   (HW Layer)
    121   'inception_5a-relu_5x5'          ReLU                          ReLU                                                                  (HW Layer)
    122   'inception_5a-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]           (HW Layer)
    123   'inception_5a-pool_proj'         Convolution                   128 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    124   'inception_5a-relu_pool_proj'    ReLU                          ReLU                                                                  (HW Layer)
    125   'inception_5a-output'            Depth concatenation           Depth concatenation of 4 inputs                                       (HW Layer)
    126   'inception_5b-1x1'               Convolution                   384 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    127   'inception_5b-relu_1x1'          ReLU                          ReLU                                                                  (HW Layer)
    128   'inception_5b-3x3_reduce'        Convolution                   192 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    129   'inception_5b-relu_3x3_reduce'   ReLU                          ReLU                                                                  (HW Layer)
    130   'inception_5b-3x3'               Convolution                   384 3×3×192 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    131   'inception_5b-relu_3x3'          ReLU                          ReLU                                                                  (HW Layer)
    132   'inception_5b-5x5_reduce'        Convolution                   48 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]   (HW Layer)
    133   'inception_5b-relu_5x5_reduce'   ReLU                          ReLU                                                                  (HW Layer)
    134   'inception_5b-5x5'               Convolution                   128 5×5×48 convolutions with stride [1  1] and padding [2  2  2  2]   (HW Layer)
    135   'inception_5b-relu_5x5'          ReLU                          ReLU                                                                  (HW Layer)
    136   'inception_5b-pool'              Max Pooling                   3×3 max pooling with stride [1  1] and padding [1  1  1  1]           (HW Layer)
    137   'inception_5b-pool_proj'         Convolution                   128 1×1×832 convolutions with stride [1  1] and padding [0  0  0  0]  (HW Layer)
    138   'inception_5b-relu_pool_proj'    ReLU                          ReLU                                                                  (HW Layer)
    139   'inception_5b-output'            Depth concatenation           Depth concatenation of 4 inputs                                       (HW Layer)
    140   'pool5-7x7_s1'                   2-D Global Average Pooling    2-D global average pooling                                            (HW Layer)
    141   'pool5-drop_7x7_s1'              Dropout                       40% dropout                                                           (HW Layer)
    142   'newFC'                          Fully Connected               5 fully connected layer                                               (HW Layer)
    143   'newProb'                        Softmax                       softmax                                                               (HW Layer)
    144   'newClassOutput'                 Classification Output         crossentropyex with 'MathWorks Cap' and 4 other classes               (SW Layer)
                                                                                                                                             
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'newClassOutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv1-7x7_s2>>pool2-3x3_s2 ...
### Compiling layer group: conv1-7x7_s2>>pool2-3x3_s2 ... complete.
### Compiling layer group: inception_3a-1x1>>inception_3a-relu_1x1 ...
### Compiling layer group: inception_3a-1x1>>inception_3a-relu_1x1 ... complete.
### Compiling layer group: inception_3a-3x3_reduce>>inception_3a-relu_3x3 ...
### Compiling layer group: inception_3a-3x3_reduce>>inception_3a-relu_3x3 ... complete.
### Compiling layer group: inception_3a-5x5_reduce>>inception_3a-relu_5x5 ...
### Compiling layer group: inception_3a-5x5_reduce>>inception_3a-relu_5x5 ... complete.
### Compiling layer group: inception_3a-pool>>inception_3a-relu_pool_proj ...
### Compiling layer group: inception_3a-pool>>inception_3a-relu_pool_proj ... complete.
### Compiling layer group: inception_3b-1x1>>inception_3b-relu_1x1 ...
### Compiling layer group: inception_3b-1x1>>inception_3b-relu_1x1 ... complete.
### Compiling layer group: inception_3b-3x3_reduce>>inception_3b-relu_3x3 ...
### Compiling layer group: inception_3b-3x3_reduce>>inception_3b-relu_3x3 ... complete.
### Compiling layer group: inception_3b-5x5_reduce>>inception_3b-relu_5x5 ...
### Compiling layer group: inception_3b-5x5_reduce>>inception_3b-relu_5x5 ... complete.
### Compiling layer group: inception_3b-pool>>inception_3b-relu_pool_proj ...
### Compiling layer group: inception_3b-pool>>inception_3b-relu_pool_proj ... complete.
### Compiling layer group: pool3-3x3_s2 ...
### Compiling layer group: pool3-3x3_s2 ... complete.
### Compiling layer group: inception_4a-1x1>>inception_4a-relu_1x1 ...
### Compiling layer group: inception_4a-1x1>>inception_4a-relu_1x1 ... complete.
### Compiling layer group: inception_4a-3x3_reduce>>inception_4a-relu_3x3 ...
### Compiling layer group: inception_4a-3x3_reduce>>inception_4a-relu_3x3 ... complete.
### Compiling layer group: inception_4a-5x5_reduce>>inception_4a-relu_5x5 ...
### Compiling layer group: inception_4a-5x5_reduce>>inception_4a-relu_5x5 ... complete.
### Compiling layer group: inception_4a-pool>>inception_4a-relu_pool_proj ...
### Compiling layer group: inception_4a-pool>>inception_4a-relu_pool_proj ... complete.
### Compiling layer group: inception_4b-1x1>>inception_4b-relu_1x1 ...
### Compiling layer group: inception_4b-1x1>>inception_4b-relu_1x1 ... complete.
### Compiling layer group: inception_4b-3x3_reduce>>inception_4b-relu_3x3 ...
### Compiling layer group: inception_4b-3x3_reduce>>inception_4b-relu_3x3 ... complete.
### Compiling layer group: inception_4b-5x5_reduce>>inception_4b-relu_5x5 ...
### Compiling layer group: inception_4b-5x5_reduce>>inception_4b-relu_5x5 ... complete.
### Compiling layer group: inception_4b-pool>>inception_4b-relu_pool_proj ...
### Compiling layer group: inception_4b-pool>>inception_4b-relu_pool_proj ... complete.
### Compiling layer group: inception_4c-1x1>>inception_4c-relu_1x1 ...
### Compiling layer group: inception_4c-1x1>>inception_4c-relu_1x1 ... complete.
### Compiling layer group: inception_4c-3x3_reduce>>inception_4c-relu_3x3 ...
### Compiling layer group: inception_4c-3x3_reduce>>inception_4c-relu_3x3 ... complete.
### Compiling layer group: inception_4c-5x5_reduce>>inception_4c-relu_5x5 ...
### Compiling layer group: inception_4c-5x5_reduce>>inception_4c-relu_5x5 ... complete.
### Compiling layer group: inception_4c-pool>>inception_4c-relu_pool_proj ...
### Compiling layer group: inception_4c-pool>>inception_4c-relu_pool_proj ... complete.
### Compiling layer group: inception_4d-1x1>>inception_4d-relu_1x1 ...
### Compiling layer group: inception_4d-1x1>>inception_4d-relu_1x1 ... complete.
### Compiling layer group: inception_4d-3x3_reduce>>inception_4d-relu_3x3 ...
### Compiling layer group: inception_4d-3x3_reduce>>inception_4d-relu_3x3 ... complete.
### Compiling layer group: inception_4d-5x5_reduce>>inception_4d-relu_5x5 ...
### Compiling layer group: inception_4d-5x5_reduce>>inception_4d-relu_5x5 ... complete.
### Compiling layer group: inception_4d-pool>>inception_4d-relu_pool_proj ...
### Compiling layer group: inception_4d-pool>>inception_4d-relu_pool_proj ... complete.
### Compiling layer group: inception_4e-1x1>>inception_4e-relu_1x1 ...
### Compiling layer group: inception_4e-1x1>>inception_4e-relu_1x1 ... complete.
### Compiling layer group: inception_4e-3x3_reduce>>inception_4e-relu_3x3 ...
### Compiling layer group: inception_4e-3x3_reduce>>inception_4e-relu_3x3 ... complete.
### Compiling layer group: inception_4e-5x5_reduce>>inception_4e-relu_5x5 ...
### Compiling layer group: inception_4e-5x5_reduce>>inception_4e-relu_5x5 ... complete.
### Compiling layer group: inception_4e-pool>>inception_4e-relu_pool_proj ...
### Compiling layer group: inception_4e-pool>>inception_4e-relu_pool_proj ... complete.
### Compiling layer group: pool4-3x3_s2 ...
### Compiling layer group: pool4-3x3_s2 ... complete.
### Compiling layer group: inception_5a-1x1>>inception_5a-relu_1x1 ...
### Compiling layer group: inception_5a-1x1>>inception_5a-relu_1x1 ... complete.
### Compiling layer group: inception_5a-3x3_reduce>>inception_5a-relu_3x3 ...
### Compiling layer group: inception_5a-3x3_reduce>>inception_5a-relu_3x3 ... complete.
### Compiling layer group: inception_5a-5x5_reduce>>inception_5a-relu_5x5 ...
### Compiling layer group: inception_5a-5x5_reduce>>inception_5a-relu_5x5 ... complete.
### Compiling layer group: inception_5a-pool>>inception_5a-relu_pool_proj ...
### Compiling layer group: inception_5a-pool>>inception_5a-relu_pool_proj ... complete.
### Compiling layer group: inception_5b-1x1>>inception_5b-relu_1x1 ...
### Compiling layer group: inception_5b-1x1>>inception_5b-relu_1x1 ... complete.
### Compiling layer group: inception_5b-3x3_reduce>>inception_5b-relu_3x3 ...
### Compiling layer group: inception_5b-3x3_reduce>>inception_5b-relu_3x3 ... complete.
### Compiling layer group: inception_5b-5x5_reduce>>inception_5b-relu_5x5 ...
### Compiling layer group: inception_5b-5x5_reduce>>inception_5b-relu_5x5 ... complete.
### Compiling layer group: inception_5b-pool>>inception_5b-relu_pool_proj ...
### Compiling layer group: inception_5b-pool>>inception_5b-relu_pool_proj ... complete.
### Compiling layer group: pool5-7x7_s1 ...
### Compiling layer group: pool5-7x7_s1 ... complete.
### Compiling layer group: newFC ...
### Compiling layer group: newFC ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "12.0 MB"       
    "OutputResultOffset"        "0x00c00000"     "4.0 MB"        
    "SchedulerDataOffset"       "0x01000000"     "4.0 MB"        
    "SystemBufferOffset"        "0x01400000"     "28.0 MB"       
    "InstructionDataOffset"     "0x03000000"     "8.0 MB"        
    "ConvWeightDataOffset"      "0x03800000"     "32.0 MB"       
    "FCWeightDataOffset"        "0x05800000"     "4.0 MB"        
    "EndOffset"                 "0x05c00000"     "Total: 92.0 MB"

### Network compilation complete.

dn = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
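
The external memory map in the compile log can be sanity-checked with a little arithmetic: each buffer should start where the previous one ends, and the end offset should equal the reported total. The following is a minimal sketch that assumes 1 MB means 2^20 bytes and copies the offsets and sizes by hand from the log above. It does not read them from the dlhdl.Workflow object, and the variable names are illustrative only.

```matlab
% Offsets (hex, without the 0x prefix) and sizes copied from the compile log.
names   = {'InputDataOffset','OutputResultOffset','SchedulerDataOffset', ...
           'SystemBufferOffset','InstructionDataOffset', ...
           'ConvWeightDataOffset','FCWeightDataOffset'};
offsets = {'00000000','00c00000','01000000','01400000', ...
           '03000000','03800000','05800000'};
endOffset = '05c00000';
sizesMB   = [12 4 4 28 8 32 4];          % allocated_space column, in MB

MB = 2^20;                               % assumption: 1 MB = 2^20 bytes
startBytes = hex2dec(offsets);           % start address of each buffer
startBytes = startBytes(:);              % force a column vector
nextBytes  = [startBytes(2:end); hex2dec(endOffset)];
spanMB     = (nextBytes - startBytes) / MB;   % gap to the next offset

for k = 1:numel(names)
    fprintf('%-24s 0x%08x  %5.1f MB\n', names{k}, startBytes(k), spanMB(k));
end

totalMB    = hex2dec(endOffset) / MB;    % end offset expressed in MB
consistent = isequal(spanMB.', sizesMB) && totalMB == sum(sizesMB);
fprintf('Total: %.1f MB, map consistent: %d\n', totalMB, consistent);
```

With the values from this log, the gaps between consecutive offsets match the allocated sizes and the end offset corresponds to the 92.0 MB total.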

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Intel Arria 10 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. While it programs the FPGA device, the deploy function displays progress messages and the time at which the weights finish loading.

hW.deploy

### Programming FPGA Bitstream using JTAG...
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 11-Jun-2021 22:20:12
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 11-Jun-2021 22:20:12

Load Example Image

I = imresize(readimage(imdsValidation,1),[224 224]);
figure
imshow(I)

Retrieve Image Prediction

Execute the predict function of the dlhdl.Workflow object and display the prediction results.

[prediction, speed] = hW.predict(single(I),'Profile','off');

### Finished writing input activations.
### Running single input activation.

[val, index] = max(prediction);
label = netTransfer.Layers(end).ClassNames{index}

label = 
'MathWorks Cap'

title(string(label));
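
The code above reports only the highest-scoring class. To see how confident the network is relative to the other classes, you can rank all of the scores. The following is a minimal sketch that uses placeholder scores and placeholder class names so that it runs without hardware. On the target, substitute the prediction vector returned by hW.predict and the class names from netTransfer.Layers(end).ClassNames.

```matlab
% Placeholder inputs (hypothetical values, not results from this example).
prediction = [0.02 0.90 0.03 0.04 0.01];                      % one score per class
classNames = {'classA','classB','classC','classD','classE'};  % hypothetical names

% Rank the classes from highest to lowest score.
[scores, order] = sort(prediction, 'descend');

% Print the top three classes, or fewer if there are fewer classes.
topK = min(3, numel(scores));
for k = 1:topK
    fprintf('%d. %-10s %.4f\n', k, classNames{order(k)}, scores(k));
end

% The first entry is the same class that max(prediction) selects.
topLabel = classNames{order(1)};
```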

Retrieve Deployed Network Performance

View the performance of the deployed network by using the predict method with the Profile name-value argument set to 'on'.

[~, speed] = hW.predict(single(I),'Profile','on')

### Finished writing input activations.
### Running single input activation.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   15836394                  0.10558                       1           15845325              9.5
    conv1-7x7_s2           1139964                  0.00760 
    pool1-3x3_s2            268928                  0.00179 
    pool1-norm1             310985                  0.00207 
    conv2-3x3_reduce        278740                  0.00186 
    conv2-3x3               823735                  0.00549 
    conv2-norm2             952105                  0.00635 
    pool2-3x3_s2            273479                  0.00182 
    inception_3a-1x1        198078                  0.00132 
    inception_3a-3x3_reduce    280845                  0.00187 
    inception_3a-3x3        196410                  0.00131 
    inception_3a-5x5_reduce     73846                  0.00049 
    inception_3a-5x5         35295                  0.00024 
    inception_3a-pool        94554                  0.00063 
    inception_3a-pool_proj    115223                  0.00077 
    inception_3b-1x1        619945                  0.00413 
    inception_3b-3x3_reduce    620509                  0.00414 
    inception_3b-3x3        367297                  0.00245 
    inception_3b-5x5_reduce    207909                  0.00139 
    inception_3b-5x5        178552                  0.00119 
    inception_3b-pool       179959                  0.00120 
    inception_3b-pool_proj    344959                  0.00230 
    pool3-3x3_s2            293640                  0.00196 
    inception_4a-1x1        332992                  0.00222 
    inception_4a-3x3_reduce    181829                  0.00121 
    inception_4a-3x3         83777                  0.00056 
    inception_4a-5x5_reduce     55639                  0.00037 
    inception_4a-5x5         14500                  0.00010 
    inception_4a-pool        77187                  0.00051 
    inception_4a-pool_proj    130965                  0.00087 
    inception_4b-1x1        300254                  0.00200 
    inception_4b-3x3_reduce    220515                  0.00147 
    inception_4b-3x3        101764                  0.00068 
    inception_4b-5x5_reduce     73096                  0.00049 
    inception_4b-5x5         25720                  0.00017 
    inception_4b-pool        82277                  0.00055 
    inception_4b-pool_proj    139530                  0.00093 
    inception_4c-1x1        246715                  0.00164 
    inception_4c-3x3_reduce    246987                  0.00165 
    inception_4c-3x3        129291                  0.00086 
    inception_4c-5x5_reduce     72855                  0.00049 
    inception_4c-5x5         25444                  0.00017 
    inception_4c-pool        82661                  0.00055 
    inception_4c-pool_proj    139761                  0.00093 
    inception_4d-1x1        220154                  0.00147 
    inception_4d-3x3_reduce    273136                  0.00182 
    inception_4d-3x3        159811                  0.00107 
    inception_4d-5x5_reduce     86719                  0.00058 
    inception_4d-5x5         32485                  0.00022 
    inception_4d-pool        82309                  0.00055 
    inception_4d-pool_proj    139464                  0.00093 
    inception_4e-1x1        474515                  0.00316 
    inception_4e-3x3_reduce    309661                  0.00206 
    inception_4e-3x3        193442                  0.00129 
    inception_4e-5x5_reduce     88661                  0.00059 
    inception_4e-5x5         62881                  0.00042 
    inception_4e-pool        85098                  0.00057 
    inception_4e-pool_proj    254234                  0.00169 
    pool4-3x3_s2            164072                  0.00109 
    inception_5a-1x1        385821                  0.00257 
    inception_5a-3x3_reduce    250827                  0.00167 
    inception_5a-3x3         99439                  0.00066 
    inception_5a-5x5_reduce     69697                  0.00046 
    inception_5a-5x5         32465                  0.00022 
    inception_5a-pool        53624                  0.00036 
    inception_5a-pool_proj    205084                  0.00137 
    inception_5b-1x1        567107                  0.00378 
    inception_5b-3x3_reduce    295819                  0.00197 
    inception_5b-3x3        139308                  0.00093 
    inception_5b-5x5_reduce     92415                  0.00062 
    inception_5b-5x5         46311                  0.00031 
    inception_5b-pool        53882                  0.00036 
    inception_5b-pool_proj    205632                  0.00137 
    pool5-7x7_s1             69837                  0.00047 
    newFC                    23215                  0.00015 
 * The clock frequency of the DL processor is: 150MHz

speed=75×5 table
                                   Latency(cycles)    Latency(seconds)    NumFrames    Total Latency(cycles)    Frame/s 
                                   _______________    ________________    _________    _____________________    ________

    Network                          1.5836e+07             0.10558          "1"            "15845325"          "9.4665"
    ____conv1-7x7_s2                   1.14e+06           0.0075998          ""             ""                  ""      
    ____pool1-3x3_s2                 2.6893e+05           0.0017929          ""             ""                  ""      
    ____pool1-norm1                  3.1098e+05           0.0020732          ""             ""                  ""      
    ____conv2-3x3_reduce             2.7874e+05           0.0018583          ""             ""                  ""      
    ____conv2-3x3                    8.2374e+05           0.0054916          ""             ""                  ""      
    ____conv2-norm2                   9.521e+05           0.0063474          ""             ""                  ""      
    ____pool2-3x3_s2                 2.7348e+05           0.0018232          ""             ""                  ""      
    ____inception_3a-1x1             1.9808e+05           0.0013205          ""             ""                  ""      
    ____inception_3a-3x3_reduce      2.8084e+05           0.0018723          ""             ""                  ""      
    ____inception_3a-3x3             1.9641e+05           0.0013094          ""             ""                  ""      
    ____inception_3a-5x5_reduce           73846          0.00049231          ""             ""                  ""      
    ____inception_3a-5x5                  35295           0.0002353          ""             ""                  ""      
    ____inception_3a-pool                 94554          0.00063036          ""             ""                  ""      
    ____inception_3a-pool_proj       1.1522e+05          0.00076815          ""             ""                  ""      
    ____inception_3b-1x1             6.1994e+05            0.004133          ""             ""                  ""      
      ⋮

The speed table contains the latency of every layer, the total network latency, and the overall network performance in frames per second (FPS). For more information, see Profile Inference Run.
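
The reported figures are consistent with simple arithmetic on the cycle counts and the 150 MHz clock: latency in seconds is cycles divided by clock frequency, and frames per second is the number of frames times the clock frequency divided by the total latency in cycles. The following is a minimal sketch that copies the numbers by hand from the profiler output above. The formulas are inferred from those numbers rather than taken from the toolbox documentation, and the variable names are illustrative.

```matlab
% Values copied from the profiler output above.
clockHz     = 150e6;        % DL processor clock frequency
frameCycles = 15836394;     % LastFrameLatency(cycles) for the network
totalCycles = 15845325;     % Total Latency in cycles
numFrames   = 1;            % FramesNum

latencySeconds = frameCycles / clockHz;               % about 0.10558 s
framesPerSec   = numFrames * clockHz / totalCycles;   % about 9.4665

fprintf('Latency: %.5f s, throughput: %.4f frames/s\n', ...
        latencySeconds, framesPerSec);

% Share of the frame latency taken by the five slowest layers in the log.
layerNames  = {'conv1-7x7_s2','conv2-norm2','conv2-3x3', ...
               'inception_3b-3x3_reduce','inception_3b-1x1'};
layerCycles = [1139964 952105 823735 620509 619945];
sharePct    = 100 * layerCycles / frameCycles;

for k = 1:numel(layerNames)
    fprintf('%-26s %8d cycles  %5.2f%%\n', ...
            layerNames{k}, layerCycles(k), sharePct(k));
end
```

Ranking layers by their share of the total cycles is a quick way to see where the time goes. In this run, the early convolution and normalization layers take more cycles than any single inception branch.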