
Detect and Track Objects Using Deep Learning on Android Device

This example shows how to use the Simulink® Support Package for Android™ Devices to deploy a deep learning algorithm that detects and tracks an object on your Android device, such as a phone or tablet. The algorithm uses a ResNet-18-based YOLOv2 neural network to identify the object captured by the device camera. You can experiment with different objects in your surroundings to see how accurately the network detects them on your Android device.

Prerequisites

  • For more information on how to use the Simulink Support Package for Android Devices to run a Simulink model on your Android device, see Getting Started with Android™ Devices.

  • Download and install the ARM® Compute Library using the Hardware Setup screen. This example uses ARM Compute Library version 19.05. For more information on the Hardware Setup screen, see Install Support for Android Devices.

Required Hardware

  • Android device such as a phone or tablet

  • USB cable

Capture Ground Truth Data for Training

Capture a video of an object that you want to detect and track. You can also follow the training procedure in this example by using a video of your choice. Capturing the video over a longer duration and from different viewing angles and lighting conditions produces a training data set that yields better detection and identification results.

After you capture the video, transfer the MP4 file to your host machine.

Export Ground Truth Data for Labeling Using Video Labeler App

The ground truth data contains information about the data source, label definitions, and marked label annotations for a set of ground truth labels. You can export this data using the Video Labeler app into a MAT file.

To open the Video Labeler app, run this command in the MATLAB® Command Window:

videoLabeler

Follow these steps in the Video Labeler app:

1. In the File section, click Import.

2. Select Add Video and select the video of the object.

3. In the ROI Labels pane, click Label. Create a Rectangular label, name it, and click OK. In this example, the object has the name apple.

4. Use the mouse to draw a rectangular ROI in the video.

5. In the Automate Labeling section, click Select Algorithm and select the Point Tracker algorithm. Then click Automate. The algorithm instructions appear in the right pane, and the selected labels are available to automate.

6. In the Run section, click Run to automate labeling for the video.

7. When you are satisfied with the algorithm results, in the Close section, click Accept.

8. Under Export Labels, select To File to export the labeled data to a MAT file, appledetect.mat.

9. Save the appledetect.mat file in the working directory of the example.

For detailed information on how to use the Video Labeler app, see Video Labeler and Get Started with the Video Labeler.

Train YOLOv2 Object Detector

Train the YOLOv2 object detector using the ground truth data created from the captured video. The exported appledetect.mat file contains this ground truth data. Use this file to train the YOLOv2 object detector.

The deepresnet18.m file uses a pretrained ResNet-18 neural network as the base of the YOLOv2 detection network for feature extraction of the object. You can find this file in the example folder structure. Make sure that the deepresnet18.m file is present in the working directory of the example. Open this file and configure the following parameters. A consolidated MATLAB sketch of this workflow appears after the list.

1. Specify the name of the MAT file exported using the Video Labeler app in the labelData parameter. In this example, the MAT file is saved as appledetect.mat.

2. Specify the size of the input image for training the network in the imagesize parameter. In this example, the image size is set to [224, 224, 3].

3. Specify the number of object classes the network has to detect in the numClasses parameter. In this example, the parameter is set to 1 to detect and track one apple.

4. Specify the pretrained ResNet-18 network as the base network for feature extraction of the object. In this example, ResNet-18 is the base network for the YOLOv2 object detector.

5. Specify the network layer to use for feature extraction. In this example, the ResNet-18 neural network extracts features from the res3b_relu layer. This layer outputs 128 features and the activations have a spatial size of 28-by-28.

6. Specify the size of the anchor boxes in the anchorBoxes parameter. In this example, the parameter is set to [64,64].

7. Create the YOLOv2 object detection network using the yolov2Layers function.

8. You can also analyze the YOLOv2 network architecture using the analyzeNetwork function. The layers succeeding the feature layer are removed. A series of convolution, ReLU, and batch normalization layers along with the YOLOv2 transform and YOLOv2 output layers are added to the feature layer of the base network.

9. Configure the options for training the deep learning ResNet-18 neural network using the trainingOptions function.

10. After loading the appledetect.mat file, create an image datastore and a box label datastore from the ground truth data using the objectDetectorTrainingData function.

11. After combining the datastores, train the YOLOv2 network using the trainYOLOv2ObjectDetector function.

12. After the YOLOv2 detector training is complete, save the trained detector to a MAT file in the current working directory of the example. In this example, the detector is saved as detectedresnet.mat.
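The following MATLAB sketch consolidates these steps. It assumes the ground truth variable exported by the Video Labeler app is named gTruth and uses placeholder training options; the deepresnet18.m file shipped with the example may differ in these details.

% Sketch of the training workflow in deepresnet18.m. The gTruth variable name
% and the training options are assumptions.
labelData   = 'appledetect.mat';   % MAT file exported from the Video Labeler app
imageSize   = [224 224 3];         % input size of the detection network
numClasses  = 1;                   % one object class: apple
anchorBoxes = [64 64];             % anchor box size

% Use the pretrained ResNet-18 as the base network and res3b_relu as the
% feature extraction layer.
baseNetwork  = resnet18;
featureLayer = 'res3b_relu';
lgraph = yolov2Layers(imageSize,numClasses,anchorBoxes,baseNetwork,featureLayer);
% analyzeNetwork(lgraph)           % optional: inspect the YOLOv2 architecture

% Create and combine the image and box label datastores from the ground truth.
data = load(labelData);
[imds,blds] = objectDetectorTrainingData(data.gTruth);
trainingData = combine(imds,blds);

% Placeholder training options; tune these for your data set.
options = trainingOptions('sgdm', ...
    'InitialLearnRate',1e-3, ...
    'MiniBatchSize',8, ...
    'MaxEpochs',20);

% Train the detector and save it for use in the Simulink model.
detector = trainYOLOv2ObjectDetector(trainingData,lgraph,options);
save('detectedresnet.mat','detector');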

Configure Simulink Model and Calibrate Parameters

This example uses a preconfigured Simulink model from the Simulink Support Package for Android Devices.

To open the Simulink model, run this command in the MATLAB Command Window:

open_system('androidObjectClassification');

1. Connect the Android device to the host computer using the USB cable.

2. On the Modeling tab of the Simulink toolstrip, select Model Settings.

3. In the Configuration Parameters dialog box, select Hardware Implementation. Verify that the Hardware board parameter is set to Android device.

4. From the Groups list under Target hardware resources, select Device options.

5. From the Device list, select your Android device. If your device is not listed, click Refresh.

Note: If your device is not listed even after you click Refresh, ensure that you have enabled the USB debugging option on your device. To enable USB debugging, enter androidhwsetup in the MATLAB Command Window and follow the on-screen instructions.

6. In the Configuration Parameters dialog box, select Code Generation from the left pane. In the Target selection section, set Language to C++.

7. Select Code Generation > Interface and in the Deep learning section, set these parameters:

a. Set Target library to ARM Compute.

b. Select ARM Compute Library version based on the installation you chose in the Hardware Setup screen. In this example, it is set to 19.05.

c. Set ARM Compute Library architecture to armv7.

8. Click Apply, and then click OK.

The Android Camera block captures the video of the object using its rear camera. You can configure the following parameters in the Camera Block Parameters dialog box:

1. Set Camera to Back so that the block captures video from the rear camera of the device.

2. Set Resolution to one of the device-specific resolutions. To get the list of device-specific resolutions, connect your configured device to the host machine and click Refresh.

3. Set Sample time to 0.25 seconds.

To open the RGB to Image subsystem, run this command in the MATLAB Command Window:

open_system('androidObjectClassification/RGB to Image');

The R, G, and B data received from the Android Camera block is first transposed from row-major to column-major order. The transposed R, G, and B data is fed to the Matrix Concatenate block, which concatenates the R, G, and B image data to create a contiguous output signal, Imin, as shown in the sketch after this list. You can configure the following parameters in the Vector Concatenate, Matrix Concatenate Block Parameters dialog box:

1. Set Number of inputs to 3. This value indicates the R, G, and B image data input.

2. Set Mode to Multidimensional to perform multidimensional concatenation on the R, G, and B image data input.

3. Set Concatenate dimension to 3 to specify the output dimension along which to concatenate the input array of R, G, and B image data.
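In MATLAB terms, the RGB to Image subsystem performs an operation equivalent to this sketch, where the variable names follow the signal names used in the model:

% Transpose each color plane from row-major to column-major order, then
% concatenate along the third dimension to form the image Imin.
Imin = cat(3, R.', G.', B.');   % size: height-by-width-by-3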

The deeplearning function block uses the YOLOv2-based convolutional neural network (CNN) saved as a MAT file. Pass Imin as an input to the detector network. If the object is detected, the output image Imout contains a bounding box around the detected object.

Pass the name of the MAT file generated from training the YOLOv2 object detector to the deeplearning function block. In this example, the MAT file is detectedresnet.mat.
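A minimal sketch of what the deeplearning function block might contain is shown below. The function and variable names, the detection threshold, and the use of insertShape to draw the bounding box are assumptions; coder.loadDeepLearningNetwork and detect are the documented interfaces for loading and running a trained yolov2ObjectDetector in generated code.

function Imout = deeplearning(Imin)
%#codegen
% Sketch of a MATLAB Function block body that runs the trained detector.
persistent detector
if isempty(detector)
    % Load the trained YOLOv2 detector saved by the training script.
    detector = coder.loadDeepLearningNetwork('detectedresnet.mat');
end

% Run the detector on the incoming image.
bboxes = detect(detector,Imin,'Threshold',0.5);

% Annotate the image with the bounding boxes of any detected objects.
Imout = Imin;
if ~isempty(bboxes)
    Imout = insertShape(Imin,'Rectangle',bboxes);
end
end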

The ImagetoRGB function block transposes the image data back into row-major R, G, and B values. These R, G, and B values are the inputs to the Android Video Display block in the Simulink model.
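Conceptually, the ImagetoRGB function block inverts the earlier concatenation, as in this sketch (variable names follow the signals in the model):

% Split the annotated image back into color planes and transpose each plane
% back to row-major order for the Android Video Display block.
R = Imout(:,:,1).';
G = Imout(:,:,2).';
B = Imout(:,:,3).';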

The Video Display block displays the video of the object on your Android device.

Run Simulink Model

1. On the Hardware tab of the Simulink model, click Build, Deploy & Start. The androidObjectClassification application launches automatically.

2. Place the object in front of the Android device camera. Observe the bounding box with the label around the detected object.

3. Move the object and track it on your Android device.

Other Things to Try

  • Train the YOLOv2 object detector to detect and track more than one object.

  • Use a neural network other than ResNet-18 to train the object detector and observe the differences in the results.

  • Use a different algorithm in the Video Labeler app and compare the results with the Point Tracker algorithm.

  • Change the input image size provided in the deeplearning function and observe how it affects the detection results.

See Also