Cone Detection Using Deep Learning for Formula Student Driverless
Watch a complete workflow for cone detection in Formula Student Driverless scenarios using deep learning. Learn how to use MATLAB® for data preparation and labeling, YOLOX neural network design and training, and deployment to a GPU for real-time inference.
Published: 18 Sep 2025
Hi, I'm Liping Wang, an Education Programs Engineer at MathWorks and a judge at Formula Student China since 2019. Today, I'll show you how to use deep learning to detect cones in Formula Student driverless competitions.
This video was recorded onboard a race car from Tongji University in China. As in most Formula Student events, cones of different colors are used to mark the track boundaries, helping driverless cars navigate autonomously.
Typically, a driverless car uses a camera and a LiDAR to capture images and point clouds of its surroundings.
Images help identify cone colors, while point clouds provide accurate information about cone locations.
By calibrating the camera and LiDAR and fusing the data from both, the car can determine the color and position of each cone, enabling autonomous path planning.
Today, we’ll focus on cone detection in images. For more details on other components, please refer to the references.
When detecting objects in an image, you usually ask two questions: “What are they?” and “Where are they?”
Object detection algorithms answer these two questions by assigning labels and drawing bounding boxes around the objects of interest, as you can see in the image on the right.
Of course, you could design your own object detection algorithm or train a deep learning network and then deploy it directly on hardware. But this process is often iterative and time-consuming.
Instead, I’d like to introduce a more efficient workflow for implementing object detection using deep learning in MATLAB.
This workflow has three main steps:
- First, data labeling, which means going through your images and marking the objects you want to detect, like drawing boxes around the cones.
- Next, model training — here, you design a deep learning network and train it using the labeled images.
- Finally, deployment — once the model is trained, you can use tools in MATLAB to deploy it onto hardware, like an embedded GPU in a car, so the model can detect cones in real time.
Now, let us walk through how to perform these steps in MATLAB.
First, we start with data labeling.
MATLAB provides interactive apps like Image Labeler, Video Labeler, and Lidar Labeler. As an example, I’ll show you how to label cones using the Video Labeler.
Welcome to MATLAB!
To get started, click the “APPS” tab and look for Video Labeler in the Image Processing and Computer Vision section.
When you find the icon, you can launch the app by clicking it.
Alternatively, you can simply type videoLabeler in the Command Window.
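As a quick sketch, either of these commands opens the app (the video file name here is just a placeholder):

```matlab
% Open the Video Labeler app (Computer Vision Toolbox).
videoLabeler

% Or open it with a video already loaded (file name is a placeholder).
videoLabeler("coneRun.mp4")
```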
Once the app opens, import your video first.
Then create label classes for the objects you want to detect.
Next, select one of your label classes and draw a bounding box around each object of that class in every frame.
When you’re done, you can save the results to a file or straight to the workspace for later use.
To save time, the app also comes with built-in auto-labeling tools for people, vehicles, and a few other common objects. You can even add your own custom auto-labeling algorithms.
I’ve already prepared two ground-truth files: one with red and green cones labeled, and another with yellow cone pairs. The yellow pairs tell the car whether to go left or right, so I called them “yellowConeLeft” and “yellowConeRight.”
After labeling the video frames, the next step is to design a deep learning network and train it using the labeled data in MATLAB.
Traditionally, object detection is done in two stages: first, finding regions of interest (ROIs), which is called the region proposal stage, and second, classifying the objects within those regions.
In the first step, regions of interest are generated using methods like sliding windows, selective search, or region proposal networks.
In the second step, features are extracted from the regions—either using traditional methods like SIFT and HOG, or through a convolutional neural network—and then the regions are classified using models like SVMs, decision trees, or neural networks.
Here’s a diagram showing how a sliding-window detector works.
In this demo, we’re using a YOLOX model. YOLO stands for “You Only Look Once,” and it’s a one-stage detector.
Unlike two-stage models like Faster R-CNN, YOLO can detect and classify objects in a single pass, which makes it much faster and thus suitable for real-time applications.
There are several versions of YOLO, and most of them rely on anchor boxes, which are predefined reference boxes that can help the model predict where objects are and how big they are.
For example, in YOLOv2, the network includes a feature extractor and a detection subnetwork that predicts three attributes of each anchor box:
- Anchor box offsets, which refine the position of the anchor box.
- Objectness score, which indicates how likely it is that the anchor box contains an object.
- Class probabilities, which predict the types of objects in the anchor box.
Finally, an algorithm called non-maximum suppression (NMS) can be used to filter out overlapping boxes and keep the most confident predictions.
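In MATLAB, this filtering step is available through functions such as selectStrongestBboxMulticlass; here is a minimal sketch using made-up boxes and scores:

```matlab
% Three candidate detections in [x y width height] format with confidence scores;
% the first two boxes overlap heavily.
bboxes = [100 100 50 80; 105 102 48 78; 300 200 40 60];
scores = [0.90; 0.60; 0.85];
labels = categorical(["redCone"; "redCone"; "greenCone"]);

% Non-maximum suppression: keep only the most confident box in each overlapping group.
[keptBoxes, keptScores, keptLabels] = selectStrongestBboxMulticlass( ...
    bboxes, scores, labels, OverlapThreshold=0.5);
```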
If you're interested in how to design, train, and deploy a YOLOv2 network, please check out the links to these two blog posts.
In this demo, we’re using YOLOX—an anchor-free version of YOLO introduced in 2021.
Instead of relying on predefined anchor boxes, YOLOX directly predicts the center of each object, which makes the model smaller and faster.
YOLOX has three main components:
- Backbone: This is a CNN called CSP-DarkNet-53, pretrained on the COCO dataset. It extracts features from the input image.
- Neck: This part connects the backbone to the head. It uses a Feature Pyramid Network (FPN) to generate feature maps at different scales, and a path aggregation network to combine features from multiple layers.
- Decoupled detection head: This part splits the prediction for each bounding box into three separate outputs:
- Classification Scores: Indicate the predicted class of the bounding box.
- Regression Scores: Provide the location, the width and height of the bounding box.
- Objectness Scores: Reflect the confidence level that the bounding box contains an object.
After covering the basics of YOLOX, let us jump into MATLAB and see how to design and train a YOLOX network.
Alright, step one—data labeling, which has already been done. Remember, you only need to label your data once.
Here, you can start by loading the ground-truth file we saved earlier and converting it into training data using the objectDetectorTrainingData function.
If you look at the first row of the training data, you’ll see five columns: the first holds the frame’s file name, and the next four hold the bounding boxes that mark the locations and sizes of the objects in each class.
A bounding box is simply represented by four numbers—the first two give the location of the top-left corner, while the last two represent the width and height of the box.
For example, the first row indicates that we have labeled five green cones and seven red cones, which can be confirmed by displaying the image of the first frame.
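A minimal sketch of this step, assuming the ground truth was saved to a MAT-file named coneGroundTruth.mat containing a variable gTruth, and that one label class is named greenCone (all of these names are placeholders):

```matlab
% Load the ground truth exported from the Video Labeler app
% (file and variable names are placeholders).
load("coneGroundTruth.mat", "gTruth");

% Extract video frames to disk and convert the labels into a table:
% column 1 holds the frame file names, and each remaining column holds
% the [x y width height] bounding boxes for one label class.
trainingData = objectDetectorTrainingData(gTruth);
trainingData(1,:)          % inspect the first row

% Display the first frame with its green-cone boxes overlaid
% (assumes a label class named greenCone).
I = imread(trainingData.imageFilename{1});
I = insertShape(I, "rectangle", trainingData.greenCone{1});
imshow(I)
```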
Next, shuffle the data and split it into training, validation, and test sets.
It’s also a good idea to take a quick look at your dataset before training, for example by counting how many objects you have in each class using the countEachLabel function. In this demo, for example, we’re a bit short on samples of yellow-cone pairs, so adding more could help boost the model’s performance.
You can also plot the size distribution of the bounding boxes to get a sense of how large or small the labeled cones are throughout the dataset.
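As a sketch, assuming the training data table from the previous step, you could compute both statistics with a boxLabelDatastore:

```matlab
% Put the box labels in a boxLabelDatastore and count objects per class.
blds = boxLabelDatastore(trainingData(:, 2:end));
tbl = countEachLabel(blds)     % table of counts per label class

% Gather all boxes and plot their width/height distribution.
allBoxes = vertcat(blds.LabelData{:,1});   % [x y width height]
scatter(allBoxes(:,3), allBoxes(:,4), ".")
xlabel("Box width (pixels)")
ylabel("Box height (pixels)")
title("Bounding box size distribution")
```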
After analyzing the dataset, you can further improve the model’s performance and generalization by augmenting the training data with transformations like resizing and rotation.
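A minimal augmentation sketch, reusing the datastores from above; the augmentData helper here is a deliberately simplified, hypothetical example that only applies a random horizontal flip:

```matlab
% Combine images and box labels into one datastore and attach augmentation.
imds = imageDatastore(trainingData.imageFilename);
ds   = combine(imds, blds);
augmentedDS = transform(ds, @augmentData);

function out = augmentData(in)
% Randomly flip the image and its boxes left/right (simplified example).
    I      = in{1};
    boxes  = in{2};
    labels = in{3};
    if rand > 0.5
        I = fliplr(I);
        % Remap the box x-coordinates to the flipped image.
        boxes(:,1) = size(I,2) - boxes(:,1) - boxes(:,3) + 2;
    end
    out = {I, boxes, labels};
end
```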
Next, you can use the yoloxObjectDetector function to create a YOLOX object detector based on a lightweight network pretrained on the COCO dataset (the "tiny-coco" option).
Then set your training options and kick off training with the trainYOLOXObjectDetector function.
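Here is a sketch of those two calls; the class names follow the labels described earlier but are placeholders, validationDS stands for the validation split from the earlier step, and all hyperparameter values are purely illustrative:

```matlab
% Class names must match the label classes used during labeling.
classNames = ["greenCone" "redCone" "yellowConeLeft" "yellowConeRight"];

% Create a YOLOX detector from the lightweight "tiny-coco" pretrained network
% (requires the Automated Visual Inspection Library support package).
detector = yoloxObjectDetector("tiny-coco", classNames, InputSize=[416 416 3]);

% Training options; the hyperparameter values here are purely illustrative.
options = trainingOptions("sgdm", ...
    InitialLearnRate=5e-4, ...
    MaxEpochs=100, ...
    MiniBatchSize=16, ...
    ValidationData=validationDS, ...
    VerboseFrequency=25);

% Train on the augmented training datastore from the previous step.
detector = trainYOLOXObjectDetector(augmentedDS, detector, options);
```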
The training process takes a while, so for this demo we will load a pretrained model instead. You can then evaluate the model’s performance on the test dataset using metrics like average precision (AP) and precision-recall curves.
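For example, a sketch of that evaluation step, assuming a test datastore named testDS built from the earlier split:

```matlab
% Run the trained detector on the held-out test set.
% A low threshold keeps low-confidence detections so the metrics see them too.
detectionResults = detect(detector, testDS, Threshold=0.01);

% Compute detection metrics and report per-class average precision.
metrics = evaluateObjectDetection(detectionResults, testDS);
metrics.ClassMetrics          % per-class metrics table
ap = averagePrecision(metrics)
```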
Or you can choose to run the model on a fresh video clip to see how it performs in action.
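And a sketch of running the detector frame by frame on a new clip (the video file name is a placeholder):

```matlab
vr = VideoReader("testLap.mp4");
while hasFrame(vr)
    frame = readFrame(vr);
    [bboxes, scores, labels] = detect(detector, frame, Threshold=0.5);
    if ~isempty(bboxes)
        % Annotate each detection with its class and confidence score.
        frame = insertObjectAnnotation(frame, "rectangle", bboxes, ...
            string(labels) + ": " + string(round(scores, 2)));
    end
    imshow(frame)
    drawnow
end
```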
Finally, you can also test the performance of a deep learning network and deploy it onto hardware using Simulink.
The Deep Learning Object Detector block allows you to import a detector from a MAT-file or a MATLAB function into Simulink. After running the model, you can see how it performs.
If you want to learn more, check out this YouTube video on how to perform deep learning inference in Simulink.
Once your deep learning model is ready, the final step is to deploy it onto hardware that can be integrated into a vehicle for real-time use.
MATLAB provides tools like MATLAB Coder, Simulink Coder, and GPU Coder. These tools can automatically generate C, C++, or CUDA code and thus accelerate the deployment of your model onto CPUs or GPUs.
In this demo, we’re using an NVIDIA Jetson Xavier — a GPU platform that is suitable for autonomous applications in vehicles.
MATLAB’s hardware support package further makes it easier to deploy models onto NVIDIA Jetson GPUs.
To get started, connect the GPU to a host computer that has MATLAB, the required toolboxes, and the appropriate hardware support packages installed.
Once the model has been trained and deployed, you can either control the model directly from MATLAB or disconnect the host computer and run the model standalone on the GPU for real-time applications.
Now let us see the MATLAB code for deploying a deep learning model to an NVIDIA Jetson GPU. You can also check out this blog for more details.
Here, the first step is setting up the GPU and the host computer. For more details on how to set up the hardware, please refer to this webpage.
The main application that we want to deploy is called coneDetection.
To run the application on the GPU, you first need to create a Jetson object using the GPU’s IP address, username, and password.
After configuring the code generation settings, you can use the codegen function to automatically generate and compile the application for the GPU.
You can start the application using the runApplication function or stop it using the killApplication function, all directly from MATLAB.
When you're done, don’t forget to clear the hardware object.
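Putting those commands together, a minimal deployment sketch could look like this (the IP address, credentials, and file names are placeholders):

```matlab
% Connect to the Jetson board from the host machine
% (IP address, user name, and password are placeholders).
hwobj = jetson("192.168.1.15", "ubuntu", "ubuntu");

% Configure GPU code generation for a standalone executable on the board.
cfg = coder.gpuConfig("exe");
cfg.Hardware = coder.hardware("NVIDIA Jetson");
cfg.DeepLearningConfig = coder.DeepLearningConfig("cudnn");
cfg.GenerateExampleMain = "GenerateCodeAndCompile";

% Generate CUDA code for the coneDetection entry-point function and build it on the board.
codegen -config cfg coneDetection -report

% Start and stop the deployed application directly from MATLAB.
runApplication(hwobj, "coneDetection");
killApplication(hwobj, "coneDetection");

% Clean up the hardware connection when finished.
clear hwobj
```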
Next, let us take a look at the coneDetection function. It starts by loading a pretrained YOLOX model and creating a Jetson hardware object.
Then it captures images from a camera, detects cones using the YOLOX model, draws bounding boxes around the detected cones, and finally displays the annotated image.
For this demo, we connected a monitor to the GPU and tested the model using a pre-recorded video from Tongji University instead of using a camera to capture cones in real time.
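For reference, here is a simplified sketch of what such an entry-point function could look like; the MAT-file name and camera settings are placeholders, and it assumes the trained YOLOX detector can be loaded with coder.loadDeepLearningNetwork:

```matlab
function coneDetection()
% Entry-point function deployed to the Jetson board (simplified sketch).

% Load the trained detector once; the MAT-file name is a placeholder.
persistent detector
if isempty(detector)
    detector = coder.loadDeepLearningNetwork("coneDetectorYOLOX.mat");
end

% Create the Jetson object and connect to its camera and display
% (camera name and resolution are placeholders).
hwobj   = jetson();
cam     = camera(hwobj, "webcam", [1280 720]);
dispObj = imageDisplay(hwobj);

for k = 1:10000
    % Capture a frame, detect cones, and overlay the results.
    img = snapshot(cam);
    [bboxes, scores] = detect(detector, img, Threshold=0.5);
    if ~isempty(bboxes)
        img = insertObjectAnnotation(img, "rectangle", bboxes, scores);
    end
    image(dispObj, img);
end
end
```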
Once the application has been deployed, you can run it on the Jetson GPU directly from MATLAB.
As you can see, the model can detect most cones in each frame.
To summarize, MATLAB makes it easy to go from data labeling to model training, and all the way to deployment on GPUs — all using built-in tools and functions.
Here are the main MATLAB products, toolboxes, and hardware support packages you’ll need to run this demo.
For more references, please check these pages on MathWorks.com.
You can also explore the MATLAB Deep Learning Model Hub, where you will find a lot of pretrained models — including YOLOv8 — with new models added every month.
MATLAB also supports importing models from PyTorch and TensorFlow, as well as models in ONNX format.
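For instance, each importer is a single function call (the file and folder names are placeholders):

```matlab
% Import networks saved from other frameworks (file/folder names are placeholders).
netFromPyTorch    = importNetworkFromPyTorch("coneModel.pt");       % traced PyTorch model
netFromTensorFlow = importNetworkFromTensorFlow("coneModelFolder"); % TensorFlow SavedModel folder
netFromONNX       = importNetworkFromONNX("coneModel.onnx");
```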
For more information on how to use Python with MATLAB, check out this webpage.
If you have any questions, feel free to reach out to us at racinglounge@mathworks.com. Thanks for watching!