Multiple Object Tracking
Tracking is the process of locating a moving object or multiple objects over time in a video stream. Unlike object detection, which is the process of locating an object of interest in a single frame, tracking associates detections of an object across multiple frames.
Tracking multiple objects requires detection, prediction, and data association.
Detection — Detect objects of interest in a video frame.
Prediction — Predict the object locations in the next frame.
Data association — Use the predicted locations to associate detections across frames to form tracks.
Selecting the right approach for detecting objects of interest depends on what you want to track and whether the camera is stationary.
Detect Objects Using Stationary Camera
To detect objects in motion with a stationary camera, you can perform background subtraction using the System object™. The background subtraction approach works efficiently, but requires the camera to be stationary.
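The following Python sketch shows why background subtraction depends on a fixed camera. It keeps a running-average background image and flags pixels that differ from it by more than a threshold. This is a simplified stand-in chosen for brevity, not the model used by the toolbox's System object. The functions `update_background` and `foreground_mask` and the parameters `alpha` and `threshold` are hypothetical names for this example. Each pixel is compared only with its own history, so any camera motion would make most pixels look like foreground.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


# Tiny 3x4 grayscale example: a static scene, then a bright object appears at row 1, column 2.
background = [[10, 10, 10, 10] for _ in range(3)]
frame = [[10, 10, 10, 10], [10, 10, 200, 10], [10, 10, 10, 10]]

mask = foreground_mask(background, frame)
background = update_background(background, frame)  # slowly absorb the new frame
print(mask[1])  # [False, False, True, False]
```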
Detect Objects Using Moving Camera
To detect objects in motion with a moving camera, you can use a sliding-window detection approach. This approach typically works more slowly than the background subtraction approach. To detect and track a specific category of object, use the System objects or functions described in this table.
Select a Detection Algorithm
|Type of Object to Track
|Anything that moves
|Faces, eyes, nose, mouth, upper body
You can filter the YOLO-based detector results to keep only the "
|Custom object category
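The sliding-window approach for moving cameras can be sketched in a few lines of Python. The helper names `sliding_windows` and `detect` and the toy mean-intensity classifier are hypothetical. A real detector would use a trained classifier, scan several window scales, and typically suppress overlapping hits. The sketch shows why this approach is slower than background subtraction: the classifier runs once for every window position.

```python
def sliding_windows(image_width, image_height, window, stride):
    """Yield (x, y, w, h) boxes that step across the image on a regular grid."""
    w, h = window
    for y in range(0, image_height - h + 1, stride):
        for x in range(0, image_width - w + 1, stride):
            yield (x, y, w, h)


def detect(image, window, stride, classifier, threshold):
    """Run the classifier on every window and keep those scoring at least threshold."""
    img_h, img_w = len(image), len(image[0])
    hits = []
    for x, y, w, h in sliding_windows(img_w, img_h, window, stride):
        patch = [row[x:x + w] for row in image[y:y + h]]
        if classifier(patch) >= threshold:
            hits.append((x, y, w, h))
    return hits


def mean_intensity(patch):
    """Toy stand-in for a trained classifier: the average pixel value of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# 48x64 synthetic image with a bright 32x32 block in the lower-right corner.
image = [[1.0 if (r >= 16 and c >= 32) else 0.0 for c in range(64)] for r in range(48)]
hits = detect(image, window=(32, 32), stride=16, classifier=mean_intensity, threshold=0.9)
print(hits)  # [(32, 16, 32, 32)]
```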
To track an object over time, you must predict its location in the next frame. The simplest method of prediction assumes that the object remains near its last known location. In other words, the previous detection serves as the next prediction. This method is especially effective at high frame rates. However, using this prediction method can fail when objects do not move at constant speeds, or when the frame rate is low relative to the speed of the object in motion.
A more sophisticated method of prediction is to use the previously observed motion of the object. The Kalman filter (vision.KalmanFilter) predicts the next location of an object by assuming that it moves according to a motion model, such as constant velocity or constant acceleration. The Kalman filter also takes into account process noise and measurement noise. Process noise is the deviation of the actual motion of the object from the motion model. Measurement noise is the detection error.
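Here is a minimal sketch of the predict and correct steps for one image axis. It assumes a constant-velocity motion model, a position-only measurement, and process noise modeled as a scaled identity matrix. The class `ConstantVelocityKalman1D` and its parameters are hypothetical and are not the vision.KalmanFilter interface. The standard Kalman equations for a 2-by-2 state covariance are written out by hand, so the example needs no libraries.

```python
class ConstantVelocityKalman1D:
    """Minimal Kalman filter for state [position, velocity] with a position-only measurement."""

    def __init__(self, x0, v0=0.0, p0=100.0, process_noise=1.0, measurement_noise=4.0, dt=1.0):
        self.x = [x0, v0]                  # state estimate: position, velocity
        self.P = [[p0, 0.0], [0.0, p0]]    # state covariance
        self.q = process_noise             # how far true motion may deviate from the model
        self.r = measurement_noise         # variance of the detection error
        self.dt = dt

    def predict(self):
        dt, (x, v), P = self.dt, self.x, self.P
        # x_k = F x_{k-1} with F = [[1, dt], [0, 1]]
        self.x = [x + dt * v, v]
        # P_k = F P F^T + Q, with Q = q * I (a simplifying assumption)
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]
        return self.x[0]

    def correct(self, z):
        P = self.P
        s = P[0][0] + self.r                 # innovation variance (H P H^T + R with H = [1, 0])
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        y = z - self.x[0]                    # innovation: detection minus predicted position
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P = (I - K H) P
        self.P = [
            [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
            [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
        ]
        return self.x[0]


kf = ConstantVelocityKalman1D(x0=0.0)
for z in [2.0, 4.0, 6.0, 8.0, 10.0]:  # detections of an object moving about 2 px per frame
    kf.predict()
    kf.correct(z)
next_position = kf.predict()          # should land close to 12
print(next_position)
```

Unlike the "previous detection is the next prediction" method, the filter learns the velocity from the detections. Its prediction therefore leads the last detection instead of repeating it.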
To more easily configure a Kalman filter, use the configureKalmanFilter function. This function sets up the filter for tracking a physical object moving with constant velocity or constant acceleration within a Cartesian coordinate system, and it assumes the same noise statistics along all dimensions. To configure a Kalman filter with differing assumptions, you must construct the vision.KalmanFilter object directly.
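This sketch illustrates what "the same statistics along all dimensions" means. It builds constant-velocity matrices for any number of Cartesian dimensions by repeating one per-dimension block, and it uses a measurement noise covariance with the same variance on every axis. The state ordering [x, vx, y, vy, ...], the variance value, and the function names are assumptions for this example, not details taken from configureKalmanFilter.

```python
def constant_velocity_model(num_dims, dt=1.0):
    """Return (F, H) for state [p1, v1, p2, v2, ...] and position-only measurements."""
    n = 2 * num_dims
    F = [[0.0] * n for _ in range(n)]
    H = [[0.0] * n for _ in range(num_dims)]
    for d in range(num_dims):
        i = 2 * d                  # index of the position entry for dimension d
        F[i][i] = 1.0
        F[i][i + 1] = dt           # position advances by dt * velocity
        F[i + 1][i + 1] = 1.0      # velocity is assumed constant
        H[d][i] = 1.0              # only positions are measured
    return F, H


def isotropic_noise(num_values, variance):
    """Diagonal covariance with the same variance along every dimension."""
    return [[variance if r == c else 0.0 for c in range(num_values)] for r in range(num_values)]


F, H = constant_velocity_model(num_dims=2)
R = isotropic_noise(2, variance=25.0)  # identical measurement noise in x and y
```

Giving each axis its own variance, or coupling the axes, is the kind of differing assumption that requires building the filter matrices yourself.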
The Kalman filter assumes that the motion and measurement models are linear, and that the uncertainty in each model follows a Gaussian distribution. When these assumptions do not hold, when the object maneuvers, or when the measurements are incomplete, you must use another tracking filter. The Sensor Fusion and Tracking Toolbox™ provides additional tracking filters. For more details, see Introduction to Estimation Filters (Sensor Fusion and Tracking Toolbox).
Data association is the process of associating detections corresponding to the same physical object across frames. The temporal history of a particular object consists of multiple detections, called a track. A track representation can include the entire history of the previous locations of the object. Alternatively, it can consist of only the last known location and current velocity of the object.
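One possible track representation, sketched here as a Python dataclass, stores the full location history and derives the current velocity from the last two locations. The class and field names are hypothetical. To use the compact representation instead, you could keep only the last location and the velocity rather than the whole list.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Track:
    track_id: int
    history: List[BBox] = field(default_factory=list)  # every matched location, oldest first
    velocity: Tuple[float, float] = (0.0, 0.0)         # per-frame motion of the box origin
    consecutive_misses: int = 0                        # frames since the last matched detection

    @property
    def last_bbox(self) -> BBox:
        return self.history[-1]

    def add_detection(self, bbox: BBox) -> None:
        """Append a matched detection and update the velocity estimate."""
        if self.history:
            prev = self.history[-1]
            self.velocity = (bbox[0] - prev[0], bbox[1] - prev[1])
        self.history.append(bbox)
        self.consecutive_misses = 0


track = Track(track_id=1)
track.add_detection((10, 20, 30, 40))
track.add_detection((13, 22, 30, 40))
print(track.velocity)  # (3, 2)
```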
Detection to Track Cost Functions
To match a detection to a track, you must establish criteria for evaluating the matches. You can establish these criteria by defining a cost function. The higher the cost of matching a detection to a track, the less likely it is that the detection belongs to the track. You can define a simple cost function based on the degree of overlap between the bounding boxes of the predicted and detected objects, such as one minus the overlap ratio, so that greater overlap yields a lower cost. The Tracking Pedestrians from a Moving Car example implements this type of cost function by using the bboxOverlapRatio function. You can implement a more sophisticated cost function, such as one that accounts for the uncertainty of the prediction, by using the distance function of the vision.KalmanFilter object. You can also implement a custom cost function that incorporates information about the size and appearance of the objects.
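The overlap-based cost can be sketched as follows, assuming boxes in [x, y, width, height] form and an intersection-over-union overlap ratio. `overlap_ratio` and `overlap_cost_matrix` are hypothetical helpers written for illustration, not reimplementations of bboxOverlapRatio. Rows of the result correspond to tracks and columns to detections.

```python
def overlap_ratio(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def overlap_cost_matrix(predicted_boxes, detected_boxes):
    """Rows are tracks, columns are detections; cost = 1 - overlap ratio."""
    return [[1.0 - overlap_ratio(p, d) for d in detected_boxes] for p in predicted_boxes]


predicted = [(0, 0, 10, 10), (50, 50, 10, 10)]            # predicted track boxes
detected = [(52, 50, 10, 10), (1, 1, 10, 10), (200, 200, 5, 5)]
cost = overlap_cost_matrix(predicted, detected)
for row in cost:
    print([round(c, 3) for c in row])
```

Boxes that do not overlap at all get the maximum cost of 1, so this cost cannot distinguish a near miss from a detection on the far side of the image. That limitation is one reason to prefer a distance-based or uncertainty-aware cost.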
Elimination of Unlikely Matches
Gating is a method of eliminating highly unlikely matches from consideration, such as by imposing a threshold on your cost function. A detection cannot match a track if the cost exceeds a certain threshold value. When the cost is the distance between the prediction and the detection, this threshold effectively results in a circular gating region around each prediction, within which a detection must be found to be considered a match. An alternative gating technique is to make the gating region large enough to include the k-nearest neighbors of the prediction.
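Both gating techniques can be applied to a cost matrix whose rows are tracks and columns are detections. In this sketch, gated-out pairs receive infinite cost. The function names and the example values are hypothetical.

```python
import math


def gate_by_threshold(cost, threshold):
    """Replace costs above threshold with infinity so those pairs can never be matched."""
    return [[c if c <= threshold else math.inf for c in row] for row in cost]


def gate_by_k_nearest(cost, k):
    """Keep only the k lowest-cost detections for each track (row); gate out the rest."""
    gated = []
    for row in cost:
        keep = set(sorted(range(len(row)), key=lambda j: row[j])[:k])
        gated.append([c if j in keep else math.inf for j, c in enumerate(row)])
    return gated


cost = [[0.2, 0.9, 0.5], [0.8, 0.1, 0.95]]
print(gate_by_threshold(cost, 0.6))  # [[0.2, inf, 0.5], [inf, 0.1, inf]]
print(gate_by_k_nearest(cost, 1))    # [[0.2, inf, inf], [inf, 0.1, inf]]
```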
Assign Detections to Tracks
Data association reduces to a minimum-weight bipartite matching problem, which is an area of graph theory. A bipartite graph represents tracks and detections as vertices. It also represents the cost of matching a detection and a track as a weighted edge between the corresponding vertices.
The assignDetectionsToTracks function implements the Munkres variant of the Hungarian bipartite matching algorithm. Its input is the cost matrix, where the rows correspond to tracks and the columns correspond to detections. Each entry contains the cost of assigning a particular detection to a particular track. You can implement gating by setting the cost of impossible matches to infinity.
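The following sketch makes the optimization problem concrete by trying every permutation to find the minimum-cost matching. It is deliberately not the Munkres algorithm. Exhaustive search takes factorial time, whereas the Hungarian algorithm solves the same problem in polynomial time. The function name `min_cost_assignment` is hypothetical. Infinite entries are treated as forbidden pairs, which is how gating enters the problem.

```python
import itertools
import math


def min_cost_assignment(cost):
    """Exhaustively find the cheapest one-to-one matching of rows (tracks) to columns (detections).

    Every row is matched when there are at least as many columns as rows (and vice versa).
    Pairs whose cost is math.inf are forbidden. Returns a list of (track, detection) pairs,
    or None when gating rules out every complete matching. Factorial time: tiny inputs only.
    """
    num_rows, num_cols = len(cost), len(cost[0])
    transpose = num_rows > num_cols  # always permute over the larger side
    if transpose:
        cost = [list(col) for col in zip(*cost)]
        num_rows, num_cols = num_cols, num_rows
    best, best_total = None, math.inf
    for cols in itertools.permutations(range(num_cols), num_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))  # inf if any pair is gated out
        if total < best_total:
            best, best_total = list(enumerate(cols)), total
    if best is None:
        return None
    return [(c, r) for r, c in best] if transpose else best


print(min_cost_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # [(0, 1), (1, 0), (2, 2)]
```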
Data association must account for new objects appearing in the field of view and for tracked objects leaving it. As such, for any given frame, you might need to create some new tracks or discard some existing tracks. The assignDetectionsToTracks function returns the indices of unassigned tracks and unassigned detections in addition to the matched pairs.
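One common way to let tracks and detections remain unassigned is to charge a fixed cost of non-assignment and pad the cost matrix into a square problem with dummy rows and columns. The sketch below assumes that formulation and solves it by brute force for tiny inputs. It returns the matched pairs together with the unassigned track and detection indices. The function and parameter names are hypothetical and are not the signature of assignDetectionsToTracks.

```python
import itertools
import math


def assign_with_nonassignment_cost(cost, cost_of_nonassignment):
    """Match tracks (rows) to detections (columns), allowing either side to stay unmatched.

    Leaving a track or a detection unmatched costs cost_of_nonassignment. The rectangular
    problem is padded into a square one and solved exhaustively (factorial time, tiny inputs).
    Returns (matches, unassigned_tracks, unassigned_detections).
    """
    num_tracks, num_dets = len(cost), len(cost[0])
    n = num_tracks + num_dets
    big = [[math.inf] * n for _ in range(n)]
    for t in range(num_tracks):
        for d in range(num_dets):
            big[t][d] = cost[t][d]                          # real track-detection pairs
        big[t][num_dets + t] = cost_of_nonassignment        # track t left unmatched
    for d in range(num_dets):
        big[num_tracks + d][d] = cost_of_nonassignment      # detection d left unmatched
        for t in range(num_tracks):
            big[num_tracks + d][num_dets + t] = 0.0         # dummy-to-dummy filler
    best = min(itertools.permutations(range(n)),
               key=lambda perm: sum(big[i][perm[i]] for i in range(n)))
    matches = [(t, best[t]) for t in range(num_tracks) if best[t] < num_dets]
    matched_dets = {d for _, d in matches}
    unassigned_tracks = [t for t in range(num_tracks) if best[t] >= num_dets]
    unassigned_dets = [d for d in range(num_dets) if d not in matched_dets]
    return matches, unassigned_tracks, unassigned_dets


# Pairing track 1 with detection 1 (cost 9) is worse than leaving both unmatched (3 + 3).
print(assign_with_nonassignment_cost([[1.0, 8.0], [9.0, 9.0]], cost_of_nonassignment=3.0))
# ([(0, 0)], [1], [1])
```

The cost of non-assignment acts like a global gate. A pair is matched only when its cost is lower than the combined penalty for leaving both of its members unmatched.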
One way of handling unmatched detections is to create a new track from each of them. Alternatively, you can create new tracks from only those unmatched detections larger than a certain size, or from detections that have certain locations or appearances. For example, if the scene has a single entry point, such as a doorway, then you can specify that only unmatched detections located near the entry point can begin new tracks, and discard all other unmatched detections as noise.
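Here is a sketch of the doorway policy, assuming boxes in [x, y, width, height] form and an entry zone given as a rectangle. The zone coordinates, the `min_area` threshold, and the function name are hypothetical.

```python
def should_start_track(bbox, entry_zone, min_area):
    """Start a track only for large-enough detections whose center lies in the entry zone.

    bbox and entry_zone are (x, y, width, height) rectangles.
    """
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    ex, ey, ew, eh = entry_zone
    in_zone = ex <= cx <= ex + ew and ey <= cy <= ey + eh
    return in_zone and w * h >= min_area


doorway = (0, 100, 40, 80)  # hypothetical entry region near a door
unmatched = [(5, 120, 20, 40), (300, 50, 20, 40), (10, 130, 3, 3)]
new_tracks = [b for b in unmatched if should_start_track(b, doorway, min_area=100)]
print(new_tracks)  # [(5, 120, 20, 40)]; the others are outside the zone or too small
```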
You can also handle unmatched tracks by deleting any track that remains unmatched for a certain number of consecutive frames. Alternatively, you can delete an unmatched track when its last known location is near an exit point.
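Both deletion rules can be combined in one check. This sketch assumes that each track records its last known center and a count of consecutive unmatched frames. The dictionary keys, the exit zone, and `max_misses` are hypothetical.

```python
def in_zone(point, zone):
    """True when point (x, y) lies inside the rectangle zone (x, y, width, height)."""
    x, y = point
    zx, zy, zw, zh = zone
    return zx <= x <= zx + zw and zy <= y <= zy + zh


def select_tracks_to_delete(tracks, max_misses, exit_zone):
    """Return ids of unmatched tracks that should be removed.

    A track is deleted after max_misses consecutive unmatched frames, or as soon as it goes
    unmatched while its last known center lies inside the exit zone.
    """
    doomed = []
    for track in tracks:
        missed = track["misses"] > 0
        lost_too_long = track["misses"] >= max_misses
        left_via_exit = missed and in_zone(track["last_center"], exit_zone)
        if lost_too_long or left_via_exit:
            doomed.append(track["id"])
    return doomed


exit_zone = (600, 0, 40, 480)  # hypothetical strip along the right image edge
tracks = [
    {"id": 1, "last_center": (100, 200), "misses": 1},   # briefly lost: keep
    {"id": 2, "last_center": (300, 150), "misses": 12},  # lost too long: delete
    {"id": 3, "last_center": (620, 240), "misses": 1},   # lost at the exit: delete
    {"id": 4, "last_center": (625, 240), "misses": 0},   # still matched: keep
]
print(select_tracks_to_delete(tracks, max_misses=10, exit_zone=exit_zone))  # [2, 3]
```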
- Import Camera-Based Datasets in MOT Challenge Format for Object Tracking
- Implement Simple Online and Realtime Tracking
- Visual Tracking of Occluded and Unresolved Objects
- Tracking Pedestrians from a Moving Car
- Use Kalman Filter for Object Tracking
- Motion-Based Multiple Object Tracking