Multi-Object Tracking (MOT) Application Development Guide#
Attention
This sample uses the single-camera dual-channel development pattern. For ByteTrack and OCSort, refer to single_model_example.md. For DeepSORT and BoTSORT, refer to double_model_example.md.
Overview#
Multi-object tracking (MOT) aims to detect multiple targets in a video sequence and keep a stable identity (ID) for each target across adjacent frames. A typical MOT pipeline includes:
Object Detection: detect targets such as pedestrians or vehicles in each frame
State Prediction: predict target motion across adjacent frames, usually with a Kalman filter
Data Association: match current detections with existing tracks using motion, appearance, or both
Track Management: create new tracks, update existing tracks, and remove lost tracks
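The state prediction step can be illustrated with a minimal constant-velocity Kalman filter. This is a one-dimensional sketch for intuition only: the class name `ConstantVelocityKF` and the noise values are assumptions made for this example, and the trackers in this sample work on full bounding-box states rather than a single coordinate.

```python
from dataclasses import dataclass, field


@dataclass
class ConstantVelocityKF:
    """1-D constant-velocity Kalman filter over the state [position, velocity].

    Hypothetical helper for illustration; not part of the K230 SDK.
    """
    pos: float
    vel: float = 0.0
    # 2x2 covariance as nested lists; initial values are illustrative.
    P: list = field(default_factory=lambda: [[1.0, 0.0], [0.0, 10.0]])
    q: float = 0.01  # process noise added to each diagonal term (assumed)
    r: float = 1.0   # measurement noise variance (assumed)

    def predict(self, dt: float = 1.0) -> float:
        # x' = F x with F = [[1, dt], [0, 1]]
        self.pos += self.vel * dt
        (a, b), (c, d) = self.P
        # P' = F P F^T + Q
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.pos

    def update(self, z: float) -> None:
        # Only the position is measured, so H = [1, 0].
        (a, b), (c, d) = self.P
        s = a + self.r          # innovation variance
        k0, k1 = a / s, c / s   # Kalman gain
        y = z - self.pos        # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # P = (I - K H) P
        self.P = [
            [(1 - k0) * a, (1 - k0) * b],
            [c - k1 * a, d - k1 * b],
        ]


# Feed a target that moves 2 units per frame; the velocity estimate
# converges toward 2 even though only positions are observed.
kf = ConstantVelocityKF(pos=0.0)
for frame in range(1, 11):
    kf.predict()
    kf.update(2.0 * frame)
```

Between detections, calling `predict()` alone carries the track forward, which is what lets a tracker bridge short gaps before data association.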
This sample supports DeepSORT, ByteTrack, OCSort, and BoTSORT. They make different trade-offs among accuracy, robustness, compute cost, and dependence on appearance features.
Algorithm Introduction#
DeepSORT#
DeepSORT extends the classic SORT tracker. SORT depends only on motion information, while DeepSORT adds deep appearance features (ReID), which improves identity consistency through occlusions and when a target reappears after being lost.
Core components:
Motion model
constant-velocity Kalman filter
state typically includes position, scale, aspect ratio, and their velocities
Appearance model
use a deep CNN to extract feature embeddings for each detected target
features are usually L2-normalized vectors
Data association
first perform Mahalanobis-distance gating from Kalman prediction
then use the Hungarian algorithm on a combined cost of motion distance and appearance distance
Track lifecycle
includes Tentative, Confirmed, and Deleted states
multiple successful matches are needed before a track becomes confirmed
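The data association step above can be sketched as a gated cost matrix followed by an assignment. This is a simplified illustration, not the code used in the sample: the function names, the weight `lam`, and the `blocked` sentinel are assumptions, the motion distances are taken as precomputed inputs, and a brute-force search stands in for the Hungarian algorithm so the example stays dependency-free (it only handles small square matrices).

```python
import math
from itertools import permutations


def cosine_distance(a, b):
    """1 - cosine similarity; equals 1 - dot(a, b) for L2-normalized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def combined_cost(motion_dist, track_feats, det_feats, gate, lam=0.5, blocked=1e5):
    """Build a tracks x detections cost matrix.

    motion_dist[i][j] is a precomputed squared Mahalanobis distance between
    the Kalman prediction of track i and detection j (assumed input).
    """
    cost = []
    for i, tf in enumerate(track_feats):
        row = []
        for j, df in enumerate(det_feats):
            if motion_dist[i][j] > gate:
                # Gated out: the motion model says this pair is implausible.
                row.append(blocked)
            else:
                row.append(lam * motion_dist[i][j]
                           + (1 - lam) * cosine_distance(tf, df))
        cost.append(row)
    return cost


def associate(cost, blocked=1e5):
    """Optimal assignment by brute force (stand-in for Hungarian, square input)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    # Drop pairs that were only assigned because nothing better existed.
    return [(i, j) for i, j in enumerate(best) if cost[i][j] < blocked]


# Two tracks, two detections; 9.4877 is the 95% chi-square quantile for 4 DOF.
motion = [[1.0, 20.0], [2.0, 1.5]]
track_feats = [[1.0, 0.0], [0.0, 1.0]]
det_feats = [[1.0, 0.0], [0.0, 1.0]]
cost = combined_cost(motion, track_feats, det_feats, gate=9.4877)
matches = associate(cost)  # track index -> detection index pairs
```

In a real deployment the brute-force step would be replaced by a proper Hungarian solver, since its cost grows factorially with the number of tracks.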
ByteTrack#
ByteTrack is a modern MOT algorithm designed to maximize tracking performance without using appearance features. Its key idea is that low-confidence detections still contain useful motion information.
ByteTrack splits detections into:
high-score detections for reliable matching
low-score detections for recovering possibly lost tracks
Matching process:
match tracks with high-score detections through IoU and Hungarian matching
match unmatched tracks with low-score detections
initialize new tracks only from high-score detections
Main characteristics:
pure motion modeling
no ReID model required
very fast and easy to deploy
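The two-stage matching described above can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, the thresholds are illustrative defaults rather than values taken from this sample, and a greedy IoU matcher replaces Hungarian matching to keep the example short. All function names are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_match(track_boxes, det_boxes, min_iou):
    """Greedy IoU matching (simplified stand-in for Hungarian matching).

    Both arguments map an id to a box. Returns
    (matches, unmatched_track_ids, unmatched_det_ids).
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in track_boxes.items()
         for di, d in det_boxes.items()),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to match
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return (matches,
            [t for t in track_boxes if t not in used_t],
            [d for d in det_boxes if d not in used_d])


def byte_associate(tracks, detections, high_thresh=0.6, low_thresh=0.1, min_iou=0.3):
    """tracks: {track_id: box}; detections: list of (box, score)."""
    high = {i: box for i, (box, s) in enumerate(detections) if s >= high_thresh}
    low = {i: box for i, (box, s) in enumerate(detections)
           if low_thresh <= s < high_thresh}
    # Stage 1: all tracks against high-score detections.
    first, left_tracks, new_dets = greedy_match(tracks, high, min_iou)
    # Stage 2: still-unmatched tracks against low-score detections.
    remaining = {t: tracks[t] for t in left_tracks}
    second, lost_tracks, _ = greedy_match(remaining, low, min_iou)
    # Only unmatched high-score detections may start new tracks;
    # unmatched low-score detections are discarded.
    return first + second, lost_tracks, new_dets


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [
    ((1, 1, 11, 11), 0.9),    # confident detection of track 1
    ((21, 21, 31, 31), 0.3),  # weak detection, e.g. track 2 partly occluded
    ((50, 50, 60, 60), 0.8),  # confident detection of a new target
    ((70, 70, 80, 80), 0.2),  # weak detection with no track nearby
]
matches, lost, new = byte_associate(tracks, detections)
```

Here track 2 is recovered by the low-score detection in the second stage, while the isolated low-score box is dropped instead of starting a track.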
OCSort#
OCSort improves the SORT/ByteTrack-style tracker by addressing cases where Kalman prediction becomes inaccurate under sudden motion or camera movement.
It introduces observation-centric motion modeling, which relies more on recent observations than on long-term velocity estimates.
Key points:
estimate velocity from recent observations
improve robustness under sudden acceleration
improve robustness when the camera shakes or pans quickly
focus more on geometric consistency than on appearance cues
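The first key point, estimating velocity from recent observations, can be sketched as below. This illustrates the idea only and is not the full OC-SORT procedure, which includes further steps beyond this velocity estimate. The class name `ObservationHistory` and the `delta_t` parameter are assumptions made for this example.

```python
from collections import deque


class ObservationHistory:
    """Stores recent matched detections of one track (hypothetical helper)."""

    def __init__(self, maxlen=30):
        self.obs = deque(maxlen=maxlen)  # entries are (frame_index, cx, cy)

    def add(self, frame, cx, cy):
        self.obs.append((frame, cx, cy))

    def velocity(self, delta_t=3):
        """Velocity in pixels per frame, computed from the latest observation
        and the newest one that is at least delta_t frames older."""
        if len(self.obs) < 2:
            return (0.0, 0.0)
        f1, x1, y1 = self.obs[-1]
        f0, x0, y0 = self.obs[0]  # fall back to the oldest observation
        for f, x, y in reversed(self.obs):
            if f1 - f >= delta_t:
                f0, x0, y0 = f, x, y
                break
        dt = f1 - f0
        if dt <= 0:
            return (0.0, 0.0)
        return ((x1 - x0) / dt, (y1 - y0) / dt)


# A target moves 1 px/frame, then jumps 8 px in a single frame.
history = ObservationHistory()
for frame, cx in [(0, 0.0), (1, 1.0), (2, 2.0), (3, 10.0)]:
    history.add(frame, cx, 0.0)

recent = history.velocity(delta_t=1)    # reacts to the jump immediately
smoothed = history.velocity(delta_t=3)  # averages over a longer window
```

A small `delta_t` follows sudden acceleration quickly but is more sensitive to detection jitter, so the window length is a tuning choice.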
BoTSORT#
BoTSORT combines ideas from ByteTrack and DeepSORT. It aims to keep high speed while achieving stronger identity consistency.
It integrates:
ByteTrack-style high-score and low-score detection association
optional ReID appearance features
improved motion modeling compared with classic SORT
Its matching strategy usually includes:
a main association stage on high-confidence detections
a secondary association stage on low-confidence detections
IoU distance and optional appearance distance fusion
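The fusion of IoU distance and optional appearance distance can be sketched for a single track-detection pair. This follows the general shape of the fusion described for BoT-SORT, where the appearance cue is trusted only when both cues already agree that the pair is plausible. The function name and the threshold values are assumptions for illustration and may differ from what this sample uses.

```python
def fuse_cost(iou_dist, cos_dist=None, iou_gate=0.5, emb_gate=0.25):
    """Fuse IoU distance (1 - IoU) with an optional ReID cosine distance.

    Pass cos_dist=None when ReID is disabled; the cost is then pure IoU.
    """
    if cos_dist is None:
        return iou_dist
    if cos_dist < emb_gate and iou_dist < iou_gate:
        # Both cues agree: let the (halved) appearance distance compete.
        appearance = 0.5 * cos_dist
    else:
        # Appearance is rejected, so it cannot lower the cost.
        appearance = 1.0
    return min(iou_dist, appearance)


same_target = fuse_cost(0.3, 0.1)       # appearance confirms the match
lookalike_far = fuse_cost(0.8, 0.1)     # similar appearance but too far away
different_look = fuse_cost(0.3, 0.4)    # nearby but appearance disagrees
motion_only = fuse_cost(0.3)            # ReID disabled
```

Taking the minimum means a strong appearance match can only make a plausible pair cheaper; it never overrides the spatial gate.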
Comparison and Application Scenarios#
| Algorithm | ReID | Motion Emphasis | Complexity | Core Characteristics |
|---|---|---|---|---|
| DeepSORT | Yes | Kalman filter + appearance | High | strong ID stability under occlusion and re-identification |
| ByteTrack | No | Kalman filter + IoU | Low | fast, simple, and effective without appearance features |
| OCSort | No | observation-centric motion | Medium | more robust under detection jitter and unstable motion |
| BoTSORT | Yes | Kalman + IoU + ReID | High | stronger performance in complex scenes through multi-stage matching |
The K230 samples use the same application structure for all four algorithms, so you can switch the detection model and tracking parameters quickly without rebuilding the underlying media stack.
Build Code#
From the RTOS SDK root:
make list-def
make ***_defconfig
make -j
After the firmware build completes, the image is generated under output.
Build Method 1#
After the code changes are ready, enter one of the algorithm directories under:
src/rtsmart/examples/ai/multi_object_tracking
and run:
./build_app.sh
Intermediate build files are generated in build, and the files needed for deployment are collected in k230_bin.
Build Method 2#
From the RTOS SDK root, run make menuconfig and enable:
RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build MOT(Multi-Object Tracking) Programs
Select the target algorithm, save the configuration, then run:
make -j
With this method, the deployment files are placed directly in the corresponding application directory under /sdcard/app/examples/ai/multi_object_tracking.
You can also enter the target directory directly and run:
make -j
This also supports incremental build and places outputs in k230_bin.
Board Deployment#
Flash the firmware first. See:
After boot, a virtual disk named CanMV is visible. Copy the generated elf, kmodel, and any other required files from k230_bin to CanMV/sdcard.
Then connect to the board over serial and run:
run.sh
After startup, you should see the video output on the screen.
Reference deployment effect:
If you want to use an HDMI display, modify:
~/canmv_k230/src/rtsmart/examples/ai/multi_object_tracking/botsort_track_app/src/setting.h
Change:
#define DISPLAY_TYPE 'st7701'
to:
#define DISPLAY_TYPE 'lt9611'
Then rebuild the application.
