Multi-Object Tracking (MOT) Application Development Guide

Attention

This sample uses the single-camera dual-channel development pattern. For ByteTrack and OCSort, refer to single_model_example.md. For DeepSORT and BoTSORT, refer to double_model_example.md.

Overview

Multi-object tracking (MOT) aims to detect multiple targets in a video sequence and keep a stable identity (ID) for each target across adjacent frames. A typical MOT pipeline includes:

Object Detection: detect targets such as pedestrians or vehicles in each frame
State Prediction: predict target motion across adjacent frames, usually with a Kalman filter
Data Association: match current detections with existing tracks using motion, appearance, or both
Track Management: create new tracks, update existing tracks, and remove lost tracks
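The State Prediction step can be illustrated with a minimal constant-velocity Kalman filter. The sketch below is a hypothetical, simplified illustration and not the K230 implementation. It tracks a single coordinate with state (position, velocity), whereas real trackers run the same predict/update equations on a multi-dimensional box state. The struct name `Kalman1D` and the noise values `q` and `r` are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 1-D constant-velocity Kalman filter with state (p, v).
// Real trackers apply the same equations to a multi-dimensional box state.
struct Kalman1D {
    double p = 0.0;  // estimated position
    double v = 0.0;  // estimated velocity
    double P[2][2] = {{10.0, 0.0}, {0.0, 10.0}};  // state covariance (large = uncertain start)
    double q = 0.01;  // assumed process noise, added to both diagonal terms
    double r = 1.0;   // assumed measurement noise variance

    // Predict: x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
    void predict(double dt = 1.0) {
        p += v * dt;
        const double p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1];
        const double p01 = P[0][1] + dt * P[1][1];
        const double p10 = P[1][0] + dt * P[1][1];
        const double p11 = P[1][1];
        P[0][0] = p00 + q;
        P[0][1] = p01;
        P[1][0] = p10;
        P[1][1] = p11 + q;
    }

    // Update with a position measurement z (measurement matrix H = [1, 0]).
    void update(double z) {
        const double y = z - p;         // innovation
        const double s = P[0][0] + r;   // innovation variance
        const double k0 = P[0][0] / s;  // Kalman gain for position
        const double k1 = P[1][0] / s;  // Kalman gain for velocity
        p += k0 * y;
        v += k1 * y;
        // P <- (I - K H) P, computed from the old covariance values.
        const double p00 = (1.0 - k0) * P[0][0];
        const double p01 = (1.0 - k0) * P[0][1];
        const double p10 = P[1][0] - k1 * P[0][0];
        const double p11 = P[1][1] - k1 * P[0][1];
        P[0][0] = p00;
        P[0][1] = p01;
        P[1][0] = p10;
        P[1][1] = p11;
    }
};
```

Calling `predict()` once per frame and `update()` only when a detection is matched to the track gives the prediction-correction cycle described above; a track with no matched detection keeps its predicted state until it is either recovered or removed by track management.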

This sample supports DeepSORT, ByteTrack, OCSort, and BoTSORT, which make different tradeoffs among accuracy, robustness, compute cost, and dependence on appearance features.

Algorithm Introduction

DeepSORT

DeepSORT extends the classic SORT tracker. SORT relies only on motion information, whereas DeepSORT adds deep appearance (ReID) features, which significantly improves identity consistency under occlusion and when targets reappear after being lost.

Core components:

Motion model
- constant-velocity Kalman filter
- state typically includes position, scale, aspect ratio, and their velocities
Appearance model
- use a deep CNN to extract feature embeddings for each detected target
- features are usually L2-normalized vectors
Data association
- first perform Mahalanobis-distance gating from Kalman prediction
- then use the Hungarian algorithm on a combined cost of motion distance and appearance distance
Track lifecycle
- includes Tentative, Confirmed, and Deleted states
- multiple successful matches are needed before a track becomes confirmed
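The appearance distance and the gated association cost described above can be sketched as follows. This is a simplified, hypothetical illustration: it uses a diagonal covariance for the Mahalanobis distance (the full form uses the covariance projected by the Kalman filter), the motion gate 9.4877 is the 95% chi-square quantile for 4 degrees of freedom used in the DeepSORT paper, and the function names, `lambda`, and the appearance gate `app_gate` are assumptions rather than names from the K230 sources.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

// L2-normalize an embedding in place; a zero vector is left unchanged.
void l2_normalize(Vec& f) {
    float norm = 0.f;
    for (float x : f) norm += x * x;
    norm = std::sqrt(norm);
    if (norm > 0.f)
        for (float& x : f) x /= norm;
}

// Cosine distance between two L2-normalized embeddings: 1 - a.b (0 = same direction).
float cosine_distance(const Vec& a, const Vec& b) {
    float dot = 0.f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) dot += a[i] * b[i];
    return 1.f - dot;
}

// Track-to-detection appearance distance: the smallest cosine distance to any
// embedding stored in the track's gallery of past features.
float appearance_distance(const std::vector<Vec>& gallery, const Vec& det) {
    float best = std::numeric_limits<float>::infinity();
    for (const Vec& g : gallery) best = std::fmin(best, cosine_distance(g, det));
    return best;
}

// Squared Mahalanobis distance with a diagonal covariance (simplification).
float mahalanobis_sq(const Vec& mean, const Vec& var, const Vec& z) {
    float d = 0.f;
    for (std::size_t i = 0; i < mean.size(); ++i) {
        const float diff = z[i] - mean[i];
        d += diff * diff / var[i];
    }
    return d;
}

constexpr float kGatedCost = 1e5f;     // cost used for pairs rejected by a gate
constexpr float kChi2Gate4 = 9.4877f;  // 95% chi-square quantile, 4 degrees of freedom

// Combined cost lambda * motion + (1 - lambda) * appearance, gated by both cues.
float combined_cost(float maha_sq, float app_dist, float lambda, float app_gate) {
    if (maha_sq > kChi2Gate4 || app_dist > app_gate) return kGatedCost;
    return lambda * maha_sq + (1.f - lambda) * app_dist;
}
```

With `lambda` set to 0, the motion term acts only as a gate and the assignment is driven purely by appearance. The resulting track-by-detection cost matrix is what the Hungarian algorithm then solves.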

ByteTrack

ByteTrack is an MOT algorithm that achieves strong tracking performance without using appearance features. Its key idea is that low-confidence detections, which many trackers simply discard, still contain useful motion information.

ByteTrack splits detections into:

high-score detections for reliable matching
low-score detections for recovering possibly lost tracks

Matching process:

match tracks with high-score detections through IoU and Hungarian matching
match unmatched tracks with low-score detections
initialize new tracks only from high-score detections
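The two-stage matching above can be sketched as follows. To keep the example short, it swaps the Hungarian algorithm for a greedy highest-IoU-first matcher, and the thresholds (0.5 and 0.1 for detection scores, a minimum IoU of 0.2 in the first stage and 0.5 in the second) are assumed defaults rather than values taken from the K230 app. All names are hypothetical, and `tracks` is assumed to hold the Kalman-predicted box of each track.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <tuple>
#include <utility>
#include <vector>

struct Box { float x1, y1, x2, y2; };  // corner format
struct Det { Box box; float score; };   // detection with confidence

// Intersection over union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    const float iw = std::max(0.f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

using Pairs = std::vector<std::pair<int, int>>;  // (track index, detection index)

// Greedy stand-in for Hungarian matching: accept pairs in descending IoU order.
Pairs greedy_iou_match(const std::vector<Box>& tracks, const std::vector<int>& trk,
                       const std::vector<Det>& dets, const std::vector<int>& det, float min_iou) {
    std::vector<std::tuple<float, int, int>> cand;  // (IoU, track, detection)
    for (int t : trk)
        for (int d : det) {
            const float v = iou(tracks[t], dets[d].box);
            if (v >= min_iou) cand.emplace_back(v, t, d);
        }
    std::sort(cand.begin(), cand.end(),
              [](const auto& x, const auto& y) { return std::get<0>(x) > std::get<0>(y); });
    std::vector<bool> t_used(tracks.size(), false), d_used(dets.size(), false);
    Pairs out;
    for (const auto& c : cand) {
        const int t = std::get<1>(c), d = std::get<2>(c);
        if (t_used[t] || d_used[d]) continue;  // each track and detection used once
        t_used[t] = true;
        d_used[d] = true;
        out.emplace_back(t, d);
    }
    return out;
}

struct ByteResult {
    Pairs matches;                      // matched (track, detection) pairs
    std::vector<int> unmatched_tracks;  // tracks to mark as lost
    std::vector<int> new_tracks;        // high-score detections that start new tracks
};

ByteResult byte_associate(const std::vector<Box>& tracks, const std::vector<Det>& dets,
                          float high = 0.5f, float low = 0.1f) {
    std::vector<int> hi, lo, all_tracks;
    for (int i = 0; i < static_cast<int>(dets.size()); ++i) {
        if (dets[i].score >= high) hi.push_back(i);
        else if (dets[i].score >= low) lo.push_back(i);  // below `low`: discarded
    }
    for (int t = 0; t < static_cast<int>(tracks.size()); ++t) all_tracks.push_back(t);

    ByteResult r;
    std::vector<bool> t_done(tracks.size(), false), d_done(dets.size(), false);
    // Stage 1: all tracks against high-score detections.
    for (const auto& m : greedy_iou_match(tracks, all_tracks, dets, hi, 0.2f)) {
        r.matches.push_back(m);
        t_done[m.first] = true;
        d_done[m.second] = true;
    }
    // Stage 2: remaining tracks against low-score detections, with a stricter IoU.
    std::vector<int> rest;
    for (int t : all_tracks) if (!t_done[t]) rest.push_back(t);
    for (const auto& m : greedy_iou_match(tracks, rest, dets, lo, 0.5f)) {
        r.matches.push_back(m);
        t_done[m.first] = true;
    }
    for (int t : all_tracks) if (!t_done[t]) r.unmatched_tracks.push_back(t);
    // Unmatched low-score detections are dropped; only high-score ones start tracks.
    for (int d : hi) if (!d_done[d]) r.new_tracks.push_back(d);
    return r;
}
```

The second stage is what lets a partially occluded target, whose detection score has dropped below `high`, keep its existing ID instead of being treated as lost.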

Main characteristics:

pure motion modeling
no ReID model required
very fast and easy to deploy

OCSort

OCSort (Observation-Centric SORT) improves on SORT- and ByteTrack-style trackers in cases where Kalman prediction becomes inaccurate, such as sudden motion or camera movement.

It introduces observation-centric motion modeling, which relies more on recent observations than on long-term velocity estimates.

Key points:

estimate velocity from recent observations
improve robustness under sudden acceleration
improve robustness when the camera shakes or pans quickly
focus more on geometric consistency than on appearance cues
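The observation-centric idea can be sketched by estimating a track's velocity from its stored observations instead of from the Kalman state, and by scoring how well a candidate detection continues that direction. This is a simplified, hypothetical reading of OC-SORT's observation-centric momentum that omits its other components (such as re-updating the filter after a track is recovered); the window `delta_t = 3` and all names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <deque>

struct Obs { int frame; float cx, cy; };  // observed box center at a given frame

// Hypothetical per-track buffer of the most recent observations.
class ObservationHistory {
public:
    explicit ObservationHistory(int delta_t = 3) : delta_t_(delta_t) {}

    void add(const Obs& o) {
        hist_.push_back(o);
        // Keep delta_t + 1 observations, so front and back are delta_t observations apart.
        while (static_cast<int>(hist_.size()) > delta_t_ + 1) hist_.pop_front();
    }

    // Velocity estimated from observations only (not from the Kalman state).
    bool velocity(float& vx, float& vy) const {
        if (hist_.size() < 2) return false;
        const Obs& a = hist_.front();
        const Obs& b = hist_.back();
        const int df = b.frame - a.frame;
        if (df <= 0) return false;
        vx = (b.cx - a.cx) / df;
        vy = (b.cy - a.cy) / df;
        return true;
    }

    const Obs& last() const { return hist_.back(); }

private:
    int delta_t_;
    std::deque<Obs> hist_;
};

// Angle in [0, pi] between the track's observed motion direction and the direction
// from its last observation to a candidate detection; smaller means more consistent.
float direction_difference(const ObservationHistory& h, float det_cx, float det_cy) {
    float vx = 0.f, vy = 0.f;
    if (!h.velocity(vx, vy)) return 0.f;  // not enough history: no penalty
    const float dx = det_cx - h.last().cx;
    const float dy = det_cy - h.last().cy;
    const float kPi = 3.14159265f;
    float d = std::fabs(std::atan2(vy, vx) - std::atan2(dy, dx));
    if (d > kPi) d = 2.f * kPi - d;  // fold into [0, pi]
    return d;
}
```

In a tracker following this idea, the angle difference would be added with some weight to the IoU cost before matching, so that detections continuing the observed direction are preferred even when the Kalman-predicted box has drifted.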

BoTSORT

BoTSORT combines ideas from ByteTrack and DeepSORT. It aims to keep high speed while achieving stronger identity consistency.

It integrates:

ByteTrack-style high-score and low-score detection association
optional ReID appearance features
improved motion modeling compared with classic SORT

Its matching strategy usually includes:

a main association stage on high-confidence detections
a secondary association stage on low-confidence detections
IoU distance and optional appearance distance fusion
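The IoU/appearance fusion can be sketched as below, assuming the fusion rule described in the BoT-SORT paper: the appearance distance counts only when both the appearance and IoU distances are below their thresholds, and the final cost is the smaller of the two. Treat the thresholds (`theta_iou = 0.5`, `theta_emb = 0.25`) and the function names as assumptions; the K230 app may fuse the cues differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fused association cost in the style of BoT-SORT (assumed rule and thresholds).
// d_iou = 1 - IoU, d_cos = cosine distance between ReID embeddings.
float fused_cost(float d_iou, float d_cos, float theta_iou = 0.5f, float theta_emb = 0.25f) {
    // Appearance is trusted only when both cues agree that the pair is plausible;
    // otherwise it is replaced by the maximum distance 1.
    const float d_cos_hat = (d_cos < theta_emb && d_iou < theta_iou) ? 0.5f * d_cos : 1.0f;
    // The smaller of the two distances drives the assignment.
    return std::min(d_iou, d_cos_hat);
}

using Matrix = std::vector<std::vector<float>>;

// Element-wise fusion of track-by-detection IoU-distance and cosine-distance matrices.
Matrix fused_cost_matrix(const Matrix& iou_dist, const Matrix& cos_dist) {
    Matrix cost(iou_dist.size());
    for (std::size_t i = 0; i < iou_dist.size(); ++i) {
        cost[i].resize(iou_dist[i].size());
        for (std::size_t j = 0; j < iou_dist[i].size(); ++j)
            cost[i][j] = fused_cost(iou_dist[i][j], cos_dist[i][j]);
    }
    return cost;
}
```

When ReID is disabled, a cosine distance of 1 makes `d_cos_hat` equal to 1, so the fused cost reduces to the IoU distance alone, which matches the optional nature of the appearance branch.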

Comparison and Application Scenarios

Algorithm	ReID	Association Cues	Complexity	Core Characteristics
DeepSORT	Yes	Kalman filter + appearance	High	strong ID stability under occlusion and re-identification
ByteTrack	No	Kalman filter + IoU	Low	fast, simple, and effective without appearance features
OCSort	No	observation-centric motion + IoU	Medium	more robust under detection jitter and unstable motion
BoTSORT	Optional	Kalman filter + IoU + ReID	High	stronger performance in complex scenes through multi-stage matching

On K230, all four algorithms share the same application structure, so you can switch the detection model and tracking parameters without rebuilding the underlying media stack.

Build Code

From the RTOS SDK root:

make list-def
make ***_defconfig
make -j

After the firmware build completes, the image is generated in the output directory.

Build Method 1

When your code changes are ready, enter the directory of the target algorithm under:

src/rtsmart/examples/ai/multi_object_tracking

and run:

./build_app.sh

Intermediate build files are generated in the build directory, and the files needed for deployment are collected in k230_bin.

Build Method 2

From the RTOS SDK root, run make menuconfig and enable:

RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build MOT(Multi-Object Tracking) Programs

Select the target algorithm, save the configuration, then run:

make -j

With this method, the deployment files are built directly into the firmware image, under the corresponding application directory in /sdcard/app/examples/ai/multi_object_tracking.

Alternatively, enter the target algorithm directory and run:

make -j

This method supports incremental builds and places the outputs in k230_bin.

Board Deployment

Flash the firmware first. See:

how_to_flash

After the board boots, a virtual disk named CanMV appears on the host computer. Copy the generated elf, kmodel, and any other required files from k230_bin to CanMV/sdcard.

Then connect to the board over serial and run:

run.sh

After startup, you should see the video output on the screen.

Reference deployment result:

[Figure: multi_object_tracking]

To use an HDMI display instead of the default ST7701 LCD panel, modify:

~/canmv_k230/src/rtsmart/examples/ai/multi_object_tracking/botsort_track_app/src/setting.h

Change:

#define DISPLAY_TYPE 'st7701'

to:

#define DISPLAY_TYPE 'lt9611'

Then rebuild the application.