AI Demo Guide#

Attention

AI Demo follows the same single-camera, dual-channel development pattern as the single-model template. For the core architecture, see single_model_example.md.

Overview#

The K230 AI Demo collection includes modules for face, human body, hand, license plate, text continuation, speech, and driver monitoring (DMS) scenarios. It covers classification, detection, segmentation, recognition, tracking, and monocular distance estimation, and provides a practical reference for building AI applications on K230.

These demos are mainly intended to validate K230 capabilities and demonstrate representative scenarios. For production use, scenario-specific tuning is still required. Typical optimization directions include:

threshold tuning
code optimization
quantization optimization
model optimization
training-data optimization

Supported Boards#

CanMV-K230-V1.1
CanMV-K230-V3.0
01Studio CanMV K230
Bpi-CanMV-K230D-Zero
LuShanPi-K230

Source Description#

Source Path#

The source tree is located at:

src/rtsmart/examples/ai/ai_demo

File Tree#

.
├── anomaly_det
├── bytetrack
├── cmake
├── common_files
├── crosswalk_detect
├── distraction_reminder
├── dms_system
├── dynamic_gesture
├── eye_gaze
├── face_alignment
├── face_detection
├── face_emotion
├── face_gender
├── face_glasses
├── face_landmark
├── face_mask
├── face_mesh
├── face_parse
├── face_pose
├── face_verification
├── falldown_detect
├── finger_guessing
├── fitness
├── head_detection
├── helmet_detect
├── kws
├── licence_det
├── licence_det_rec
├── nanotracker
├── object_detect_yolov8n
├── ocr
├── person_attr
├── person_detect
├── person_distance
├── pose_detect
├── pphumanseg
├── puzzle_game
├── segment_yolov8n
├── self_learning
├── shell
├── smoke_detect
├── space_resize
├── sq_hand_det
├── sq_handkp_class
├── sq_handkp_det
├── sq_handkp_flower
├── sq_handkp_ocr
├── sq_handreco
├── traffic_light_detect
├── tts_zh
├── vehicle_attr
├── virtual_keyboard
├── yolop_lane_seg
├── CMakeLists.txt
├── Makefile
└── build_app.sh

Each AI Demo subdirectory includes a README.md file with detailed usage notes.

Shared Files#

All demos share the runtime wrapper files under common_files:

.
├── ai_base.cc
├── ai_base.h
├── ai_utils.cc
├── ai_utils.h
├── scoped_timing.h
├── setting.h
├── video_pipeline.cc
└── video_pipeline.h

Their main responsibilities are:

ai_base.*: common nncase inference wrapper, including kmodel loading, input setup, and output retrieval
ai_utils.*: shared helpers such as palette generation, image saving, and preprocessing utilities
scoped_timing.h: timing helper for profiling
setting.h: display and AI-frame configuration
video_pipeline.*: single-camera dual-channel media wrapper for camera init, frame acquisition, and OSD display

Model Assets#

Related kmodel files, test images, and other dependencies are stored under:

src/rtsmart/libs/kmodel/ai_poc

During the AI Demo build, build_app.sh copies the models and assets required by the selected demo into the output directory.

Demo Notes#

The following table summarizes the main demo directories:

Demo Directory	Scenario	Description
`anomaly_det`	anomaly detection	Detects whether anomalies exist in the inspected target, such as abnormal bottle openings, and is suitable for industrial inspection or similar tasks.
`bytetrack`	multi-object tracking	Uses YOLOv5 for detection, Kalman filtering for box prediction, and the Hungarian algorithm for track association.
`crosswalk_detect`	crosswalk detection	Uses YOLOv5 to detect pedestrian crossings in images or video for assisted-driving scenarios.
`distraction_reminder`	inattentive-driving reminder	Uses face-pose estimation and logic rules to warn when the driver is not looking forward.
`dms_system`	driver monitoring	Combines palm detection and face detection to detect smoking, phone use, drinking, and similar behaviors.
`dynamic_gesture`	dynamic gesture recognition	Recognizes dynamic hand actions such as up, down, left, right, and pinch gestures for touchless interaction.
`eye_gaze`	gaze estimation	Detects faces first and then estimates gaze direction, drawing gaze vectors on the image.
`face_alignment`	face alignment	Outputs per-face depth or normalized projection-coordinate information using a 3D face-alignment flow.
`face_detection`	face detection	Detects face boxes and five facial landmarks in images or video.
`face_emotion`	facial-expression recognition	Uses two models to classify expressions such as neutral, happiness, sadness, anger, disgust, fear, and surprise.
`face_gender`	gender classification	Uses a face detector plus a classification model to label each face as male or female.
`face_glasses`	glasses classification	Determines whether each detected face is wearing glasses.
`face_landmark`	dense face landmarks	Detects 106 landmarks and draws facial contours with different colors.
`face_mask`	mask classification	Determines whether each detected face is wearing a mask.
`face_mesh`	3D face mesh	Outputs a 3D mesh structure for each detected face.
`face_parse`	face segmentation	Segments facial regions such as eyes, nose, and mouth at the pixel level.
`face_pose`	face-pose estimation	Estimates roll, yaw, and pitch for each detected face.
`face_verification`	face verification	Extracts face features and compares two faces to determine whether they belong to the same identity.
`falldown_detect`	fall detection	Detects falling behavior in images or video.
`finger_guessing`	rock-paper-scissors	Recognizes hand gestures from palm detection plus 21 hand keypoints.
`fitness`	squat counting	Counts squat actions in video for fitness-state analysis.
`head_detection`	head detection and counting	Detects human heads and counts how many are present.
`helmet_detect`	helmet detection	Detects whether people are wearing helmets, suitable for safety-monitoring scenarios.
`kws`	keyword spotting	Detects target wake words in the audio stream and can trigger a voice response.
`licence_det`	license plate detection	Detects plate locations in images or video.
`licence_det_rec`	license plate recognition	Detects plate locations and recognizes plate text.
`nanotracker`	single-object tracking	Registers a target in the first few seconds and then tracks it visually in real time.
`object_detect_yolov8n`	YOLOv8 object detection	Runs 80-class COCO detection using YOLOv8n.
`ocr`	OCR detection + recognition	Detects text regions and recognizes text content in images or video.
`person_attr`	person attributes	Detects people and estimates attributes such as gender, age, glasses, and carried items.
`person_detect`	person detection	Detects people in images or video and draws bounding boxes.
`person_distance`	pedestrian ranging	Estimates distance to detected pedestrians based on detection results and scene geometry.
`pose_detect`	human keypoints	Detects 17 body keypoints and connects them into a human pose.
`pphumanseg`	human segmentation	Separates the person from the background and supports portrait compositing and background replacement.
`puzzle_game`	puzzle game	Uses palm detection and hand keypoints to implement an interactive puzzle game.
`segment_yolov8n`	YOLOv8 instance segmentation	Produces instance-segmentation masks for the 80 COCO classes using YOLOv8n-seg.
`self_learning`	self-learning classification	Registers target features first and then performs classification by similarity without retraining.
`smoke_detect`	smoking detection	Detects smoking behavior in images or video.
`space_resize`	touchless zoom	Uses fingertip movement to scale images without touch input.
`sq_hand_det`	palm detection	Detects palm boxes in images or video.
`sq_handkp_class`	hand-keypoint gesture classification	Detects 21 hand keypoints and classifies static gestures.
`sq_handkp_det`	hand keypoint detection	Detects 21 hand keypoints for each palm.
`sq_handkp_flower`	fingertip flower classification	Classifies flowers inside the region selected around the fingertip area.
`sq_handkp_ocr`	fingertip OCR	Recognizes text in the region near the fingertip.
`sq_handreco`	gesture recognition	Recognizes several predefined gestures such as open palm, eight, and yeah.
`traffic_light_detect`	traffic-light detection	Detects red, yellow, and green traffic lights.
`translate_en_ch`	English-to-Chinese translation	Demonstrates a basic EN-to-ZH machine translation task.
`tts_zh`	Chinese text to speech	Uses a three-model pipeline to synthesize Chinese speech from text.
`vehicle_attr`	vehicle attributes	Detects vehicles and estimates vehicle type and body color.
`virtual_keyboard`	touchless virtual keyboard	Uses pinch interaction to input characters through an on-screen keyboard.
`yolop_lane_seg`	lane segmentation	Detects lanes and drivable area in road scenes.

Build and Run#

Select Board Configuration and Build#

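From the RTOS SDK root, list the available board configurations, then select one and build. Replace *** with a configuration name reported by make list-def: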

make list-def
make ***_defconfig
make -j

Initialize SDK and Build Firmware#

If you are setting up the SDK from scratch, use the following flow:

# Install the repo tool into ~/.bin and add it to PATH
mkdir -p ~/.bin
curl https://storage.googleapis.com/git-repo-downloads/repo > ~/.bin/repo
chmod a+rx ~/.bin/repo
echo 'export PATH="${HOME}/.bin:${PATH}"' >> ~/.bashrc
source ~/.bashrc

# Create the SDK working directory
cd ~
mkdir rtos_k230_sdk
cd rtos_k230_sdk

# Generate an SSH key and print the public key
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
cat ~/.ssh/id_rsa.pub

# Option A: GitHub
repo init -u https://github.com/canmv-k230/manifest -b master --repo-url=https://github.com/canmv-k230/git-repo.git
repo sync

# Download the toolchain, then list the available board configurations
make dl_toolchain
make list-def

If you prefer Gitee, use this repo init command instead of the GitHub one above:

repo init -u https://gitee.com/canmv-k230/manifest -b master --repo-url=https://gitee.com/canmv-k230/git-repo.git

Then select your target board and build:

make ***_defconfig
make -j

After the build completes, the firmware images are generated in the output directory.

Build Method 1#

After your code changes are ready, enter src/rtsmart/examples/ai/ai_demo and run:

# Build only face detection
./build_app.sh face_detection

# Build all AI demos
./build_app.sh

Intermediate build files are written to the build directory, and deployment artifacts are collected in k230_bin.
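When only a few demos are needed, the single-demo command can be wrapped in a loop. The following is a minimal sketch, not part of the SDK: build_demos is a hypothetical helper, and it assumes that it is run from src/rtsmart/examples/ai/ai_demo and that build_app.sh returns a non-zero exit status when a build fails. Whether k230_bin is preserved between invocations is not stated here, so check the collected outputs after each run.

```shell
#!/bin/sh
# Hypothetical helper, not part of the SDK: build several demos one by one.
# Assumptions: the current directory is src/rtsmart/examples/ai/ai_demo, and
# build_app.sh exits with a non-zero status when a build fails.
build_demos() {
    local demo failed=""
    for demo in "$@"; do
        # Each demo lives in its own subdirectory of ai_demo.
        if [ ! -d "$demo" ]; then
            echo "no such demo directory: $demo" >&2
            failed="$failed $demo"
            continue
        fi
        echo "building: $demo"
        ./build_app.sh "$demo" || failed="$failed $demo"
    done
    # Report every demo that could not be built and signal failure.
    if [ -n "$failed" ]; then
        echo "failed:$failed" >&2
        return 1
    fi
    return 0
}

# Example call (commented out so that sourcing this file builds nothing):
#   build_demos face_detection person_detect ocr
```

The loop keeps going after a failure so that one broken demo does not hide the status of the others, and the final return code still reflects whether anything failed.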

Build Method 2#

From the RTOS SDK root, run make menuconfig and enable:

RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build AI Demo Programs
-> select the target demo

Then run:

make -j

This builds the selected demo directly into:

/sdcard/app/examples/ai/ai_demo/<demo_name>

You can also enter /sdcard/app/examples/ai/ai_demo and run:

make -j

This command also supports incremental builds and places the collected outputs in k230_bin.

Board Deployment#

First, flash the firmware. See:

how_to_flash

Then copy the generated elf, kmodel, test images, and any other required files for the selected demo from k230_bin to CanMV/sdcard.
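The copy step can be scripted on the host. This is a sketch under stated assumptions rather than an SDK tool: deploy_demo is a hypothetical helper, k230_bin is assumed to hold one subdirectory per demo (k230_bin/<demo_name>), and the SD card is assumed to be mounted on the host at the directory passed as the third argument.

```shell
#!/bin/sh
# Hypothetical helper, not part of the SDK: copy one demo's artifacts to a
# mounted SD card.
# Assumptions: k230_bin contains one subdirectory per demo, and the SD card
# is mounted on the host at <sdcard_dir>.
# Usage: deploy_demo <k230_bin_dir> <demo_name> <sdcard_dir>
deploy_demo() {
    local bin_dir="$1" demo="$2" sdcard_dir="$3"
    local src="$bin_dir/$demo"
    local dst="$sdcard_dir/$demo"
    if [ ! -d "$src" ]; then
        echo "no build output for demo: $src" >&2
        return 1
    fi
    if [ ! -d "$sdcard_dir" ]; then
        echo "SD card directory not found: $sdcard_dir" >&2
        return 1
    fi
    mkdir -p "$dst" || return 1
    # "$src"/. copies the directory contents, including hidden files.
    cp -r "$src"/. "$dst"/ || return 1
    echo "copied $src -> $dst"
}

# Example call (the mount point is a placeholder):
#   deploy_demo k230_bin face_detection /path/to/mounted/sdcard
```

Copying the whole demo directory keeps the elf, kmodel, test images, and run scripts together, so nothing that the demo needs is left behind.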

Connect to the board over serial and run the corresponding ***_isp.sh or ***_image.sh script for your selected demo. For example:

cd /sdcard/face_detection
./face_detect_isp.sh
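As the example shows, a run script's name does not have to match its demo directory name (face_detect_isp.sh inside face_detection). Before copying files to the board, the available run scripts can be listed on the host. This is a minimal sketch: list_run_scripts is a hypothetical helper, and it assumes the run scripts sit directly inside the demo's output directory.

```shell
#!/bin/sh
# Hypothetical host-side helper, not part of the SDK: print the names of the
# *_isp.sh and *_image.sh run scripts found in one demo directory.
# Assumption: the run scripts sit directly inside that directory.
# Usage: list_run_scripts <demo_dir>
list_run_scripts() {
    local dir="$1" f
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/*_isp.sh "$dir"/*_image.sh; do
        # An unmatched glob stays literal, so keep only files that exist.
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Example call (the path is a placeholder):
#   list_run_scripts k230_bin/face_detection
```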

For demo-specific details, refer to the source code, scripts, and README.md in each demo directory.