AI Demo Guide#
Attention
AI Demo follows the same single-camera, dual-channel development pattern as the single-model template. For the core architecture, see single_model_example.md.
Overview#
K230 AI Demo includes modules for face, human body, hand, license plate, text continuation, speech, and DMS scenarios. It covers classification, detection, segmentation, recognition, tracking, and monocular distance estimation, and provides a practical reference for building AI applications on K230.
These demos are mainly intended to validate K230 capabilities and demonstrate representative scenarios. For production use, scenario-specific tuning is still required. Typical optimization directions include:
threshold tuning
code optimization
quantization optimization
model optimization
training-data optimization
Supported Boards#
CanMV-K230-V1.1
CanMV-K230-V3.0
01Studio CanMV K230
Bpi-CanMV-K230D-Zero
LuShanPi-K230
Source Description#
Source Path#
The source tree is located at:
src/rtsmart/examples/ai/ai_demo
File Tree#
.
├── anomaly_det
├── bytetrack
├── cmake
├── common_files
├── crosswalk_detect
├── distraction_reminder
├── dms_system
├── dynamic_gesture
├── eye_gaze
├── face_alignment
├── face_detection
├── face_emotion
├── face_gender
├── face_glasses
├── face_landmark
├── face_mask
├── face_mesh
├── face_parse
├── face_pose
├── face_verification
├── falldown_detect
├── finger_guessing
├── fitness
├── head_detection
├── helmet_detect
├── kws
├── licence_det
├── licence_det_rec
├── nanotracker
├── object_detect_yolov8n
├── ocr
├── person_attr
├── person_detect
├── person_distance
├── pose_detect
├── pphumanseg
├── puzzle_game
├── segment_yolov8n
├── self_learning
├── shell
├── smoke_detect
├── space_resize
├── sq_hand_det
├── sq_handkp_class
├── sq_handkp_det
├── sq_handkp_flower
├── sq_handkp_ocr
├── sq_handreco
├── traffic_light_detect
├── tts_zh
├── vehicle_attr
├── virtual_keyboard
├── yolop_lane_seg
├── CMakeLists.txt
├── Makefile
└── build_app.sh
Each AI Demo subdirectory includes a README.md file with detailed usage notes.
Model Assets#
Related kmodel files, test images, and other dependencies are stored under:
src/rtsmart/libs/kmodel/ai_poc
During the AI Demo build, build_app.sh copies the models and assets required by the selected demo into the output directory.
Demo Notes#
The following table summarizes the main demo directories:
| Demo Directory | Scenario | Description |
|---|---|---|
| anomaly_det | anomaly detection | Detects whether anomalies exist in the inspected target, such as abnormal bottle openings, and is suitable for industrial inspection or similar tasks. |
| bytetrack | multi-object tracking | Uses YOLOv5 for detection, Kalman filtering for box prediction, and the Hungarian algorithm for track association. |
| crosswalk_detect | crosswalk detection | Uses YOLOv5 to detect pedestrian crossings in images or video for assisted-driving scenarios. |
| distraction_reminder | inattentive-driving reminder | Uses face-pose estimation and logic rules to warn when the driver is not looking forward. |
| dms_system | driver monitoring | Combines palm detection and face detection to detect smoking, phone use, drinking, and similar behaviors. |
| dynamic_gesture | dynamic gesture recognition | Recognizes dynamic hand actions such as up, down, left, right, and pinch gestures for touchless interaction. |
| eye_gaze | gaze estimation | Detects faces first and then estimates gaze direction, drawing gaze vectors on the image. |
| face_alignment | face alignment | Outputs per-face depth or normalized projection-coordinate information using a 3D face-alignment flow. |
| face_detection | face detection | Detects face boxes and five facial landmarks in images or video. |
| face_emotion | facial-expression recognition | Uses two models to classify expressions such as neutral, happiness, sadness, anger, disgust, fear, and surprise. |
| face_gender | gender classification | Uses a face detector plus a classification model to label each face as male or female. |
| face_glasses | glasses classification | Determines whether each detected face is wearing glasses. |
| face_landmark | dense face landmarks | Detects 106 landmarks and draws facial contours with different colors. |
| face_mask | mask classification | Determines whether each detected face is wearing a mask. |
| face_mesh | 3D face mesh | Outputs a 3D mesh structure for each detected face. |
| face_parse | face segmentation | Segments facial regions such as eyes, nose, and mouth at the pixel level. |
| face_pose | face-pose estimation | Estimates roll, yaw, and pitch for each detected face. |
| face_verification | face verification | Extracts face features and compares two faces to determine whether they belong to the same identity. |
| falldown_detect | fall detection | Detects falling behavior in images or video. |
| finger_guessing | rock-paper-scissors | Recognizes hand gestures from palm detection plus 21 hand keypoints. |
| fitness | squat counting | Counts squat actions in video for fitness-state analysis. |
| head_detection | head detection and counting | Detects human heads and counts how many are present. |
| helmet_detect | helmet detection | Detects whether people are wearing helmets, suitable for safety-monitoring scenarios. |
| kws | keyword spotting | Detects target wake words in the audio stream and can trigger a voice response. |
| licence_det | license plate detection | Detects plate locations in images or video. |
| licence_det_rec | license plate recognition | Detects plate locations and recognizes plate text. |
| nanotracker | single-object tracking | Registers a target in the first few seconds and then tracks it visually in real time. |
| object_detect_yolov8n | YOLOv8 object detection | Runs 80-class COCO detection using YOLOv8n. |
| ocr | OCR detection + recognition | Detects text regions and recognizes text content in images or video. |
| person_attr | person attributes | Detects people and estimates attributes such as gender, age, glasses, and carried items. |
| person_detect | person detection | Detects people in images or video and draws bounding boxes. |
| person_distance | pedestrian ranging | Estimates distance to detected pedestrians based on detection results and scene geometry. |
| pose_detect | human keypoints | Detects 17 body keypoints and connects them into a human pose. |
| pphumanseg | human segmentation | Separates the person from the background and supports portrait compositing and background replacement. |
| puzzle_game | puzzle game | Uses palm detection and hand keypoints to implement an interactive puzzle game. |
| segment_yolov8n | YOLOv8 instance segmentation | Runs 80-class COCO segmentation masks with YOLOv8n-seg. |
| self_learning | self-learning classification | Registers target features first and then performs classification by similarity without retraining. |
| smoke_detect | smoking detection | Detects smoking behavior in images or video. |
| space_resize | touchless zoom | Uses fingertip movement to scale images without touch input. |
| sq_hand_det | palm detection | Detects palm boxes in images or video. |
| sq_handkp_class | hand-keypoint gesture classification | Detects 21 hand keypoints and classifies static gestures. |
| sq_handkp_det | hand keypoint detection | Detects 21 hand keypoints for each palm. |
| sq_handkp_flower | fingertip flower classification | Classifies flowers inside the region selected around the fingertip area. |
| sq_handkp_ocr | fingertip OCR | Recognizes text in the region near the fingertip. |
| sq_handreco | gesture recognition | Recognizes several predefined gestures such as open palm, eight, and yeah. |
| traffic_light_detect | traffic-light detection | Detects red, yellow, and green traffic lights. |
| | English-to-Chinese translation | Demonstrates a basic EN-to-ZH machine translation task. |
| tts_zh | Chinese text to speech | Uses a three-model pipeline to synthesize Chinese speech from text. |
| vehicle_attr | vehicle attributes | Detects vehicles and estimates vehicle type and body color. |
| virtual_keyboard | touchless virtual keyboard | Uses pinch interaction to input characters through an on-screen keyboard. |
| yolop_lane_seg | lane segmentation | Detects lanes and drivable area in road scenes. |
Build and Run#
Select Board Configuration and Build#
From the RTOS SDK root, list the available board configurations, select the one for your board, and build:
make list-def
make ***_defconfig
make -j
Initialize SDK and Build Firmware#
If you are setting up the SDK from scratch, use the following flow:
mkdir -p ~/.bin
curl https://storage.googleapis.com/git-repo-downloads/repo > ~/.bin/repo
chmod a+rx ~/.bin/repo
echo 'export PATH="${HOME}/.bin:${PATH}"' >> ~/.bashrc
source ~/.bashrc
cd ~
mkdir rtos_k230_sdk
cd rtos_k230_sdk
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
cat ~/.ssh/id_rsa.pub
# Option A: GitHub
repo init -u https://github.com/canmv-k230/manifest -b master --repo-url=https://github.com/canmv-k230/git-repo.git
repo sync
make dl_toolchain
make list-def
If you prefer Gitee, use this repo init command instead of the GitHub one above:
repo init -u https://gitee.com/canmv-k230/manifest -b master --repo-url=https://gitee.com/canmv-k230/git-repo.git
Then select your target board and build:
make ***_defconfig
make -j
After the build completes, the firmware images are generated in the output directory.
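The configure-then-build step can be wrapped in a small helper so that a mistyped configuration name is caught before a long build starts. This is only a sketch: the function name build_firmware and the MAKE_CMD override are hypothetical and not part of the SDK, and the helper assumes it is run from the SDK root.

```shell
#!/bin/sh
# Sketch: configure for one board, then build the firmware.
# MAKE_CMD is a hypothetical override (not an SDK variable) that allows dry runs.
MAKE_CMD="${MAKE_CMD:-make}"

build_firmware() {
    defconfig="$1"
    # Accept only names that end in _defconfig, as printed by `make list-def`.
    case "$defconfig" in
        *_defconfig) ;;
        *) echo "usage: build_firmware <name>_defconfig" >&2; return 2 ;;
    esac
    # Stop if configuration fails, so the build never runs on a stale config.
    $MAKE_CMD "$defconfig" && $MAKE_CMD -j
}
```

Call it with one of the names printed by make list-def. Note that GNU make places no limit on parallel jobs when -j is given without a number, so on a machine with limited memory a bounded value may be preferable.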
Build Method 1#
After your code changes are ready, enter src/rtsmart/examples/ai/ai_demo and run:
# Build only face detection
./build_app.sh face_detection
# Build all AI demos
./build_app.sh
Intermediate build files are generated in the build directory, and deployment artifacts are collected in the k230_bin directory.
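When only a handful of demos are needed, the single-demo invocation above can be looped instead of building everything. The sketch below assumes it is run from the ai_demo directory; build_demos and the BUILD_CMD override are hypothetical names introduced for illustration, and the only SDK command used is ./build_app.sh <demo>.

```shell
#!/bin/sh
# Sketch: build a chosen subset of demos and report which ones failed.
# BUILD_CMD is a hypothetical override (not part of the SDK) for dry runs.
BUILD_CMD="${BUILD_CMD:-./build_app.sh}"

build_demos() {
    failed=""
    for demo in "$@"; do
        echo "== building $demo =="
        # Keep going after a failure so one broken demo does not hide the rest.
        if ! $BUILD_CMD "$demo"; then
            failed="$failed $demo"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed:$failed" >&2
        return 1
    fi
    echo "all builds succeeded"
}
```

For example, build_demos face_detection ocr person_detect builds three demos in sequence and returns a non-zero status if any of them fails.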
Build Method 2#
From the RTOS SDK root, run make menuconfig and enable:
RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build AI Demo Programs
-> select the target demo
Then run:
make -j
This builds the selected demo directly into:
/sdcard/app/examples/ai/ai_demo/<demo_name>
You can also enter src/rtsmart/examples/ai/ai_demo, where the Makefile is located, and run:
make -j
This command also supports incremental builds and places the collected outputs in k230_bin.
Board Deployment#
First, flash the firmware. See:
Then copy the generated elf, kmodel, test images, and any other required files for the selected demo from k230_bin to CanMV/sdcard.
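The copy step can be scripted on the host. The sketch below makes two assumptions that the text does not confirm: that k230_bin holds one subdirectory per demo, and that the SD card is mounted on the host at a path you pass in. The function name deploy_demo is hypothetical.

```shell
#!/bin/sh
# Sketch: copy one demo's build output to a mounted SD card.
# Assumed layout (not confirmed by the SDK docs): <bin_dir>/<demo>/...
deploy_demo() {
    demo="$1"
    bin_dir="$2"   # for example, the k230_bin directory
    sd_root="$3"   # host mount point of the SD card (hypothetical path)
    src="$bin_dir/$demo"
    if [ ! -d "$src" ]; then
        echo "no build output for $demo in $bin_dir" >&2
        return 1
    fi
    # Copy the directory contents (elf, kmodel, scripts, test images),
    # then flush writes before the card is removed.
    mkdir -p "$sd_root/$demo" &&
        cp -r "$src/." "$sd_root/$demo/" &&
        sync
}
```

A call such as deploy_demo face_detection k230_bin /path/to/sdcard would then place the files where the run example below expects them, under a face_detection directory on the card.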
Connect to the board over serial and run the corresponding ***_isp.sh or ***_image.sh script for your selected demo. For example:
cd /sdcard/face_detection
./face_detect_isp.sh
For demo-specific details, refer to the source code, scripts, and README.md in each demo directory.
