Note

This is the documentation for the latest development branch and may refer to features that are not available in released versions. If you are looking for the documentation for a specific release, use the drop-down menu on the left and select the desired version.

AI Demo Guide

Attention

AI Demo follows the same single-camera, dual-channel development pattern as the single-model template. For the core architecture, see single_model_example.md.

Overview

K230 AI Demo includes modules for face, human body, hand, license plate, text continuation, speech, and DMS scenarios. It covers classification, detection, segmentation, recognition, tracking, and monocular distance estimation, and provides a practical reference for building AI applications on K230.

These demos are mainly intended to validate K230 capabilities and demonstrate representative scenarios. For production use, scenario-specific tuning is still required. Typical optimization directions include:

  • threshold tuning

  • code optimization

  • quantization optimization

  • model optimization

  • training-data optimization
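
As an illustration of the first item, threshold tuning, the sketch below sweeps a confidence threshold over a handful of scored detections and reports precision and recall at each value. It is written in Python for brevity rather than in the demos' C++. The scores, labels, and the function names `precision_recall` and `sweep_thresholds` are made up for this example and do not come from any K230 demo.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when detections scoring >= threshold are kept.

    scores: confidence of each candidate detection
    labels: True if the candidate is a real object, False if it is a false alarm
    """
    kept = [label for score, label in zip(scores, labels) if score >= threshold]
    true_pos = sum(kept)
    total_pos = sum(labels)
    # With nothing kept there are no false alarms, so precision is taken as 1.0.
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall


def sweep_thresholds(scores, labels, thresholds):
    """Evaluate every candidate threshold and return (threshold, precision, recall) tuples."""
    return [(t, *precision_recall(scores, labels, t)) for t in thresholds]


# Hypothetical validation data: five candidate boxes with ground-truth flags.
scores = [0.95, 0.80, 0.65, 0.55, 0.30]
labels = [True, True, False, True, False]

for t, p, r in sweep_thresholds(scores, labels, [0.25, 0.5, 0.7]):
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold in this toy data trades recall for precision, which is the trade-off to settle per scenario on real validation data.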

Supported Boards

  • CanMV-K230-V1.1

  • CanMV-K230-V3.0

  • 01Studio CanMV K230

  • Bpi-CanMV-K230D-Zero

  • LuShanPi-K230

Source Description

Source Path

The source tree is located at:

src/rtsmart/examples/ai/ai_demo

File Tree

.
├── anomaly_det
├── bytetrack
├── cmake
├── common_files
├── crosswalk_detect
├── distraction_reminder
├── dms_system
├── dynamic_gesture
├── eye_gaze
├── face_alignment
├── face_detection
├── face_emotion
├── face_gender
├── face_glasses
├── face_landmark
├── face_mask
├── face_mesh
├── face_parse
├── face_pose
├── face_verification
├── falldown_detect
├── finger_guessing
├── fitness
├── head_detection
├── helmet_detect
├── kws
├── licence_det
├── licence_det_rec
├── nanotracker
├── object_detect_yolov8n
├── ocr
├── person_attr
├── person_detect
├── person_distance
├── pose_detect
├── pphumanseg
├── puzzle_game
├── segment_yolov8n
├── self_learning
├── shell
├── smoke_detect
├── space_resize
├── sq_hand_det
├── sq_handkp_class
├── sq_handkp_det
├── sq_handkp_flower
├── sq_handkp_ocr
├── sq_handreco
├── traffic_light_detect
├── tts_zh
├── vehicle_attr
├── virtual_keyboard
├── yolop_lane_seg
├── CMakeLists.txt
├── Makefile
└── build_app.sh

Each AI Demo subdirectory includes a README.md file with detailed usage notes.

Shared Files

All demos share the runtime wrapper files under common_files:

.
├── ai_base.cc
├── ai_base.h
├── ai_utils.cc
├── ai_utils.h
├── scoped_timing.h
├── setting.h
├── video_pipeline.cc
└── video_pipeline.h

Their main responsibilities are:

  • ai_base.*: common nncase inference wrapper, including kmodel loading, input setup, and output retrieval

  • ai_utils.*: shared helpers such as palette generation, image saving, and preprocessing utilities

  • scoped_timing.h: timing helper for profiling

  • setting.h: display and AI-frame configuration

  • video_pipeline.*: single-camera dual-channel media wrapper for camera init, frame acquisition, and OSD display
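
The scoped-timing idea listed above (measure how long a block takes and report it when the block exits) can be sketched as a Python context manager. This only illustrates the pattern: the class name `ScopedTiming`, its `enabled` flag, and the `elapsed_ms` field are assumptions made for this example, and the interface of the real scoped_timing.h may differ.

```python
import time


class ScopedTiming:
    """Measure the wall-clock time spent inside a `with` block."""

    def __init__(self, label, enabled=True):
        self.label = label
        self.enabled = enabled  # when False, time is still measured but not printed
        self.elapsed_ms = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed_ms = (time.perf_counter() - self._start) * 1000.0
        if self.enabled:
            print(f"{self.label} took {self.elapsed_ms:.3f} ms")
        return False  # never swallow exceptions raised inside the block


# Stand-in workload; in a demo this would wrap preprocessing, inference, or drawing.
with ScopedTiming("dummy inference") as timer:
    sum(i * i for i in range(10000))
```

Wrapping each pipeline stage separately makes it easier to see which stage dominates the per-frame time before optimizing.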

Model Assets

Related kmodel files, test images, and other dependencies are stored under:

src/rtsmart/libs/kmodel/ai_poc

During the AI Demo build, build_app.sh copies the models and assets required by the selected demo into the output directory.

Demo Notes

The following table summarizes the main demo directories:

| Demo Directory | Scenario | Description |
| --- | --- | --- |
| anomaly_det | anomaly detection | Detects whether anomalies exist in the inspected target, such as abnormal bottle openings, and is suitable for industrial inspection or similar tasks. |
| bytetrack | multi-object tracking | Uses YOLOv5 for detection, Kalman filtering for box prediction, and the Hungarian algorithm for track association. |
| crosswalk_detect | crosswalk detection | Uses YOLOv5 to detect pedestrian crossings in images or video for assisted-driving scenarios. |
| distraction_reminder | inattentive-driving reminder | Uses face-pose estimation and logic rules to warn when the driver is not looking forward. |
| dms_system | driver monitoring | Combines palm detection and face detection to detect smoking, phone use, drinking, and similar behaviors. |
| dynamic_gesture | dynamic gesture recognition | Recognizes dynamic hand actions such as up, down, left, right, and pinch gestures for touchless interaction. |
| eye_gaze | gaze estimation | Detects faces first and then estimates gaze direction, drawing gaze vectors on the image. |
| face_alignment | face alignment | Outputs per-face depth or normalized projection-coordinate information using a 3D face-alignment flow. |
| face_detection | face detection | Detects face boxes and five facial landmarks in images or video. |
| face_emotion | facial-expression recognition | Uses two models to classify expressions such as neutral, happiness, sadness, anger, disgust, fear, and surprise. |
| face_gender | gender classification | Uses a face detector plus a classification model to label each face as male or female. |
| face_glasses | glasses classification | Determines whether each detected face is wearing glasses. |
| face_landmark | dense face landmarks | Detects 106 landmarks and draws facial contours with different colors. |
| face_mask | mask classification | Determines whether each detected face is wearing a mask. |
| face_mesh | 3D face mesh | Outputs a 3D mesh structure for each detected face. |
| face_parse | face segmentation | Segments facial regions such as eyes, nose, and mouth at the pixel level. |
| face_pose | face-pose estimation | Estimates roll, yaw, and pitch for each detected face. |
| face_verification | face verification | Extracts face features and compares two faces to determine whether they belong to the same identity. |
| falldown_detect | fall detection | Detects falling behavior in images or video. |
| finger_guessing | rock-paper-scissors | Recognizes hand gestures from palm detection plus 21 hand keypoints. |
| fitness | squat counting | Counts squat actions in video for fitness-state analysis. |
| head_detection | head detection and counting | Detects human heads and counts how many are present. |
| helmet_detect | helmet detection | Detects whether people are wearing helmets, suitable for safety-monitoring scenarios. |
| kws | keyword spotting | Detects target wake words in the audio stream and can trigger a voice response. |
| licence_det | license plate detection | Detects plate locations in images or video. |
| licence_det_rec | license plate recognition | Detects plate locations and recognizes plate text. |
| nanotracker | single-object tracking | Registers a target in the first few seconds and then tracks it visually in real time. |
| object_detect_yolov8n | YOLOv8 object detection | Runs 80-class COCO detection using YOLOv8n. |
| ocr | OCR detection + recognition | Detects text regions and recognizes text content in images or video. |
| person_attr | person attributes | Detects people and estimates attributes such as gender, age, glasses, and carried items. |
| person_detect | person detection | Detects people in images or video and draws bounding boxes. |
| person_distance | pedestrian ranging | Estimates distance to detected pedestrians based on detection results and scene geometry. |
| pose_detect | human keypoints | Detects 17 body keypoints and connects them into a human pose. |
| pphumanseg | human segmentation | Separates the person from the background and supports portrait compositing and background replacement. |
| puzzle_game | puzzle game | Uses palm detection and hand keypoints to implement an interactive puzzle game. |
| segment_yolov8n | YOLOv8 instance segmentation | Produces 80-class COCO instance-segmentation masks with YOLOv8n-seg. |
| self_learning | self-learning classification | Registers target features first and then performs classification by similarity without retraining. |
| smoke_detect | smoking detection | Detects smoking behavior in images or video. |
| space_resize | touchless zoom | Uses fingertip movement to scale images without touch input. |
| sq_hand_det | palm detection | Detects palm boxes in images or video. |
| sq_handkp_class | hand-keypoint gesture classification | Detects 21 hand keypoints and classifies static gestures. |
| sq_handkp_det | hand keypoint detection | Detects 21 hand keypoints for each palm. |
| sq_handkp_flower | fingertip flower classification | Classifies flowers inside the region selected around the fingertip area. |
| sq_handkp_ocr | fingertip OCR | Recognizes text in the region near the fingertip. |
| sq_handreco | gesture recognition | Recognizes several predefined gestures such as open palm, eight, and yeah. |
| traffic_light_detect | traffic-light detection | Detects red, yellow, and green traffic lights. |
| translate_en_ch | English-to-Chinese translation | Demonstrates a basic EN-to-ZH machine translation task. |
| tts_zh | Chinese text to speech | Uses a three-model pipeline to synthesize Chinese speech from text. |
| vehicle_attr | vehicle attributes | Detects vehicles and estimates vehicle type and body color. |
| virtual_keyboard | touchless virtual keyboard | Uses pinch interaction to input characters through an on-screen keyboard. |
| yolop_lane_seg | lane segmentation | Detects lanes and drivable area in road scenes. |
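
The face_verification and self_learning entries above both rest on comparing feature vectors. A minimal sketch of that comparison follows, written in Python for brevity. It assumes cosine similarity as the metric and an arbitrary acceptance threshold of 0.5. The metric, feature length, and threshold the demos actually use are not stated here, and the names `cosine_similarity` and `best_match` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, registered, threshold=0.5):
    """Return (name, score) of the most similar registered feature.

    The name is None when even the best score falls below the threshold.
    """
    best_name, best_score = None, -1.0
    for name, feature in registered.items():
        score = cosine_similarity(query, feature)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical 3-dimensional features; real embeddings are much longer.
registered = {"apple": [1.0, 0.0, 0.0], "pear": [0.0, 1.0, 0.0]}
print(best_match([0.9, 0.1, 0.0], registered))  # close to "apple"
print(best_match([0.0, 0.0, 1.0], registered))  # matches nothing registered
```

Because classification is only a nearest-feature lookup, registering a new category means adding one more entry to `registered`, with no retraining step.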
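
For the person_distance entry above, one common way to turn a detection box into a monocular distance estimate is the pinhole-camera relation: distance = focal length (in pixels) × real height / box height (in pixels). The Python sketch below illustrates that relation only. Whether the demo uses this exact formula is not stated here, and the 1.7 m person height, the 1000 px focal length, and the function name `estimate_distance_m` are assumptions chosen for illustration.

```python
def estimate_distance_m(box_height_px, real_height_m=1.7, focal_length_px=1000.0):
    """Estimate the distance to a person from the pixel height of the detection box.

    Pinhole model: box_height_px / focal_length_px = real_height_m / distance_m.
    Both default values are illustrative assumptions, not calibrated constants.
    """
    if box_height_px <= 0:
        raise ValueError("box_height_px must be positive")
    return focal_length_px * real_height_m / box_height_px


# Halving the box height doubles the estimated distance.
for height_px in (680, 340, 170):
    print(f"box height {height_px} px -> {estimate_distance_m(height_px):.1f} m")
```

In practice the focal length comes from camera calibration, and the estimate degrades when the person is partly occluded or not standing upright.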
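
For the fitness entry above, repetition counting from body keypoints can be sketched as a small state machine over the knee angle: a squat is counted each time the angle drops below a "down" threshold and then rises above an "up" threshold. This Python sketch is one plausible approach, not the demo's confirmed logic. The 100° and 160° thresholds and the names `joint_angle_deg` and `count_squats` are assumptions.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        return 180.0  # degenerate keypoints: treat the leg as straight
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def count_squats(knee_angles, down_below=100.0, up_above=160.0):
    """Count down-then-up transitions in a per-frame sequence of knee angles."""
    count = 0
    is_down = False
    for angle in knee_angles:
        if not is_down and angle < down_below:
            is_down = True
        elif is_down and angle > up_above:
            is_down = False
            count += 1
    return count


# Hypothetical per-frame knee angles (hip-knee-ankle) covering two squats.
angles = [175, 150, 95, 80, 120, 170, 165, 90, 130, 172]
print(count_squats(angles))
```

Using two separate thresholds rather than one gives hysteresis, so jitter in the keypoints around a single cut-off does not produce spurious counts.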

Build and Run

Select Board Configuration and Build

From the RTOS SDK root, list the available board configurations, select one, and build:

make list-def
make ***_defconfig
make -j

Initialize SDK and Build Firmware

If you are setting up the SDK from scratch, use the following flow:

mkdir -p ~/.bin
curl https://storage.googleapis.com/git-repo-downloads/repo > ~/.bin/repo
chmod a+rx ~/.bin/repo
echo 'export PATH="${HOME}/.bin:${PATH}"' >> ~/.bashrc
source ~/.bashrc

cd ~
mkdir rtos_k230_sdk
cd rtos_k230_sdk

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
cat ~/.ssh/id_rsa.pub

# Option A: GitHub
repo init -u https://github.com/canmv-k230/manifest -b master --repo-url=https://github.com/canmv-k230/git-repo.git
repo sync
make dl_toolchain
make list-def

If you prefer Gitee, use this repo init command instead of the GitHub one above:

repo init -u https://gitee.com/canmv-k230/manifest -b master --repo-url=https://gitee.com/canmv-k230/git-repo.git

Then select your target board and build:

make ***_defconfig
make -j

After the build completes, firmware images are generated in the output directory.

Build Method 1

After your code changes are ready, enter src/rtsmart/examples/ai/ai_demo and run:

# Build only face detection
./build_app.sh face_detection

# Build all AI demos
./build_app.sh

Intermediate build files are generated in the build directory, and deployment artifacts are collected in k230_bin.

Build Method 2

From the RTOS SDK root, run make menuconfig and enable:

RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build AI Demo Programs
-> select the target demo

Then run:

make -j

This builds the selected demo directly into:

/sdcard/app/examples/ai/ai_demo/<demo_name>

You can also enter /sdcard/app/examples/ai/ai_demo and run:

make -j

This command also supports incremental builds and places the collected outputs in k230_bin.

Board Deployment

First, flash the firmware. See:

how_to_flash

Then copy the generated elf, kmodel, test images, and any other required files for the selected demo from k230_bin to CanMV/sdcard.
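
The copy step can also be scripted on the host. The Python sketch below (Python 3.8 or later) assumes that k230_bin contains one subdirectory per demo and that the SD card is mounted at a path you supply. Both are assumptions to check against your own build output, and `stage_demo` is a hypothetical helper, not part of the SDK.

```python
import shutil
from pathlib import Path


def stage_demo(k230_bin, sdcard_root, demo_name):
    """Copy one demo's build outputs to the SD card and return the copied file names.

    k230_bin:    directory holding the collected build outputs (assumed layout:
                 one subdirectory per demo)
    sdcard_root: mount point of the board's sdcard directory on the host
    demo_name:   demo directory name, for example "face_detection"
    """
    src = Path(k230_bin) / demo_name
    if not src.is_dir():
        raise FileNotFoundError(f"no build output found for demo: {src}")
    dst = Path(sdcard_root) / demo_name
    # dirs_exist_ok lets a rebuilt demo overwrite an earlier copy in place.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in dst.iterdir())


# Example call with placeholder paths (adjust both to your machine):
# stage_demo("k230_bin", "/path/to/CanMV/sdcard", "face_detection")
```

Returning the list of copied names gives a quick check that the elf, kmodel files, and run scripts all reached the card before you switch to the serial console.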

Connect to the board over serial and run the corresponding ***_isp.sh or ***_image.sh script for your selected demo. For example:

cd /sdcard/face_detection
./face_detect_isp.sh

For demo-specific details, refer to the source code, scripts, and README.md in each demo directory.
