Note

This is the documentation for the latest development branch and may refer to features that are not available in released versions. If you are looking for the documentation for a specific release, use the drop-down menu on the left and select the desired version.

Single-Model Application Development Guide#

Overview#

This guide explains how to develop a single-model AI application on K230 and uses face detection as the reference example.

Development Guide#

Convert kmodel#

Before writing deployment code, prepare a kmodel.

For the face-detection reference application, you can directly reuse:

  • face_detection_320.kmodel

  • face_detection_640.kmodel

from:

src/rtsmart/examples/ai/face_detection/utils

If you train your own model, the usual conversion flow is:

  1. train the model in a framework such as PyTorch

  2. export it to ONNX or TFLite

  3. convert it to kmodel

For model conversion details, refer to the nncase guide and API reference.

Develop Deployment Code#

Modules and Flow#

Involved Modules:

  1. vicap (video input capture): Configures camera sensor properties and channel attributes, including resolution, frame rate, and data format. It binds camera data to the display and provides frame data for AI inference.

  2. vo (video output): Configures display device and layer attributes including position, resolution, frame rate, and data format. Displays camera frames or other input in real-time through video and OSD layers. The video layer supports YUV format only, while the OSD layer supports RGB format only.

  3. kpu: Loads kmodel, configures input/output tensors, and performs model inference.

  4. ai2d: Performs preprocessing on model input images with five predefined preprocessing modes. See usage_ai2d for details.

Processing Flow:

The single-model AI application uses a single-camera dual-channel processing approach:

  • Display Channel: One image stream is directly bound to the screen for real-time, low-latency display.

  • AI Channel: Another image stream is used for AI model inference by converting it to tensors and processing through the model to get detection or recognition results.

After inference completes, the results are drawn onto a transparent OSD layer and merged with the live display, so users see the original image overlaid with the AI recognition results.

We use this “dual-channel + layer merging” approach to solve performance bottlenecks. With a traditional pipeline:

Capture image → Create tensor → Preprocess → Inference → Postprocess → Draw results → Display

If model inference takes a long time, every frame waits for it and the displayed image stutters, especially with complex models. Separating display from AI inference avoids this: the live display takes priority, and inference results are drawn and merged asynchronously, so the display stays smooth while the AI analysis results still appear in real time.
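
The effect of the split can be estimated with simple arithmetic. The sketch below is illustrative only: the function names and any stage timings passed to them are hypothetical and are not measurements from the K230. In a serial pipeline the display rate is bounded by the sum of all stage times, while a display channel bound directly to the screen depends only on the camera frame interval.

```cpp
#include <numeric>
#include <vector>

// Frame rate of a serial pipeline: a frame is shown only after every stage
// (capture, tensor creation, preprocess, inference, ...) has finished.
double serial_fps(const std::vector<double> &stage_ms)
{
    const double total_ms = std::accumulate(stage_ms.begin(), stage_ms.end(), 0.0);
    return 1000.0 / total_ms;
}

// Display frame rate when the display channel bypasses inference:
// it depends only on the camera frame interval.
double decoupled_display_fps(double camera_frame_ms)
{
    return 1000.0 / camera_frame_ms;
}
```

With hypothetical stage times that add up to 100 ms, `serial_fps` gives 10 frames per second, whereas a 40 ms camera interval keeps the decoupled display at 25 frames per second regardless of how long inference takes.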

Code Structure#

Using face-detection as the reference example, the code structure is:

face_detection
├── cmake
├── src
│    ├── ai_base.cc         # Model inference wrapper implementation
│    ├── ai_base.h          # Model inference header file
│    ├── ai_utils.cc        # Utility methods for model inference
│    ├── ai_utils.h         # Utility methods header file
│    ├── anchors_320.cc     # Anchors for 320-input face detection model
│    ├── anchors_640.cc     # Anchors for 640-input face detection model
│    ├── face_detection.cc  # Task-specific implementation: preprocess, inference, postprocess, drawing
│    ├── face_detection.h   # Task-specific header file
│    ├── main.cc            # Main function: orchestrates full AI application
│    ├── scoped_timing.h    # Timing utility for debugging
│    ├── setting.h          # Configuration macros for display and AI frame resolution
│    ├── video_pipeline.cc  # Single-camera dual-channel implementation
│    ├── video_pipeline.h   # Video pipeline header file
│    └── CMakeLists.txt     # CMakeLists for this task
├── utils                   # Pre-built kmodel and scripts
├── CMakeLists.txt          # Root CMakeLists
├── build_app.sh            # Compilation script
└── Makefile                # Alternative build method

Code Responsibilities#

Each file has a specific role:

File               Purpose
-----------------  ------------------------------------------------------------
ai_base.h          Provides inference interfaces for model processing
ai_base.cc         Implements the model inference methods defined in ai_base.h
ai_utils.h         Provides common utility function interfaces
ai_utils.cc        Implements utility functions defined in ai_utils.h
scoped_timing.h    Provides timing utilities for development debugging
setting.h          Provides configuration interfaces for display device parameters and AI frame resolution
video_pipeline.h   Provides interfaces for dual-channel pipeline: camera, display device, OSD layer initialization and frame operations
video_pipeline.cc  Implements video processing interfaces defined in video_pipeline.h
face_detection.h   Provides task-specific interfaces: preprocessing, inference, postprocessing, result drawing
face_detection.cc  Implements task-specific interfaces defined in face_detection.h
anchors_320.cc     Anchor data for 320-input face detection model
anchors_640.cc     Anchor data for 640-input face detection model
main.cc            Main function: drives complete application flow

How to Use and Modify:

  • ai_base.h and ai_base.cc implement the model inference wrapper base class with kmodel initialization, input/output initialization, execution, and output retrieval interfaces. See code comments for details. These files typically do not require changes.

  • ai_utils.h and ai_utils.cc provide common utility functions for data access and preprocessing. If provided functions are insufficient, modify these files to add new methods; otherwise leave them unchanged.

  • setting.h, video_pipeline.h, and video_pipeline.cc implement camera, display, and OSD initialization and configuration. They support two display modes: LT9611 HDMI at 1920×1080 and ST7701 LCD at 800×480. Modify them only when adding support for a new display; otherwise leave them unchanged.

  • face_detection.h, face_detection.cc, and main.cc are the files you must write yourself when developing a new AI application; use the corresponding files in src/rtsmart/examples/ai/face_detection as a reference. The task-specific header and source files implement input preprocessing, inference (usually a call to the run method declared in ai_base.h), and output postprocessing. main.cc orchestrates task class instantiation, preprocessing, inference, postprocessing, and result drawing.

Code Details#

setting.h Configuration#

The macros in setting.h are mainly used to configure camera output, display output, OSD, and the AI frame resolution.

Macro             Description
----------------  ------------------------------------------
ISP_WIDTH         ISP output width
ISP_HEIGHT        ISP output height
DISPLAY_MODE      0: 1920×1080 LT9611, 1: 800×480 ST7701
DISPLAY_WIDTH     Display width
DISPLAY_HEIGHT    Display height
DISPLAY_ROTATE    0: no rotation, 1: rotate 90 degrees
AI_FRAME_WIDTH    AI frame width
AI_FRAME_HEIGHT   AI frame height
AI_FRAME_CHANNEL  AI frame channels
USE_OSD           0: disable OSD, 1: enable OSD
OSD_WIDTH         OSD layer width
OSD_HEIGHT        OSD layer height
OSD_CHANNEL       OSD layer channels

Detailed Configuration:

#define ISP_WIDTH 1920
#define ISP_HEIGHT 1080

This is the resolution configured for the camera output. The stream is then split into a display channel and an AI channel, each with its own configured format and resolution.

#define DISPLAY_MODE 1    // 0: 1920×1080 LT9611, 1: 800×480 ST7701
#define DISPLAY_WIDTH 800
#define DISPLAY_HEIGHT 480
#define DISPLAY_ROTATE 1  // 0: no rotation, 1: rotate 90 degrees

This is the configuration of the display channel split from the camera. The display path depends on the screen resolution and on portrait or landscape orientation. For HDMI 1080P output, the existing LT9611 settings can usually be kept unchanged. The ST7701 screen (800×480) is also supported.

The ST7701 panel is physically 480×800 (portrait), requiring 90-degree rotation for display. This rotation functionality is now encapsulated in the lower VO layer, so users can ignore it and treat it as a landscape display.

#define AI_FRAME_WIDTH 640
#define AI_FRAME_HEIGHT 360
#define AI_FRAME_CHANNEL 3

This is the AI channel configuration split from camera for model preprocessing. Configure based on your AI requirements. Output format is 3×360×640 in PIXEL_FORMAT_RGB_888_PLANAR format with CHW data layout.
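
To make the planar CHW layout concrete, the following minimal sketch computes where a given sample sits in such a buffer. `chw_offset` is a hypothetical helper written for illustration; it is not part of ai_utils.h, and it assumes 8-bit samples stored without row padding.

```cpp
#include <cstddef>

// Byte offset of the sample at (channel c, row y, column x) in a planar CHW
// buffer of 8-bit samples: all of channel 0 comes first, then channel 1, etc.
constexpr std::size_t chw_offset(std::size_t c, std::size_t y, std::size_t x,
                                 std::size_t height, std::size_t width)
{
    return c * height * width + y * width + x;
}
```

For the 3×360×640 AI frame, the second plane starts at `chw_offset(1, 0, 0, 360, 640)`, which is 360 × 640 = 230400, and the last sample of the third plane is at offset 691199.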

Important Note

Distinguish between AI channel resolution and model input resolution:

  • AI channel resolution: Camera data resolution before model preprocessing

  • Model input resolution: Data resolution after preprocessing fed directly to the model

Preprocessing converts AI channel data to model input data. For example, if camera AI channel outputs 640×360 but model requires 320×320, preprocessing must transform it accordingly.
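
One common way to bridge the two resolutions is an aspect-preserving resize followed by padding. The sketch below only computes the geometry and assumes that approach; the actual preprocessing in the reference application is configured through ai2d and may differ. `LetterboxParams` and `compute_letterbox` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

// Geometry of an aspect-preserving resize followed by padding.
struct LetterboxParams
{
    int scaled_width;   // image width after resizing
    int scaled_height;  // image height after resizing
    int pad_width;      // total horizontal padding needed, in pixels
    int pad_height;     // total vertical padding needed, in pixels
};

LetterboxParams compute_letterbox(int src_w, int src_h, int dst_w, int dst_h)
{
    // Use the smaller scale factor so the whole image fits inside the target.
    const double ratio = std::min(static_cast<double>(dst_w) / src_w,
                                  static_cast<double>(dst_h) / src_h);
    const int scaled_w = static_cast<int>(std::round(src_w * ratio));
    const int scaled_h = static_cast<int>(std::round(src_h * ratio));
    return {scaled_w, scaled_h, dst_w - scaled_w, dst_h - scaled_h};
}
```

For a 640×360 AI frame and a 320×320 model input, the scale factor is 0.5, the resized image is 320×180, and 140 rows of padding fill the rest. How that padding is distributed (one side or both) is a design choice that postprocessing must mirror when restoring coordinates.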

#define USE_OSD 1
#define OSD_WIDTH 800
#define OSD_HEIGHT 480
#define OSD_CHANNEL 4

This is the configuration of the OSD drawing channel. Its resolution must match the display resolution. The OSD frame is a transparent BGRA8888 image that contains no camera image, only the drawn results. Detection boxes, keypoints, and other information are drawn on the OSD frame, which is then inserted into the display channel and merged with the live image for the final effect.
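
The idea of a transparent overlay can be sketched without any SDK or OpenCV dependency. `OsdFrame` below is a hypothetical stand-in for the OSD buffer, not the type used in video_pipeline.h (the reference code draws on a cv::Mat): every pixel starts with alpha 0, and only the drawn outline becomes opaque.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the OSD frame: BGRA8888, row-major, 4 bytes per pixel.
struct OsdFrame
{
    int width;
    int height;
    std::vector<std::uint8_t> data;

    // All bytes start at zero, so the whole frame is fully transparent.
    OsdFrame(int w, int h)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h * 4, 0) {}

    // Reset every pixel to fully transparent before drawing the next frame.
    void clear() { std::fill(data.begin(), data.end(), std::uint8_t{0}); }

    // Write one pixel; coordinates outside the frame are ignored.
    void set_pixel(int x, int y, std::uint8_t b, std::uint8_t g,
                   std::uint8_t r, std::uint8_t a)
    {
        if (x < 0 || y < 0 || x >= width || y >= height) return;
        const std::size_t i = (static_cast<std::size_t>(y) * width + x) * 4;
        data[i] = b;
        data[i + 1] = g;
        data[i + 2] = r;
        data[i + 3] = a;
    }

    // Alpha value of an in-bounds pixel.
    std::uint8_t alpha_at(int x, int y) const
    {
        return data[(static_cast<std::size_t>(y) * width + x) * 4 + 3];
    }

    // Draw a 1-pixel opaque outline with top-left corner (x, y) and size w x h.
    void draw_box(int x, int y, int w, int h,
                  std::uint8_t b, std::uint8_t g, std::uint8_t r)
    {
        for (int xi = x; xi < x + w; ++xi) {
            set_pixel(xi, y, b, g, r, 255);
            set_pixel(xi, y + h - 1, b, g, r, 255);
        }
        for (int yi = y; yi < y + h; ++yi) {
            set_pixel(x, yi, b, g, r, 255);
            set_pixel(x + w - 1, yi, b, g, r, 255);
        }
    }
};
```

An 800×480 frame with 4 channels occupies 800 × 480 × 4 = 1536000 bytes. Pixels inside the box keep alpha 0, so the camera image stays visible through them once the layers are merged.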

AIBase Class Details#

AIBase is the wrapper class for model inference, encapsulating nncase operations. It covers model loading, input/output tensor initialization, inference execution, and output retrieval. Most demo development only needs to focus on model preprocessing and postprocessing:

/**
 * @brief AI base class encapsulating nncase operations
 * Mainly encapsulates nncase load, input setup, execution, and output retrieval.
 * Subsequent demos only need to focus on model preprocessing and postprocessing.
 */
class AIBase
{
public:
    /**
     * @brief AI base class constructor, loads kmodel and initializes input/output
     * @param kmodel_file kmodel file path
     * @param model_name model name for identification
     * @param debug_mode 0 (no debug), 1 (timing only), 2 (all debug output)
     * @return None
     */
    AIBase(const char *kmodel_file, const string model_name, const int debug_mode = 1);

    /**
     * @brief AI base class destructor
     * @return None
     */
    ~AIBase();

    /**
     * @brief Get kmodel input tensor by index
     * @param idx input tensor index
     * @return input tensor
     */
    runtime_tensor get_input_tensor(size_t idx);

    void set_input_tensor(size_t idx, runtime_tensor &input_tensor);

    /**
     * @brief Run model inference on kmodel
     * @return None
     */
    void run();

    /**
     * @brief Get kmodel output, results stored in class attributes
     * @return None
     */
    void get_output();

    runtime_tensor get_output_tensor(int idx);

protected:
    string model_name_;                 // Model name
    int debug_mode_;                    // Debug mode: 0 (no print), 1 (timing), 2 (all)
    vector<float *> p_outputs_;         // Output tensor pointer list for kmodel
    vector<vector<int>> input_shapes_;  // {{N,C,H,W},{N,C,H,W}...}
    vector<vector<int>> output_shapes_; // {{N,C,H,W},...} or {{N,C},...} etc
private:
    /**
     * @brief Initialize kmodel inputs on first run, get input shapes
     * @return None
     */
    void set_input_init();

    /**
     * @brief Initialize kmodel outputs on first run, get output shapes
     * @return None
     */
    void set_output_init();

    interpreter kmodel_interp_;         // kmodel interpreter for loading and inference
    vector<unsigned char> kmodel_vec_;  // kmodel data read from file for interpreter
};

When developing applications, the most useful members are:

  • input_shapes_ for getting input tensor shapes

  • output_shapes_ for getting output tensor shapes

  • p_outputs_ for getting output tensor data pointers

Example accessing first output:

float *output0 = p_outputs_[0];
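
A raw float pointer carries no length, so postprocessing typically derives the element count from the matching entry in output_shapes_. The following is a minimal standalone sketch of that pattern; `element_count` and `argmax` are hypothetical helpers, not functions from ai_utils.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Number of elements in a tensor with the given shape, e.g. {1, 4200, 4}.
std::size_t element_count(const std::vector<int> &shape)
{
    std::size_t count = 1;
    for (int dim : shape) {
        count *= static_cast<std::size_t>(dim);
    }
    return count;
}

// Index of the largest value among the first `count` floats (count must be > 0).
std::size_t argmax(const float *data, std::size_t count)
{
    return static_cast<std::size_t>(std::max_element(data, data + count) - data);
}
```

Inside a task class this would be used along the lines of `argmax(p_outputs_[0], element_count(output_shapes_[0]))`, assuming the first output holds the scores of interest.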

Task Header and Source Files#

face_detection.h and face_detection.cc are core files you must implement when developing new applications.

In real projects, you can name them based on your scenario:

***.h
***.cc

Examples: person_det.h, helmet_detect.cc, gesture_recog.h, etc.

You must implement a task class that inherits AIBase and completes the task logic:

class YourTask : public AIBase

The task class handles 4 responsibilities:

Component    Required                   Purpose
-----------  -------------------------  ---------------------------------------------
Preprocess   ✅ Must implement          Convert input image to model format
Inference    ✅ Direct call to AIBase   Already encapsulated by AIBase
Postprocess  ✅ Must implement          Convert model output to interpretable results
Draw         ✅ Must implement          Draw results onto image

For example, if implementing myapp.h and myapp.cc:

#ifndef _MYAPP_H
#define _MYAPP_H

#include <iostream>
#include <memory>            // std::unique_ptr
#include <vector>
#include <opencv2/core.hpp>  // cv::Mat
#include "ai_utils.h"
#include "ai_base.h"

using std::vector;

/**
 * @brief Custom data structure used in postprocessing
 * For example, detection boxes need coordinates (xywh), class index, confidence
 */
typedef struct ExampleResults
{
    // Define data structure as needed
} ExampleResults;

/**
 * @brief Application class to develop, inherits AIBase
 * Encapsulates for each frame: preprocessing, execution, postprocessing workflow
 */
class MyApp : public AIBase
{
public:
    /**
     * @brief Video stream inference, MyApp constructor
     * Loads kmodel, initializes model input/output and parameters (thresholds, etc)
     * Configures preprocessing
     * @param kmodel_file kmodel file path
     * @param other_params other parameters such as thresholds
     * @param image_size camera AI channel input frame shape
     * @param debug_mode 0 (no debug), 1 (timing only), 2 (all debug output)
     * @return None
     */
    MyApp(char *kmodel_file, other_params, FrameCHWSize image_size, int debug_mode);

    /**
     * @brief MyApp destructor
     * @return None
     */
    ~MyApp();

    /**
     * @brief Preprocessing
     * @param input_tensor input tensor
     * @return None
     */
    void pre_process(runtime_tensor &input_tensor);

    /**
     * @brief kmodel inference
     * @return None
     */
    void inference();

    /**
     * @brief kmodel postprocessing
     * Uses input image_size to restore coordinates to original resolution
     * Stores results in results container
     * @param image_size input image shape
     * @param results postprocessing results container
     * @return None
     */
    void post_process(FrameCHWSize image_size, vector<ExampleResults> &results);

    /**
     * @brief Draw results
     * @param draw_frame transparent image for drawing (video OSD) or original (image inference)
     * @param results postprocessing results
     * @return None
     */
    void draw_result(cv::Mat &draw_frame, vector<ExampleResults> &results);

    std::unique_ptr<ai2d_builder> ai2d_builder_;  // ai2d builder
    runtime_tensor ai2d_out_tensor_;              // ai2d output tensor
    FrameCHWSize image_size_;                     // input image shape
    FrameCHWSize input_size_;                     // model input shape

    // Define other members for current task as needed
    // ***
};

#endif

Implement the interfaces in myapp.cc following the reference implementation in:

src/rtsmart/examples/ai/face_detection/src/face_detection.cc
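
Because the AI frame and the OSD frame usually have different resolutions, box coordinates restored to AI-frame space still need to be scaled before drawing. The sketch below shows that scaling in isolation; `Box` and `scale_box` are hypothetical names, and the reference implementation may organize this step differently.

```cpp
// Axis-aligned box: top-left corner (x, y) plus width and height, in pixels.
struct Box
{
    float x;
    float y;
    float w;
    float h;
};

// Scale a box from a source frame size to a destination frame size,
// for example from the AI frame to the OSD layer.
Box scale_box(const Box &b, int src_w, int src_h, int dst_w, int dst_h)
{
    return {b.x * dst_w / src_w,
            b.y * dst_h / src_h,
            b.w * dst_w / src_w,
            b.h * dst_h / src_h};
}
```

With a 640×360 AI frame and an 800×480 OSD layer, a box at (320, 180) of size 64×36 maps to (400, 240) with size 80×48. The horizontal and vertical factors differ (1.25 and about 1.33), so each axis is scaled separately.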

main.cc Changes#

Overview

main.cc contains the complete task logic: getting frame data from camera or loading an image, creating tensors, calling preprocessing, inference, postprocessing, and drawing results. This achieves complete per-frame processing.

The inference flow is illustrated below:

[Figure: model_inference_rtos, inference flow diagram]

Video Inference Code

Video inference in main.cc follows this pattern (pseudo-code with detailed comments):

FrameCHWSize image_size = {AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
// Create empty Mat for drawing frames
cv::Mat draw_frame(OSD_HEIGHT, OSD_WIDTH, CV_8UC4, cv::Scalar(0, 0, 0, 0));
// Create empty runtime_tensor for input data
runtime_tensor input_tensor;
dims_t in_shape {1, AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
// Create PipeLine for video stream processing
PipeLine pl(debug_mode);
// Initialize PipeLine
pl.Create();
// Create DumpRes for frame data storage
DumpRes dump_res;
// Initialize task class and result storage
MyApp my_app(argv[1], atof(argv[2]), atof(argv[3]), image_size, atoi(argv[5]));
vector<ExampleResults> results;

// Enter while loop, continuously dumping images
while (!isp_stop)
{
    // Create ScopedTiming for total time calculation
    ScopedTiming st("total time", 1);
    // Get one frame from PipeLine and create tensor
    pl.GetFrame(dump_res);
    input_tensor = host_runtime_tensor::create(typecode_t::dt_uint8, in_shape,
        {(gsl::byte *)dump_res.virt_addr, compute_size(in_shape)}, false,
        hrt::pool_shared, dump_res.phy_addr).expect("cannot create input tensor");
    hrt::sync(input_tensor, sync_op_t::sync_write_back, true).expect("sync write_back failed");

    // Preprocessing, inference, postprocessing
    my_app.pre_process(input_tensor);
    my_app.inference();
    my_app.post_process(image_size, results);

    // Clear previous frame drawing
    draw_frame.setTo(cv::Scalar(0, 0, 0, 0));
    my_app.draw_result(draw_frame, results);

    // Insert drawing frame into PipeLine display video stream
    pl.InsertFrame(draw_frame.data);
    // Release current frame data
    pl.ReleaseFrame();
}
pl.Destroy();

Run video inference in a separate thread. Set isp_stop to true when the user enters q to exit.

Image Inference Code

Image inference in main.cc:

int debug_mode = atoi(argv[5]);
// Load image
cv::Mat ori_img = cv::imread(argv[4]);
// Initialize image_size from image
FrameCHWSize image_size = {ori_img.channels(), ori_img.rows, ori_img.cols};
// Create vector for CHW image data, convert read HWC to CHW
std::vector<uint8_t> chw_vec;
std::vector<cv::Mat> bgrChannels(3);
cv::split(ori_img, bgrChannels);
for (auto i = 2; i > -1; i--)
{
    std::vector<uint8_t> data = std::vector<uint8_t>(bgrChannels[i].reshape(1, 1));
    chw_vec.insert(chw_vec.end(), data.begin(), data.end());
}

// Create input tensor
dims_t in_shape {1, 3, ori_img.rows, ori_img.cols};
runtime_tensor input_tensor = host_runtime_tensor::create(typecode_t::dt_uint8, in_shape,
    hrt::pool_shared).expect("cannot create input tensor");
auto input_buf = input_tensor.impl()->to_host().unwrap()->buffer().as_host().unwrap()
    .map(map_access_::map_write).unwrap().buffer();
memcpy(reinterpret_cast<char *>(input_buf.data()), chw_vec.data(), chw_vec.size());
hrt::sync(input_tensor, sync_op_t::sync_write_back, true).expect("write back input failed");

// Initialize task class and result storage
MyApp my_app(argv[1], atof(argv[2]), atof(argv[3]), image_size, debug_mode);
vector<ExampleResults> results;

// Preprocessing, inference, postprocessing
my_app.pre_process(input_tensor);
my_app.inference();
my_app.post_process(image_size, results);

// Draw directly on original image
my_app.draw_result(ori_img, results);
cv::imwrite("result.jpg", ori_img);
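
The channel reordering done with cv::split above can also be written as a plain loop, which makes the index arithmetic explicit. This is a standalone sketch without OpenCV; `hwc_bgr_to_chw_rgb` is a hypothetical helper, and it assumes a tightly packed 3-channel 8-bit image.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved HWC image in BGR order into planar CHW data in RGB order.
std::vector<std::uint8_t> hwc_bgr_to_chw_rgb(const std::vector<std::uint8_t> &hwc,
                                             int height, int width)
{
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<std::uint8_t> chw(plane * 3);
    for (std::size_t p = 0; p < plane; ++p) {
        for (std::size_t c = 0; c < 3; ++c) {
            // Source channels are B, G, R (c = 0, 1, 2);
            // destination planes are R, G, B (plane index 2 - c).
            chw[(2 - c) * plane + p] = hwc[p * 3 + c];
        }
    }
    return chw;
}
```

For a 1×2 image with pixels (B=1, G=2, R=3) and (B=4, G=5, R=6), the result is the R plane {3, 6}, then the G plane {2, 5}, then the B plane {1, 4}.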

When changing inference logic, also update:

void print_usage(const char *name)
{
    // The option list mirrors the five arguments named in the usage line
    cout << "Usage: " << name << " <kmodel_det> <obj_thres> <nms_thres> <input_mode> <debug_mode>" << endl
         << "Options:" << endl
         << "  kmodel_det      Face detection kmodel path\n"
         << "  obj_thres       Detection confidence threshold\n"
         << "  nms_thres       NMS threshold\n"
         << "  input_mode      Local image (image path) / Camera (None)\n"
         << "  debug_mode      Debug mode: 0 (off), 1 (simple), 2 (detailed)\n"
         << endl;
}

And argument validation:

// Argument count validation: program name plus five arguments (argv[1]..argv[5])
std::cout << "case " << argv[0] << " built at " << __DATE__ << " " << __TIME__ << std::endl;
if (argc != 6)
{
    print_usage(argv[0]);
    return -1;
}
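
Since the inference code reads argv[1] through argv[5], a valid invocation has five arguments after the program name, so argc is 6. One way to keep parsing and validation in one place is sketched below. `AppArgs` and `parse_args` are hypothetical names, and the debug_mode range check is an added assumption based on the documented values 0, 1, and 2.

```cpp
#include <cstdlib>
#include <string>

// Parsed command-line arguments for the application.
struct AppArgs
{
    std::string kmodel_path;
    float obj_thresh = 0.0f;
    float nms_thresh = 0.0f;
    std::string input_mode;  // image path, or "None" for camera input
    int debug_mode = 0;

    bool use_camera() const { return input_mode == "None"; }
};

// Returns false when the argument count or debug_mode is invalid.
bool parse_args(int argc, const char *const argv[], AppArgs &out)
{
    if (argc != 6) {
        return false;  // program name + five arguments expected
    }
    out.kmodel_path = argv[1];
    out.obj_thresh = std::strtof(argv[2], nullptr);
    out.nms_thresh = std::strtof(argv[3], nullptr);
    out.input_mode = argv[4];
    out.debug_mode = std::atoi(argv[5]);
    return out.debug_mode >= 0 && out.debug_mode <= 2;
}
```

In main, a false return would lead to print_usage(argv[0]) and an early exit, and use_camera() would select between the video and image inference paths.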

CMakeLists.txt and Build Script#

For the root face_detection/CMakeLists.txt, add the src subdirectory:

add_subdirectory(src)

For the task subdirectory face_detection/src/CMakeLists.txt, update compiled sources and executable name:

set(src main.cc face_detection.cc anchors_320.cc anchors_640.cc ai_base.cc ai_utils.cc video_pipeline.cc)
set(bin face_detection.elf)

The build script face_detection/build_app.sh defines build environment variables and copies the generated elf and kmodel files to the k230_bin directory:

# Copy generated elf and kmodel to k230_bin directory
collect_outputs() {
    local elf_file="${BUILD_DIR}/bin/face_detection.elf"

    if [ -f "${elf_file}" ]; then
        echo "[INFO] Collecting ELF and utility files to ${K230_BIN_DIR}..."
        cp -u "${elf_file}" "${K230_BIN_DIR}/"
        cp -u utils/* "${K230_BIN_DIR}/" 2>/dev/null || true
    else
        echo "[WARN] ELF file not found: ${elf_file}"
    fi
}

Build#

Select Board and Build Firmware#

From the RTOS SDK root, check supported boards:

make list-def

Switch to your board and compile:

make ***_defconfig
make -j

After the build completes, the firmware image is generated in the output directory. Place your application in src/rtsmart/examples/ai, following the structure of the face_detection reference.

Build Method 1: Using build_app.sh#

After finishing code changes, enter the application directory and run:

./build_app.sh

Build intermediates are placed in build, and deployment files are collected in k230_bin.

Build Method 2: Using menuconfig#

From the RTOS SDK root, run make menuconfig and enable:

RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build Face Detection Programs

Save and exit. Because the application provides a Makefile, run:

make -j

The deployment files are built into the firmware at:

/sdcard/app/examples/ai/face_detection

Alternatively, enter the application directory and run:

make -j

This also supports incremental builds and places the outputs in the k230_bin directory.

Board Deployment#

Flash the firmware to the board and power on. See:

how_to_flash

After boot, a virtual disk named CanMV appears. Copy the compiled elf file, the kmodel file, and the other files from k230_bin to:

CanMV/sdcard

Connect to the board via serial terminal and run:

  • face_detect_isp.sh for camera inference

  • face_detect_image.sh for image inference

Make sure the parameters in these scripts match the code and the deployment layout.

Debugging Guide#

Check Model Input and Output Shapes#

Print input_shapes_ and output_shapes_ from AIBase class attributes to verify model I/O dimensions are correct.
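
A small formatting helper keeps such prints readable. The sketch below is standalone; `shape_to_string` is a hypothetical helper rather than an existing function in ai_utils.h.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Format a tensor shape such as {1, 3, 320, 320} as "[1, 3, 320, 320]".
std::string shape_to_string(const std::vector<int> &shape)
{
    std::ostringstream oss;
    oss << "[";
    for (std::size_t i = 0; i < shape.size(); ++i) {
        if (i > 0) {
            oss << ", ";
        }
        oss << shape[i];
    }
    oss << "]";
    return oss.str();
}
```

Inside a class derived from AIBase, looping over input_shapes_ and output_shapes_ and printing each entry with this helper gives one line per tensor to compare against the shapes the model is expected to have.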

Dump Raw Data#

ai_utils.h provides the dump_binary_file, dump_gray_image, and dump_color_image interfaces for dumping binary files, grayscale images, and color images. Inspect the dumped images to verify data layout and channel order. Note that BGR and RGB differ in channel order, so a mix-up shows as swapped red and blue in the dumped image.

Locate Failures with Debug Prints#

Add std::cout statements or a logging mechanism to the code, then rebuild and redeploy to locate the point of failure.

Add Timing Statistics#

If the overall runtime is abnormal, add timing statements to measure the execution time of each module. The source provides scoped_timing.h for this:

{
    ScopedTiming st("test", 1);
    /*
     * code under test
     */
}
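
The scope-based pattern relies on RAII: the timer starts in the constructor and reports in the destructor, so leaving the braces ends the measurement. The class below is a minimal sketch of that idea and is not the SDK's ScopedTiming implementation; `SimpleScopedTimer` and its optional `elapsed_ms_out` parameter are hypothetical.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// Minimal RAII timer: measures the lifetime of the object and prints it.
class SimpleScopedTimer
{
public:
    SimpleScopedTimer(std::string label, bool enabled = true,
                      double *elapsed_ms_out = nullptr)
        : label_(std::move(label)), enabled_(enabled), out_(elapsed_ms_out),
          start_(std::chrono::steady_clock::now()) {}

    // Non-copyable, so each scope is measured exactly once.
    SimpleScopedTimer(const SimpleScopedTimer &) = delete;
    SimpleScopedTimer &operator=(const SimpleScopedTimer &) = delete;

    ~SimpleScopedTimer()
    {
        if (!enabled_) return;
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start_).count();
        if (out_ != nullptr) *out_ = ms;  // optionally hand the value back
        std::cout << label_ << " took " << ms << " ms" << std::endl;
    }

private:
    std::string label_;
    bool enabled_;
    double *out_;
    std::chrono::steady_clock::time_point start_;
};
```

A steady clock is used because it never jumps backwards, which makes it suitable for interval measurement. Wrapping preprocessing, inference, and postprocessing in separate scopes shows which stage dominates the per-frame time.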

Check Memory Usage#

Under RTOS, memory usage information is available via /proc. For multimedia memory:

cat /proc/media-mem

Other modules:

cat /proc/umap/vicap
cat /proc/umap/vb
cat /proc/umap/vo

For system memory usage:

list_page

list_page shows the free pages, used pages, and peak usage (in hex). Each page is 4 KB, so total available memory = (free pages + used pages) × 4 KB.
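
Converting the hexadecimal page counts into bytes is simple arithmetic, sketched below. The helper names are hypothetical, and the inputs are assumed to be the hex strings as printed, with or without a 0x prefix.

```cpp
#include <cstdint>
#include <string>

// Page size stated for list_page output: 4 KB.
constexpr std::uint64_t kPageSizeBytes = 4096;

// Convert a hexadecimal page count such as "1f4" or "0x1f4" to bytes.
std::uint64_t hex_pages_to_bytes(const std::string &hex_pages)
{
    return std::stoull(hex_pages, nullptr, 16) * kPageSizeBytes;
}

// Total available memory in bytes = (free pages + used pages) * page size.
std::uint64_t total_memory_bytes(const std::string &free_hex,
                                 const std::string &used_hex)
{
    return hex_pages_to_bytes(free_hex) + hex_pages_to_bytes(used_hex);
}
```

For example, a count of 0x100 pages is 256 × 4096 = 1048576 bytes, or 1 MB.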

If Model Inference Quality Is Unsatisfactory#

If inference quality still does not meet requirements after you have verified the code path, consider the following:

  • Adjust model parameters such as the confidence threshold and the NMS threshold

  • Adjust the model input resolution (e.g., from 320×320 to 640×640) and verify that preprocessing is correct

  • Adjust the quantization parameters used during model conversion; see Quantization Parameters for calibrate_method, quant_type, and w_quant_type (e.g., change w_quant_type to int16)

  • Try an alternative model for the same task if the current model does not meet requirements
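
To show what the confidence and NMS thresholds control, here is a generic sketch of score filtering followed by greedy non-maximum suppression. It is a textbook formulation with hypothetical names (`Detection`, `iou`, `filter_and_nms`), not the postprocessing code of the face detection example.

```cpp
#include <algorithm>
#include <vector>

// Detection box given by two corners plus a confidence score.
struct Detection
{
    float x1, y1, x2, y2, score;
};

// Intersection over union of two boxes; 0 when they do not overlap.
float iou(const Detection &a, const Detection &b)
{
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) +
                      (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drop detections below obj_thresh, then keep the highest-scoring box of each
// overlapping group: a box is suppressed if its IoU with a kept box exceeds
// nms_thresh.
std::vector<Detection> filter_and_nms(std::vector<Detection> dets,
                                      float obj_thresh, float nms_thresh)
{
    dets.erase(std::remove_if(dets.begin(), dets.end(),
                              [obj_thresh](const Detection &d) {
                                  return d.score < obj_thresh;
                              }),
               dets.end());
    std::sort(dets.begin(), dets.end(),
              [](const Detection &a, const Detection &b) {
                  return a.score > b.score;
              });
    std::vector<Detection> kept;
    for (const Detection &d : dets) {
        bool suppressed = false;
        for (const Detection &k : kept) {
            if (iou(d, k) > nms_thresh) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

Raising the confidence threshold removes weak detections at the cost of missing faint targets, while raising the NMS threshold tolerates more overlap and can leave duplicate boxes on one object.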
