# Single-Model Application Development Guide

## Overview

This guide explains how to develop a **single-model AI application** on K230 and uses **face detection** as the reference example.

## Development Guide

### Convert `kmodel`

Before writing deployment code, prepare a `kmodel`.

For the face-detection reference application, you can directly reuse:

- `face_detection_320.kmodel`
- `face_detection_640.kmodel`

from:

```bash
src/rtsmart/examples/ai/face_detection/utils
```

If you train your own model, the usual conversion flow is:

1. Train the model in a framework such as PyTorch
1. Export it to `onnx` or `tflite`
1. Convert it to `kmodel`

For model conversion details, refer to the nncase guide and API reference.

### Develop Deployment Code

#### Modules and Flow

**Involved Modules:**

1. **vicap (video input capture)**: Configures camera sensor properties and channel attributes, including resolution, frame rate, and data format. It binds camera data to the display and provides frame data for AI inference.

1. **vo (video output)**: Configures display device and layer attributes, including position, resolution, frame rate, and data format. Displays camera frames or other input in real time through video and OSD layers. The video layer supports YUV format only, while the OSD layer supports RGB format only.

1. **kpu**: Loads `kmodel`, configures input/output tensors, and performs model inference.

1. **ai2d**: Performs preprocessing on model input images with five predefined preprocessing modes. See [usage_ai2d](./usage_ai2d.md) for details.

**Processing Flow:**

The single-model AI application uses a **single-camera dual-channel** processing approach:

- **Display Channel**: One image stream is directly bound to the screen for **real-time, low-latency display**.
- **AI Channel**: Another image stream is used for **AI model inference** by converting it to tensors and processing through the model to get detection or recognition results.

After inference completes, results are drawn onto a **transparent OSD layer** and **merged with the live display**. Users see the **final effect combining original image with AI recognition results**.

We use this "dual-channel + layer merging" approach to solve performance bottlenecks. With a traditional pipeline:

```text
Capture image → Create tensor → Preprocess → Inference → Postprocess → Draw results → Display
```

If **model inference takes significant time**, every frame has to wait for the full pipeline and the displayed image stutters, especially with complex models. Therefore, display is separated from AI inference: **the live display takes priority, and inference results are drawn and merged asynchronously**. This keeps the display smooth while still presenting AI results in near real time.
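
To make the bottleneck concrete, the following sketch estimates the refresh rate of a strictly serial pipeline. The helper `serial_fps` is hypothetical, and the stage timings are made-up illustrative values, not measurements taken on K230.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: frame rate of a strictly serial pipeline in which a
// frame reaches the display only after every stage has finished.
double serial_fps(const std::vector<double> &stage_ms)
{
    const double total_ms = std::accumulate(stage_ms.begin(), stage_ms.end(), 0.0);
    return total_ms > 0.0 ? 1000.0 / total_ms : 0.0;
}

// Assumed timings in milliseconds (illustrative only), in pipeline order:
// capture, create tensor, preprocess, inference, postprocess, draw, display.
const std::vector<double> kExampleStagesMs = {5.0, 1.0, 4.0, 80.0, 5.0, 3.0, 2.0};
```

With these assumed numbers the stages add up to 100 ms, so a serial pipeline would refresh at about 10 frames per second. When the display channel is bound directly to the screen, the display rate no longer depends on the inference stage.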

#### Code Structure

Using face-detection as the reference example, the code structure is:

```text
face_detection
├── cmake
├── src
│    ├── ai_base.cc         # Model inference wrapper implementation
│    ├── ai_base.h          # Model inference header file
│    ├── ai_utils.cc        # Utility methods for model inference
│    ├── ai_utils.h         # Utility methods header file
│    ├── anchors_320.cc     # Anchors for 320-input face detection model
│    ├── anchors_640.cc     # Anchors for 640-input face detection model
│    ├── face_detection.cc  # Task-specific implementation: preprocess, inference, postprocess, drawing
│    ├── face_detection.h   # Task-specific header file
│    ├── main.cc            # Main function: orchestrates full AI application
│    ├── scoped_timing.h    # Timing utility for debugging
│    ├── setting.h          # Configuration macros for display and AI frame resolution
│    ├── video_pipeline.cc  # Single-camera dual-channel implementation
│    ├── video_pipeline.h   # Video pipeline header file
│    └── CMakeLists.txt     # CMakeLists for this task
├── utils                   # Pre-built kmodel and scripts
├── CMakeLists.txt          # Root CMakeLists
├── build_app.sh            # Compilation script
└── Makefile                # Alternative build method
```

#### Code Responsibilities

Each file has a specific role:

| File | Purpose |
|------|---------|
| `ai_base.h` | Provides inference interfaces for model processing |
| `ai_base.cc` | Implements the model inference methods defined in `ai_base.h` |
| `ai_utils.h` | Provides common utility function interfaces |
| `ai_utils.cc` | Implements utility functions defined in `ai_utils.h` |
| `scoped_timing.h` | Provides timing utilities for development debugging |
| `setting.h` | Provides configuration interfaces for display device parameters and AI frame resolution |
| `video_pipeline.h` | Provides interfaces for dual-channel pipeline: camera, display device, OSD layer initialization and frame operations |
| `video_pipeline.cc` | Implements video processing interfaces defined in `video_pipeline.h` |
| `face_detection.h` | Provides task-specific interfaces: preprocessing, inference, postprocessing, result drawing |
| `face_detection.cc` | Implements task-specific interfaces defined in `face_detection.h` |
| `anchors_320.cc` | Anchor data for 320-input face detection model |
| `anchors_640.cc` | Anchor data for 640-input face detection model |
| `main.cc` | Main function: drives complete application flow |

**How to Use and Modify:**

- `ai_base.h` and `ai_base.cc` implement the model inference wrapper base class with `kmodel` initialization, input/output initialization, execution, and output retrieval interfaces. See code comments for details. **These files typically do not require changes**.

- `ai_utils.h` and `ai_utils.cc` provide common utility functions for data access and preprocessing. **If provided functions are insufficient, modify these files to add new methods; otherwise leave them unchanged**.

- `setting.h`, `video_pipeline.h`, and `video_pipeline.cc` implement camera, display, and OSD initialization and configuration. They support both the `LT9611 HDMI 1920×1080` and `ST7701 LCD 800×480` display modes. **Only modify them if you are adding new display support; otherwise keep them unchanged**.

- `face_detection.h`, `face_detection.cc`, and `main.cc` are **files you must write yourself** when developing new AI applications. Use the corresponding files in `src/rtsmart/examples/ai/face_detection` as a reference. The task-specific header and source files implement **input preprocessing, inference (usually by calling the `run` method declared in `ai_base.h`), and output postprocessing**. The `main.cc` file must orchestrate **task class instantiation, preprocessing, inference, postprocessing, and result drawing**.

### Code Details

#### `setting.h` Configuration

The macros in `setting.h` are mainly used to configure camera output, display output, OSD, and the AI frame resolution.

| Macro | Description |
| --- | --- |
| `ISP_WIDTH` | ISP output width |
| `ISP_HEIGHT` | ISP output height |
| `DISPLAY_MODE` | `0`: `1920 x 1080` LT9611, `1`: `800 x 480` ST7701 |
| `DISPLAY_WIDTH` | Display width |
| `DISPLAY_HEIGHT` | Display height |
| `DISPLAY_ROTATE` | `0`: no rotation, `1`: rotate 90 degrees |
| `AI_FRAME_WIDTH` | AI frame width |
| `AI_FRAME_HEIGHT` | AI frame height |
| `AI_FRAME_CHANNEL` | AI frame channels |
| `USE_OSD` | `0`: disable OSD, `1`: enable OSD |
| `OSD_WIDTH` | OSD layer width |
| `OSD_HEIGHT` | OSD layer height |
| `OSD_CHANNEL` | OSD layer channels |

**Detailed Configuration:**

```cpp
#define ISP_WIDTH 1920
#define ISP_HEIGHT 1080
```

This is the resolution configured for the camera (ISP) output. From this stream, the display and AI channels are split off, each with its own configured format and resolution.

```cpp
#define DISPLAY_MODE 1    // 0: 1920×1080 LT9611, 1: 800×480 ST7701
#define DISPLAY_WIDTH 800
#define DISPLAY_HEIGHT 480
#define DISPLAY_ROTATE 1  // 0: no rotation, 1: rotate 90 degrees
```

This configures the display channel split from the camera. The display path depends on the screen resolution and on portrait/landscape orientation. For `HDMI 1080P` output, the `LT9611` settings (`DISPLAY_MODE` `0`) can typically be kept unchanged. The `ST7701` screen (resolution `800×480`) is also supported.

The `ST7701` panel is physically `480×800` (portrait), requiring 90-degree rotation for display. This rotation functionality is now encapsulated in the lower VO layer, so users can ignore it and treat it as a landscape display.

```cpp
#define AI_FRAME_WIDTH 640
#define AI_FRAME_HEIGHT 360
#define AI_FRAME_CHANNEL 3
```

This configures the AI channel split from the camera, which feeds model preprocessing. Set it according to your AI requirements. The output is a `3×360×640` frame in `PIXEL_FORMAT_RGB_888_PLANAR` format with `CHW` data layout.
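
As an illustration of the `CHW` layout, the sketch below computes where a given pixel sits in a planar buffer. The names `chw_offset`, `Rgb`, and `read_pixel` are hypothetical and not part of the SDK, and the plane order R, G, B is an assumption based on the format name.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset of channel c, row y, column x in a planar CHW uint8 buffer.
// Hypothetical helper for illustration only.
constexpr std::size_t chw_offset(std::size_t c, std::size_t y, std::size_t x,
                                 std::size_t height, std::size_t width)
{
    return c * height * width + y * width + x;
}

struct Rgb
{
    std::uint8_t r, g, b;
};

// Read one pixel, assuming the planes are stored in R, G, B order.
inline Rgb read_pixel(const std::uint8_t *chw, std::size_t y, std::size_t x,
                      std::size_t height, std::size_t width)
{
    return Rgb{chw[chw_offset(0, y, x, height, width)],
               chw[chw_offset(1, y, x, height, width)],
               chw[chw_offset(2, y, x, height, width)]};
}
```

For a `3×360×640` frame, each plane holds `360 × 640 = 230400` bytes, so under this assumption the G plane starts at offset `230400` and the whole frame occupies `691200` bytes.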

> **Important Note**
>
> Distinguish between AI channel resolution and model input resolution:
>
> - **AI channel resolution**: Camera data resolution before model preprocessing
> - **Model input resolution**: Data resolution after preprocessing fed directly to the model
>
> Preprocessing converts AI channel data to model input data. For example, if camera AI channel outputs `640×360` but model requires `320×320`, preprocessing must transform it accordingly.
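
One common way to bridge the two resolutions is an aspect-preserving resize followed by padding. The sketch below only computes the resize and padding parameters; `LetterboxParams` and `compute_letterbox` are hypothetical names, and centered padding is an assumption (the reference application may pad differently, for example on one side only).

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical result type: scale factor, resized size, and padding per side.
struct LetterboxParams
{
    float scale;
    int resized_w, resized_h;
    int pad_left, pad_top, pad_right, pad_bottom;
};

// Fit a src_w x src_h frame into dst_w x dst_h while keeping the aspect ratio,
// then split the remaining space evenly between opposite sides.
LetterboxParams compute_letterbox(int src_w, int src_h, int dst_w, int dst_h)
{
    LetterboxParams p{};
    p.scale = std::min(static_cast<float>(dst_w) / src_w,
                       static_cast<float>(dst_h) / src_h);
    p.resized_w = static_cast<int>(std::round(src_w * p.scale));
    p.resized_h = static_cast<int>(std::round(src_h * p.scale));

    const int pad_w = dst_w - p.resized_w;
    const int pad_h = dst_h - p.resized_h;
    p.pad_left = pad_w / 2;
    p.pad_right = pad_w - p.pad_left;
    p.pad_top = pad_h / 2;
    p.pad_bottom = pad_h - p.pad_top;
    return p;
}
```

For a `640×360` AI frame and a `320×320` model input, this gives a scale of `0.5`, a resized image of `320×180`, and 70 rows of padding above and below. Postprocessing then has to undo the same scale and padding to map boxes back to the AI frame.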

```cpp
#define USE_OSD 1
#define OSD_WIDTH 800
#define OSD_HEIGHT 480
#define OSD_CHANNEL 4
```

This configures the OSD drawing channel. Its resolution must match the display resolution. The OSD frame is a transparent `BGRA8888` image that contains no camera image, only drawn results such as detection boxes and keypoints. These are drawn on the OSD frame, which is then inserted into the display channel and merged with the display path to produce the final effect.
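
The idea of a transparent overlay can be sketched without any SDK calls. In the example below, `OsdFrame` and `draw_box` are hypothetical stand-ins (the reference code draws on a `cv::Mat`), and the in-memory byte order B, G, R, A is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the OSD frame: interleaved B, G, R, A bytes,
// initialised to zero, which means fully transparent.
struct OsdFrame
{
    int width;
    int height;
    std::vector<std::uint8_t> data;

    OsdFrame(int w, int h)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h * 4, 0) {}

    std::uint8_t *pixel(int x, int y)
    {
        return &data[(static_cast<std::size_t>(y) * width + x) * 4];
    }
};

// Draw a one-pixel opaque rectangle outline with corners (x0, y0) and (x1, y1).
// Coordinates are clamped to the frame; all other pixels stay transparent.
void draw_box(OsdFrame &f, int x0, int y0, int x1, int y1,
              std::uint8_t b, std::uint8_t g, std::uint8_t r)
{
    x0 = std::max(0, std::min(x0, f.width - 1));
    x1 = std::max(0, std::min(x1, f.width - 1));
    y0 = std::max(0, std::min(y0, f.height - 1));
    y1 = std::max(0, std::min(y1, f.height - 1));
    if (x0 > x1) std::swap(x0, x1);
    if (y0 > y1) std::swap(y0, y1);

    auto put = [&](int x, int y) {
        std::uint8_t *p = f.pixel(x, y);
        p[0] = b; p[1] = g; p[2] = r; p[3] = 255;  // alpha 255 = opaque
    };
    for (int x = x0; x <= x1; ++x) { put(x, y0); put(x, y1); }
    for (int y = y0; y <= y1; ++y) { put(x0, y); put(x1, y); }
}
```

An `800×480` frame with four channels occupies `800 × 480 × 4 = 1536000` bytes. Only the outline pixels become opaque, so the camera image remains visible everywhere else after the layers are merged.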

#### `AIBase` Class Details

`AIBase` is the wrapper class for model inference, encapsulating nncase operations. It covers model loading, input/output tensor initialization, inference execution, and output retrieval. Most demo development only needs to focus on model preprocessing and postprocessing:

```cpp
/**
 * @brief AI base class encapsulating nncase operations
 * Mainly encapsulates nncase load, input setup, execution, and output retrieval.
 * Subsequent demos only need to focus on model preprocessing and postprocessing.
 */
class AIBase
{
public:
    /**
     * @brief AI base class constructor, loads kmodel and initializes input/output
     * @param kmodel_file kmodel file path
     * @param model_name model name for identification
     * @param debug_mode 0 (no debug), 1 (timing only), 2 (all debug output)
     * @return None
     */
    AIBase(const char *kmodel_file, const string model_name, const int debug_mode = 1);

    /**
     * @brief AI base class destructor
     * @return None
     */
    ~AIBase();

    /**
     * @brief Get kmodel input tensor by index
     * @param idx input tensor index
     * @return input tensor
     */
    runtime_tensor get_input_tensor(size_t idx);

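    /**
     * @brief Set kmodel input tensor by index
     * @param idx input tensor index
     * @param input_tensor tensor to bind to that input
     * @return None
     */
    void set_input_tensor(size_t idx, runtime_tensor &input_tensor);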

    /**
     * @brief Run model inference on kmodel
     * @return None
     */
    void run();

    /**
     * @brief Get kmodel output, results stored in class attributes
     * @return None
     */
    void get_output();

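    /**
     * @brief Get kmodel output tensor by index
     * @param idx output tensor index
     * @return output tensor
     */
    runtime_tensor get_output_tensor(int idx);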

protected:
    string model_name_;                 // Model name
    int debug_mode_;                    // Debug mode: 0 (no print), 1 (timing), 2 (all)
    vector<float *> p_outputs_;         // Output tensor pointer list for kmodel
    vector<vector<int>> input_shapes_;  // {{N,C,H,W},{N,C,H,W}...}
    vector<vector<int>> output_shapes_; // {{N,C,H,W},...} or {{N,C},...} etc
private:
    /**
     * @brief Initialize kmodel inputs on first run, get input shapes
     * @return None
     */
    void set_input_init();

    /**
     * @brief Initialize kmodel outputs on first run, get output shapes
     * @return None
     */
    void set_output_init();

    interpreter kmodel_interp_;         // kmodel interpreter for loading and inference
    vector<unsigned char> kmodel_vec_;  // kmodel data read from file for interpreter
};
```

When developing applications, the most useful members are:

- `input_shapes_` for getting input tensor shapes
- `output_shapes_` for getting output tensor shapes
- `p_outputs_` for getting output tensor data pointers

Example of accessing the first output from inside a derived class (the members are `protected`):

```cpp
float *output0 = p_outputs_[0];
```
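
Because `p_outputs_` holds raw `float *` pointers, the number of valid elements has to come from the matching entry in `output_shapes_`. The sketch below shows one way to do that on plain data; `element_count` and `argmax` are hypothetical helpers, not part of `AIBase`.

```cpp
#include <cstddef>
#include <vector>

// Number of elements described by a shape such as {N, C, H, W} or {N, C}.
std::size_t element_count(const std::vector<int> &shape)
{
    std::size_t n = 1;
    for (int d : shape)
        n *= static_cast<std::size_t>(d);
    return n;
}

// Index of the largest value in a flat float buffer, for example an output
// pointer paired with the element count of its shape.
std::size_t argmax(const float *data, std::size_t count)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i)
        if (data[i] > data[best])
            best = i;
    return best;
}
```

Inside a task class this could be used as `argmax(p_outputs_[0], element_count(output_shapes_[0]))`, assuming the first output is a flat score vector.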

#### Task Header and Source Files

`face_detection.h` and `face_detection.cc` are **core files you must implement** when developing new applications.

In real projects, you can name them based on your scenario:

```bash
***.h
***.cc
```

Examples: `person_det.h`, `helmet_detect.cc`, `gesture_recog.h`, etc.

You must implement a **task class** that inherits from `AIBase`:

```cpp
class YourTask : public AIBase
```

In other words, **inherit `AIBase` and complete the task logic**.

The task class handles four responsibilities:

| Component | Required | Purpose |
|-----------|----------|---------|
| **Preprocess** | ✅ Must implement | Convert input image to model format |
| **Inference** | ✅ Direct call to AIBase | Already encapsulated by AIBase |
| **Postprocess** | ✅ Must implement | Convert model output to interpretable results |
| **Draw** | ✅ Must implement | Draw results onto image |

For example, if implementing `myapp.h` and `myapp.cc`:

```cpp
#ifndef _MYAPP_H
#define _MYAPP_H

#include <iostream>
#include <vector>
#include "ai_utils.h"
#include "ai_base.h"

using std::vector;

/**
 * @brief Custom data structure used in postprocessing
 * For example, detection boxes need coordinates (xywh), class index, confidence
 */
typedef struct ExampleResults
{
    // Define data structure as needed
} ExampleResults;

/**
 * @brief Application class to develop, inherits AIBase
 * Encapsulates for each frame: preprocessing, execution, postprocessing workflow
 */
class MyApp : public AIBase
{
public:
    /**
     * @brief Video stream inference, MyApp constructor
     * Loads kmodel, initializes model input/output and parameters (thresholds, etc)
     * Configures preprocessing
     * @param kmodel_file kmodel file path
     * @param other_params other parameters such as thresholds
     * @param image_size camera AI channel input frame shape
     * @param debug_mode 0 (no debug), 1 (timing only), 2 (all debug output)
     * @return None
     */
    MyApp(char *kmodel_file, other_params, FrameCHWSize image_size, int debug_mode);

    /**
     * @brief MyApp destructor
     * @return None
     */
    ~MyApp();

    /**
     * @brief Preprocessing
     * @param input_tensor input tensor
     * @return None
     */
    void pre_process(runtime_tensor &input_tensor);

    /**
     * @brief kmodel inference
     * @return None
     */
    void inference();

    /**
     * @brief kmodel postprocessing
     * Uses input image_size to restore coordinates to original resolution
     * Stores results in results container
     * @param image_size input image shape
     * @param results postprocessing results container
     * @return None
     */
    void post_process(FrameCHWSize image_size, vector<ExampleResults> &results);

    /**
     * @brief Draw results
     * @param draw_frame transparent image for drawing (video OSD) or original (image inference)
     * @param results postprocessing results
     * @return None
     */
    void draw_result(cv::Mat &draw_frame, vector<ExampleResults> &results);

    std::unique_ptr<ai2d_builder> ai2d_builder_;  // ai2d builder
    runtime_tensor ai2d_out_tensor_;              // ai2d output tensor
    FrameCHWSize image_size_;                     // input image shape
    FrameCHWSize input_size_;                     // model input shape

    // Define other members for current task as needed
    // ***
};

#endif
```

Implement the interfaces in `myapp.cc` following the reference implementation in:

```bash
src/rtsmart/examples/ai/face_detection/src/face_detection.cc
```
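
For detection tasks, postprocessing usually ends with confidence filtering and non-maximum suppression (NMS), which is where the object and NMS thresholds are applied. The following is a generic sketch, not the reference implementation; `Box`, `iou`, and `nms` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical detection result: top-left corner, size, and confidence.
struct Box
{
    float x, y, w, h, score;
};

// Intersection over union of two axis-aligned boxes.
float iou(const Box &a, const Box &b)
{
    const float x1 = std::max(a.x, b.x);
    const float y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.w, b.x + b.w);
    const float y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drop boxes below obj_thresh, then greedily keep the highest-scoring boxes
// and suppress any box that overlaps a kept one by more than nms_thresh.
std::vector<Box> nms(std::vector<Box> boxes, float obj_thresh, float nms_thresh)
{
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                               [obj_thresh](const Box &b) { return b.score < obj_thresh; }),
                boxes.end());
    std::sort(boxes.begin(), boxes.end(),
              [](const Box &a, const Box &b) { return a.score > b.score; });

    std::vector<Box> kept;
    for (const Box &cand : boxes)
    {
        bool suppressed = false;
        for (const Box &k : kept)
        {
            if (iou(cand, k) > nms_thresh)
            {
                suppressed = true;
                break;
            }
        }
        if (!suppressed)
            kept.push_back(cand);
    }
    return kept;
}
```

Raising the NMS threshold keeps more overlapping boxes, while raising the object threshold removes low-confidence detections before NMS runs.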

#### `main.cc` Changes

**Overview**

`main.cc` contains the complete task logic: getting frame data from camera or loading an image, creating tensors, calling preprocessing, inference, postprocessing, and drawing results. This achieves complete per-frame processing.

The inference flow is illustrated below:

![model_inference_rtos](https://www.kendryte.com/api/post/attachment?id=642)

**Video Inference Code**

Video inference in `main.cc` follows this pattern (pseudo-code with detailed comments):

```cpp
FrameCHWSize image_size = {AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
// Create empty Mat for drawing frames
cv::Mat draw_frame(OSD_HEIGHT, OSD_WIDTH, CV_8UC4, cv::Scalar(0, 0, 0, 0));
// Create empty runtime_tensor for input data
runtime_tensor input_tensor;
dims_t in_shape {1, AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
// Create PipeLine for video stream processing
PipeLine pl(debug_mode);
// Initialize PipeLine
pl.Create();
// Create DumpRes for frame data storage
DumpRes dump_res;
// Initialize task class and result storage
MyApp my_app(argv[1], atof(argv[2]), atof(argv[3]), image_size, atoi(argv[5]));
vector<ExampleResults> results;

// Enter while loop, continuously dumping images
while (!isp_stop)
{
    // Create ScopedTiming for total time calculation
    ScopedTiming st("total time", 1);
    // Get one frame from PipeLine and create tensor
    pl.GetFrame(dump_res);
    input_tensor = host_runtime_tensor::create(typecode_t::dt_uint8, in_shape,
        {(gsl::byte *)dump_res.virt_addr, compute_size(in_shape)}, false,
        hrt::pool_shared, dump_res.phy_addr).expect("cannot create input tensor");
    hrt::sync(input_tensor, sync_op_t::sync_write_back, true).expect("sync write_back failed");

    // Preprocessing, inference, postprocessing
    my_app.pre_process(input_tensor);
    my_app.inference();
    my_app.post_process(image_size, results);

    // Clear previous frame drawing
    draw_frame.setTo(cv::Scalar(0, 0, 0, 0));
    my_app.draw_result(draw_frame, results);

    // Insert drawing frame into PipeLine display video stream
    pl.InsertFrame(draw_frame.data);
    // Release current frame data
    pl.ReleaseFrame();
}
pl.Destroy();
```

Run video inference in a separate thread. Set `isp_stop` to `true` when the user enters `q` to exit.
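
The thread and stop-flag arrangement can be sketched as follows. This is a minimal illustration under assumptions: `video_proc`, `frames_processed`, `is_quit_command`, and `run_until_quit` are hypothetical names, and the loop body is a placeholder for the real per-frame work.

```cpp
#include <atomic>
#include <chrono>
#include <istream>
#include <string>
#include <thread>

std::atomic<bool> isp_stop{false};     // shared stop flag checked by the loop
std::atomic<int> frames_processed{0};  // placeholder for real per-frame work

// Stand-in for the video inference loop: runs until isp_stop becomes true.
void video_proc()
{
    while (!isp_stop.load())
    {
        frames_processed.fetch_add(1);
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// Returns true when the console line is the quit command.
bool is_quit_command(const std::string &line)
{
    return line == "q";
}

// Start the worker, read lines until "q" (or end of input), then stop and join.
void run_until_quit(std::istream &in)
{
    isp_stop.store(false);
    std::thread worker(video_proc);
    std::string line;
    while (std::getline(in, line))
    {
        if (is_quit_command(line))
            break;
    }
    isp_stop.store(true);
    worker.join();
}
```

Using `std::atomic<bool>` for the flag avoids a data race between the thread that reads user input and the inference thread. Joining the worker before returning ensures the pipeline is torn down only after the last frame has been released.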

**Image Inference Code**

Image inference in `main.cc`:

```cpp
int debug_mode = atoi(argv[5]);
// Load image
cv::Mat ori_img = cv::imread(argv[4]);
// Initialize image_size from image
FrameCHWSize image_size = {ori_img.channels(), ori_img.rows, ori_img.cols};
// Create vector for CHW image data, convert read HWC to CHW
std::vector<uint8_t> chw_vec;
std::vector<cv::Mat> bgrChannels(3);
cv::split(ori_img, bgrChannels);
for (auto i = 2; i > -1; i--)
{
    std::vector<uint8_t> data = std::vector<uint8_t>(bgrChannels[i].reshape(1, 1));
    chw_vec.insert(chw_vec.end(), data.begin(), data.end());
}

// Create input tensor
dims_t in_shape {1, 3, ori_img.rows, ori_img.cols};
runtime_tensor input_tensor = host_runtime_tensor::create(typecode_t::dt_uint8, in_shape,
    hrt::pool_shared).expect("cannot create input tensor");
auto input_buf = input_tensor.impl()->to_host().unwrap()->buffer().as_host().unwrap()
    .map(map_access_::map_write).unwrap().buffer();
memcpy(reinterpret_cast<char *>(input_buf.data()), chw_vec.data(), chw_vec.size());
hrt::sync(input_tensor, sync_op_t::sync_write_back, true).expect("write back input failed");

// Initialize task class and result storage
MyApp my_app(argv[1], atof(argv[2]), atof(argv[3]), image_size, atoi(argv[5]));
vector<ExampleResults> results;

// Preprocessing, inference, postprocessing
my_app.pre_process(input_tensor);
my_app.inference();
my_app.post_process(image_size, results);

// Draw directly on original image
my_app.draw_result(ori_img, results);
cv::imwrite("result.jpg", ori_img);
```
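
The channel split and reversed loop above convert OpenCV's interleaved `BGR` data into planar `RGB`. The same conversion can be written on raw buffers, which makes the index arithmetic explicit. The function name `hwc_bgr_to_chw_rgb` is hypothetical, and the sketch assumes a tightly packed buffer with no row padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved HWC data in B, G, R order into planar CHW data in
// R, G, B order. Assumes the input has exactly height * width * 3 bytes.
std::vector<std::uint8_t> hwc_bgr_to_chw_rgb(const std::uint8_t *hwc, int height, int width)
{
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<std::uint8_t> chw(plane * 3);
    for (std::size_t i = 0; i < plane; ++i)
    {
        chw[0 * plane + i] = hwc[i * 3 + 2];  // R plane
        chw[1 * plane + i] = hwc[i * 3 + 1];  // G plane
        chw[2 * plane + i] = hwc[i * 3 + 0];  // B plane
    }
    return chw;
}
```

For a `cv::Mat`, the tightly packed assumption holds when `isContinuous()` returns true, which is normally the case for an image freshly returned by `cv::imread`.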

When you change the inference logic, also update the usage message:

```cpp
void print_usage(const char *name)
{
    cout << "Usage: " << name << " <kmodel_det> <obj_thres> <nms_thres> <input_mode> <debug_mode>" << endl
         << "Options:" << endl
         << "  kmodel_det      Face detection kmodel path\n"
         << "  obj_thres       Object confidence threshold\n"
         << "  nms_thres       NMS threshold\n"
         << "  input_mode      Local image (image path) / Camera (None)\n"
         << "  debug_mode      Debug mode: 0 (off), 1 (simple), 2 (detailed)\n"
         << endl;
}
```

Also update the argument validation:

```cpp
// Argument count validation: program name plus the five arguments above
std::cout << "case " << argv[0] << " built at " << __DATE__ << " " << __TIME__ << std::endl;
if (argc != 6)
{
    print_usage(argv[0]);
    return -1;
}
```
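
Since the code reads `argv[1]` through `argv[5]`, a valid invocation has six entries in `argv` including the program name. A small parsing helper keeps that check and the conversions in one place. `AppArgs` and `parse_args` are hypothetical names, and treating the literal string `None` as camera mode is an assumption taken from the usage text.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical container for the five command-line arguments.
struct AppArgs
{
    std::string kmodel_path;
    float obj_thresh = 0.0f;
    float nms_thresh = 0.0f;
    std::string input_mode;  // image path, or "None" for camera input
    int debug_mode = 0;

    bool use_camera() const { return input_mode == "None"; }
};

// Returns false when the argument count is wrong; otherwise fills `out`.
bool parse_args(int argc, const char *const argv[], AppArgs &out)
{
    if (argc != 6)  // program name + 5 arguments
        return false;
    out.kmodel_path = argv[1];
    out.obj_thresh = static_cast<float>(std::atof(argv[2]));
    out.nms_thresh = static_cast<float>(std::atof(argv[3]));
    out.input_mode = argv[4];
    out.debug_mode = std::atoi(argv[5]);
    return true;
}
```

Note that `std::atof` and `std::atoi` return `0` on malformed input rather than reporting an error, so stricter validation would need `std::strtof` or `std::strtol` with an end-pointer check.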

#### `CMakeLists.txt` and Build Script

In the root `face_detection/CMakeLists.txt`, add the `src` subdirectory:

```cmake
add_subdirectory(src)
```

In the task subdirectory file `face_detection/src/CMakeLists.txt`, update the list of compiled sources and the executable name:

```cmake
set(src main.cc face_detection.cc anchors_320.cc anchors_640.cc ai_base.cc ai_utils.cc video_pipeline.cc)
set(bin face_detection.elf)
```

The compilation script `face_detection/build_app.sh` defines build environment variables and copies the generated `elf` and `kmodel` files to the `k230_bin` directory:

```shell
# Copy generated elf and kmodel to k230_bin directory
collect_outputs() {
    local elf_file="${BUILD_DIR}/bin/face_detection.elf"

    if [ -f "${elf_file}" ]; then
        echo "[INFO] Collecting ELF and utility files to ${K230_BIN_DIR}..."
        cp -u "${elf_file}" "${K230_BIN_DIR}/"
        cp -u utils/* "${K230_BIN_DIR}/" 2>/dev/null || true
    else
        echo "[WARN] ELF file not found: ${elf_file}"
    fi
}
```

## Build

### Select Board and Build Firmware

From the RTOS SDK root, check supported boards:

```bash
make list-def
```

Switch to your board and compile:

```bash
make ***_defconfig
make -j
```

After completion, the firmware image is generated in the `output` directory. Place applications in `src/rtsmart/examples/ai`, following the structure of the `face_detection` reference.

### Build Method 1: Using build_app.sh

After finishing code changes, enter the application directory and run:

```bash
./build_app.sh
```

Build intermediates are placed in `build`, and deployment files are collected in `k230_bin`.

### Build Method 2: Using menuconfig

From RTOS SDK root, run `make menuconfig` and enable:

```text
RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build Face Detection Programs
```

Save and exit. Because a `Makefile` is provided, run the following from the SDK root:

```bash
make -j
```

Deployment files are compiled directly into firmware at:

```text
/sdcard/app/examples/ai/face_detection
```

Alternatively, enter the application directory and run:

```bash
make -j
```

This also supports incremental builds and generates outputs in the `k230_bin` directory.

## Board Deployment

Flash the firmware to the board and power on. See:

[how_to_flash](../../userguide/how_to_flash.md)

After boot, a virtual disk `CanMV` appears. Copy the compiled `elf` file, `kmodel` file, and other files from `k230_bin` to:

```text
CanMV/sdcard
```

Connect to the board via serial terminal and run:

- `face_detect_isp.sh` for camera inference
- `face_detect_image.sh` for image inference

Ensure the parameters in these scripts match the code and the deployment layout.

## Debugging Guide

### Check Model Input and Output Shapes

Print `input_shapes_` and `output_shapes_` from `AIBase` class attributes to verify model I/O dimensions are correct.

### Dump Raw Data

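`ai_utils.h` provides the `dump_binary_file`, `dump_gray_image`, and `dump_color_image` interfaces for dumping binary files, grayscale images, and color images. Use the dumped images to verify data layout and channel order. Note that `BGR` and `RGB` differ in channel order: images loaded with `cv::imread` are `BGR`, while the AI channel outputs `RGB` planar data, so a swapped order shows up as red and blue exchanged in the dumped image.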

### Locate Failures with Debug Prints

Add `std::cout` statements or another logging mechanism to the code, then rebuild and redeploy to locate the point of failure.

### Add Timing Statistics

If the overall runtime is abnormal, add timing statements to check how long each module takes. The source provides `scoped_timing.h` for this purpose:

```cpp
{
    ScopedTiming st("test", 1);
    /*
     * code under test
     */
}
```
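
The braces matter because this style of timer relies on scope exit to stop the clock. The sketch below shows how such a scoped timer can work in principle. `SimpleScopedTimer` is a hypothetical class written for illustration and is not the implementation in `scoped_timing.h`; the optional output pointer is an addition that makes the measured value available to the caller.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical RAII timer: starts on construction, reports on destruction.
class SimpleScopedTimer
{
public:
    SimpleScopedTimer(std::string label, bool enabled, double *elapsed_ms_out = nullptr)
        : label_(std::move(label)), enabled_(enabled), elapsed_ms_out_(elapsed_ms_out),
          start_(std::chrono::steady_clock::now())
    {
    }

    ~SimpleScopedTimer()
    {
        const auto end = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(end - start_).count();
        if (elapsed_ms_out_ != nullptr)
            *elapsed_ms_out_ = ms;  // hand the measurement back to the caller
        if (enabled_)
            std::cout << label_ << " took " << ms << " ms" << std::endl;
    }

    // A timer measures one scope, so copying it would be misleading.
    SimpleScopedTimer(const SimpleScopedTimer &) = delete;
    SimpleScopedTimer &operator=(const SimpleScopedTimer &) = delete;

private:
    std::string label_;
    bool enabled_;
    double *elapsed_ms_out_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping each stage (preprocessing, inference, postprocessing, drawing) in its own scope gives a per-stage breakdown that can be compared against the total loop time.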

### Check Memory Usage

Memory usage information is available under RTOS via `/proc`. For multimedia memory:

```bash
cat /proc/media-mem
```

Other modules:

```bash
cat /proc/umap/vicap
cat /proc/umap/vb
cat /proc/umap/vo
```

For system memory usage:

```bash
list_page
```

This shows the free pages, used pages, and peak usage (in hexadecimal). Each page is 4 KB. Total available memory = free pages + used pages.
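
The page counts can be converted to bytes with a few lines of arithmetic. The helpers below are hypothetical, and the hexadecimal values used in the note after the code are made up for illustration, since the exact `list_page` output format is not shown here.

```cpp
#include <cstdint>
#include <string>

constexpr std::uint64_t kPageSizeBytes = 4 * 1024;  // each page is 4 KB

// Parse a hexadecimal page count such as "100" or "0x100".
std::uint64_t parse_hex_pages(const std::string &hex)
{
    return std::stoull(hex, nullptr, 16);
}

// Convert a page count to bytes.
std::uint64_t pages_to_bytes(std::uint64_t pages)
{
    return pages * kPageSizeBytes;
}

// Total available memory in bytes = (free pages + used pages) * page size.
std::uint64_t total_memory_bytes(const std::string &free_hex, const std::string &used_hex)
{
    return pages_to_bytes(parse_hex_pages(free_hex) + parse_hex_pages(used_hex));
}
```

For example, a reading of `0x100` free pages and `0x300` used pages would correspond to `0x400` pages in total, which is `1024 × 4096 = 4194304` bytes (4 MiB).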

### If Model Inference Quality Is Unsatisfactory

If inference quality doesn't meet requirements after verifying the code path, optimize from these aspects:

- Adjust model parameters such as confidence threshold, NMS threshold
- Adjust the model input resolution (e.g., from `320×320` to `640×640`) and verify that preprocessing is correct
- Adjust the quantization parameters used during model conversion; see [Quantization Parameters](./nncase.md#ptqtensoroptions) for `calibrate_method`, `quant_type`, and `w_quant_type` (e.g., change `w_quant_type` to `int16`)
- Try alternative models for the same task if current model doesn't meet requirements