Note

This is the documentation for the latest development branch and may refer to features that are not available in released versions.

Triple-Camera AI Application Development Guide

Overview

A triple-camera AI application runs AI inference on three camera streams simultaneously on the K230 board. The reference sample uses three GC2093 cameras, one on each of the three MIPI camera interfaces, and runs:

  • face detection on one camera

  • palm detection on one camera

  • YOLO 80-class detection on one camera

Development Guide

Modules and Task Flow

Involved Modules:

  1. vicap (video input capture): Configures camera sensor properties and channel attributes, including resolution, frame rate, and data format. Binds camera data to the display and provides the frames used for AI inference from each camera.

  2. vo (video output): Configures display device and layer attributes including position, resolution, frame rate, and data format. Displays camera frames or other input in real-time through video and OSD layers. The video layer supports YUV format only, while the OSD layer supports RGB format only.

  3. kpu: Loads kmodel, configures input/output tensors, and performs model inference.

  4. ai2d: Performs preprocessing on model input images. See usage_ai2d for details.

Processing Flow:

The three-camera application is managed by PipeLine. Each camera is identified by a sensor ID. Each camera stream uses dual-channel processing:

  • Display Channel: One image stream is bound to the vo video layer for real-time display of the original image

  • AI Channel: Another image stream is sent to AI models for inference (face detection, palm detection, YOLO detection)

Inference results are drawn onto OSD layers and merged with the live display. This sample uses three video layers (one per camera) and three OSD layers to achieve split-screen display with detection results overlaid on each camera feed.

The overall process is shown below:

[Figure: triple_camera_ai_pipeline]

Code Structure

The sample code is located under src/rtsmart/examples/ai/triple_camera_ai. The three-camera configuration is already wrapped in a simplified form so users can follow the same pattern.

triple_camera_ai
├── cmake
├── src
│    ├── ai_base.cc         # Model inference wrapper implementation
│    ├── ai_base.h          # Model inference header file
│    ├── ai_utils.cc        # Utility methods for model inference
│    ├── ai_utils.h         # Utility methods header file
│    ├── anchors_320.cc     # Anchors for 320-input face detection model
│    ├── face_detection.cc  # Face detection task: preprocess, inference, postprocess, drawing
│    ├── face_detection.h   # Face detection task header file
│    ├── hand_detection.cc  # Palm detection task: preprocess, inference, postprocess, drawing
│    ├── hand_detection.h   # Palm detection task header file
│    ├── yolov8_detect.cc   # YOLO 80-class detection task: preprocess, inference, postprocess, drawing
│    ├── yolov8_detect.h    # YOLO detection task header file
│    ├── main.cc            # Main function: coordinates all three tasks and multi-threaded execution
│    ├── scoped_timing.h    # Timing utility for debugging
│    ├── setting.h          # Configuration macros for display and AI frame resolution
│    ├── video_pipeline.cc  # Three-camera pipeline implementation
│    ├── video_pipeline.h   # Three-camera pipeline header file
│    └── CMakeLists.txt     # Build configuration for this task
├── utils                   # Pre-built kmodel and scripts
├── CMakeLists.txt          # Root CMakeLists
├── build_app.sh            # Compilation script
└── Makefile                # Alternative build method

Code Responsibilities

The main file responsibilities are:

| File | Description |
| --- | --- |
| ai_base.h | Declares the common model-inference interfaces |
| ai_base.cc | Implements the common model-inference interfaces |
| ai_utils.h | Declares shared helper functions |
| ai_utils.cc | Implements shared helper functions |
| scoped_timing.h | Provides timing helpers for performance debugging |
| setting.h | Defines display and AI-frame configuration macros |
| video_pipeline.h | Declares the three-camera pipeline interface |
| video_pipeline.cc | Implements the three-camera pipeline operations |
| face_detection.h/.cc | Implements the face detection task: preprocess, inference, postprocess, and drawing |
| hand_detection.h/.cc | Implements the palm detection task with the same workflow |
| yolov8_detect.h/.cc | Implements the YOLO 80-class detection task with the same workflow |
| anchors_320.cc | Provides anchor data for the face-detection model |
| main.cc | Coordinates the entire application with multi-threaded execution |

How to Use and Modify:

  • ai_base.* and scoped_timing.h - Implement the model inference wrapper and timing tools. These files typically do not require changes and are reused across projects.

  • ai_utils.* - Provides common utility functions for data access and preprocessing. Extend these files only if the existing helpers are insufficient for your task requirements.

  • setting.h and video_pipeline.* - Handle camera initialization, display device configuration, frame dump, and OSD insertion. They are already configured for three-camera dual-channel processing. Only modify if you need to:

    • Add a new display type or resolution

    • Modify the number of camera channels or their layout

    • Change frame dimensions or formats

  • face_detection.*, hand_detection.*, yolov8_detect.*, and main.cc - These are the files to focus on when developing your own application. Typical changes are:

    • Modify the task-specific files to implement preprocessing, inference, postprocessing, and drawing for your models

    • Replace the reference tasks (face detection, palm detection, YOLO) with your own application logic

    • Update main.cc to orchestrate your own task classes

    • Synchronize KPU access across threads (the KPU is exclusive, so concurrent inference must be guarded by a mutex)

Code Details

setting.h Configuration

The macros in setting.h configure camera output, display output, OSD size, and AI-frame resolution.

| Macro | Description |
| --- | --- |
| ISP_WIDTH | ISP output width |
| ISP_HEIGHT | ISP output height |
| DISPLAY_MODE | 0: 1920 x 1080 LT9611, 1: 800 x 480 ST7701 |
| DISPLAY_WIDTH | Display width |
| DISPLAY_HEIGHT | Display height |
| DISPLAY_ROTATE | 0: no rotation, 1: rotate 90 degrees |
| AI_FRAME_WIDTH | AI frame width |
| AI_FRAME_HEIGHT | AI frame height |
| AI_FRAME_CHANNEL | AI frame channels |
| USE_OSD | Whether to enable OSD |
| OSD_WIDTH | OSD width |
| OSD_HEIGHT | OSD height |
| OSD_CHANNEL | OSD channels |

Typical fragments are:

#define ISP_WIDTH 1920
#define ISP_HEIGHT 1080

This is the source camera resolution. It is then split into a display branch and an AI branch.

#define DISPLAY_MODE 1
#define DISPLAY_WIDTH 400
#define DISPLAY_HEIGHT 240
#define DISPLAY_ROTATE 1

For the triple-camera sample, 800 x 480 is split into 400 x 240 windows because multiple camera images are shown on one screen at the same time.

The ST7701 panel is physically 480 x 800 and requires a 90-degree rotation. That rotation is already handled inside the lower-level vo implementation, so the application can treat the panel as a landscape display.

#define AI_FRAME_WIDTH 640
#define AI_FRAME_HEIGHT 360
#define AI_FRAME_CHANNEL 3

This is the AI branch before preprocessing. The sample uses PIXEL_FORMAT_RGB_888_PLANAR in CHW layout.
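In a planar CHW buffer, all samples of one channel are stored contiguously before the next channel begins. The index arithmetic can be sketched as follows. The helper names `chw_offset` and `ai_frame_bytes` are hypothetical and not part of the sample code, and the plane order (R, then G, then B) is an assumption based on the format name.

```cpp
#include <cassert>
#include <cstddef>

// Dimensions matching the AI_FRAME_* macros shown above.
constexpr std::size_t kAiWidth = 640;
constexpr std::size_t kAiHeight = 360;
constexpr std::size_t kAiChannels = 3;

// Hypothetical helper: byte offset of channel c, row y, column x
// in a planar CHW buffer with one byte per sample.
inline std::size_t chw_offset(std::size_t c, std::size_t y, std::size_t x)
{
    assert(c < kAiChannels && y < kAiHeight && x < kAiWidth);
    return c * kAiHeight * kAiWidth + y * kAiWidth + x;
}

// Total size in bytes of one AI frame.
inline std::size_t ai_frame_bytes()
{
    return kAiChannels * kAiHeight * kAiWidth;
}
```

With these dimensions, each channel plane occupies 640 x 360 = 230400 bytes, so the second plane starts at offset 230400.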

Note: The AI-frame resolution and the model-input resolution are not the same thing. The AI frame is the camera output before preprocessing. The model input is the tensor size after preprocessing. For example, the camera may output 640 x 360, while the model may require 320 x 320.
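To make the difference concrete, the sketch below computes how a 640 x 360 AI frame could be fitted into a 320 x 320 model input while keeping the aspect ratio. It assumes centered letterbox padding. The sample tasks may pad differently (for example only on the bottom and right), and `LetterboxParams` and `compute_letterbox` are hypothetical names, not part of the SDK.

```cpp
#include <algorithm>
#include <cassert>

// Result of fitting a source frame into a model input at a uniform scale.
struct LetterboxParams
{
    float scale;     // uniform scale applied to the source frame
    int new_w;       // scaled width before padding
    int new_h;       // scaled height before padding
    int pad_left;
    int pad_right;
    int pad_top;
    int pad_bottom;
};

// Hypothetical helper: centered letterbox from (src_w, src_h) to (dst_w, dst_h).
LetterboxParams compute_letterbox(int src_w, int src_h, int dst_w, int dst_h)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    LetterboxParams p{};
    p.scale = std::min(static_cast<float>(dst_w) / src_w,
                       static_cast<float>(dst_h) / src_h);
    p.new_w = static_cast<int>(src_w * p.scale);
    p.new_h = static_cast<int>(src_h * p.scale);
    p.pad_left = (dst_w - p.new_w) / 2;
    p.pad_right = dst_w - p.new_w - p.pad_left;
    p.pad_top = (dst_h - p.new_h) / 2;
    p.pad_bottom = dst_h - p.new_h - p.pad_top;
    return p;
}
```

For 640 x 360 into 320 x 320, the scale is 0.5, the scaled image is 320 x 180, and 70 rows of padding are added above and below. Postprocessing then has to undo the same scale and padding to map boxes back to AI-frame coordinates.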

#define USE_OSD 1
#define OSD_WIDTH 400
#define OSD_HEIGHT 240
#define OSD_CHANNEL 4

The OSD size must match the split display region. The OSD frame contains only drawn AI results, not the original image. The final visible output is produced by overlaying the OSD layer on the corresponding display layer.
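As an illustration of an OSD frame that holds only drawn results, the sketch below clears a 400 x 240 four-channel buffer to fully transparent and draws a box outline into it. It uses a plain byte buffer instead of the drawing routines in the sample, and it assumes interleaved pixels with alpha in the first byte. The real byte order depends on the OSD pixel format configured in vo, and all function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dimensions matching the OSD_* macros shown above.
constexpr int kOsdWidth = 400;
constexpr int kOsdHeight = 240;
constexpr int kOsdChannels = 4;

// Creates a fully transparent OSD frame: every byte, including alpha, is 0.
std::vector<std::uint8_t> make_clear_osd()
{
    return std::vector<std::uint8_t>(
        static_cast<std::size_t>(kOsdWidth) * kOsdHeight * kOsdChannels, 0);
}

// Writes one opaque red pixel. Assumes alpha is stored in byte 0 (assumption).
void put_pixel(std::vector<std::uint8_t> &osd, int x, int y)
{
    assert(x >= 0 && x < kOsdWidth && y >= 0 && y < kOsdHeight);
    std::uint8_t *p =
        &osd[(static_cast<std::size_t>(y) * kOsdWidth + x) * kOsdChannels];
    p[0] = 255; // alpha
    p[1] = 255; // red
    p[2] = 0;   // green
    p[3] = 0;   // blue
}

// Draws a 1-pixel box outline with inclusive corners (x0, y0) and (x1, y1).
void draw_box(std::vector<std::uint8_t> &osd, int x0, int y0, int x1, int y1)
{
    for (int x = x0; x <= x1; ++x)
    {
        put_pixel(osd, x, y0);
        put_pixel(osd, x, y1);
    }
    for (int y = y0; y <= y1; ++y)
    {
        put_pixel(osd, x0, y);
        put_pixel(osd, x1, y);
    }
}

// Counts pixels with non-zero alpha, useful when checking what was drawn.
std::size_t count_opaque(const std::vector<std::uint8_t> &osd)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < osd.size(); i += kOsdChannels)
    {
        if (osd[i] != 0)
        {
            ++n;
        }
    }
    return n;
}
```

Everything outside the outline stays transparent, so the camera image on the video layer remains visible underneath.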

When configuring the OSD layer, use its x and y position to place each camera result in the intended split-screen location.
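One way to derive those positions is to treat the screen as a grid of equally sized windows. The sketch below assumes a row-major grid in which camera 0, 1, and 2 occupy the first three cells. The actual arrangement used by the sample is defined in video_pipeline.* and may differ, and `window_origin` is a hypothetical helper.

```cpp
#include <cassert>

// Top-left corner of one split-screen window, in display pixels.
struct WindowOrigin
{
    int x;
    int y;
};

// Hypothetical helper: origin of window `index` in a row-major grid of
// win_w x win_h windows on a screen that is screen_w pixels wide.
WindowOrigin window_origin(int index, int screen_w, int win_w, int win_h)
{
    assert(index >= 0 && win_w > 0 && win_h > 0 && screen_w >= win_w);
    const int columns = screen_w / win_w;
    return WindowOrigin{(index % columns) * win_w, (index / columns) * win_h};
}
```

On an 800 x 480 screen with 400 x 240 windows, this places the three windows at (0, 0), (400, 0), and (0, 240). The same origin would be used for a camera's video layer and its OSD layer so that the drawn results line up with the image.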

The display layout is illustrated here:

[Figure: triple_camera_display]

AIBase Notes

AIBase in ai_base.h is the common wrapper class for model inference. It covers model initialization, input/output shape query, tensor initialization, KPU execution, and output retrieval.

/**
 * @brief AI base class, wraps nncase-related operations.
 * Later application development mainly needs to focus on preprocess and postprocess.
 */
class AIBase
{
public:
    /**
     * @brief Constructor. Loads the kmodel and initializes model inputs and outputs.
     * @param kmodel_file Path to the kmodel file
     * @param model_name  Model name
     * @param debug_mode  0: no debug, 1: timing only, 2: full debug logs
     */
    AIBase(const char *kmodel_file, const string model_name, const int debug_mode = 1);

    /**
     * @brief Destructor.
     */
    ~AIBase();

    runtime_tensor get_input_tensor(size_t idx);
    void set_input_tensor(size_t idx, runtime_tensor &input_tensor);

    /**
     * @brief Run kmodel inference.
     */
    void run();

    /**
     * @brief Get the kmodel outputs and store them in class members.
     */
    void get_output();

    runtime_tensor get_output_tensor(int idx);

protected:
    string model_name_;
    int debug_mode_;
    vector<float *> p_outputs_;
    vector<vector<int>> input_shapes_;
    vector<vector<int>> output_shapes_;

private:
    void set_input_init();
    void set_output_init();

    interpreter kmodel_interp_;
    vector<unsigned char> kmodel_vec_;
};

In application development, the most frequently reused members are the input/output tensor shapes and the raw output pointers:

  • input_shapes_

  • output_shapes_

  • p_outputs_

For example, if you need the pointer to the first model output:

float *output0 = p_outputs_[0];
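A raw output pointer is only useful together with the matching shape, because the shape determines how many floats can be read. The following is a minimal sketch of that pairing. `element_count` and `argmax` are hypothetical helpers, not functions from ai_utils.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements described by a shape such as the entries of output_shapes_.
std::size_t element_count(const std::vector<int> &shape)
{
    std::size_t count = 1;
    for (int dim : shape)
    {
        assert(dim > 0);
        count *= static_cast<std::size_t>(dim);
    }
    return count;
}

// Index of the largest value in a flat float buffer of `count` elements.
std::size_t argmax(const float *data, std::size_t count)
{
    assert(data != nullptr && count > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i)
    {
        if (data[i] > data[best])
        {
            best = i;
        }
    }
    return best;
}
```

Inside a class derived from AIBase, `element_count(output_shapes_[0])` would give the number of floats readable through `p_outputs_[0]`.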

Task Files

face_detection.h/.cc, hand_detection.h/.cc, and yolov8_detect.h/.cc are the core files users usually rewrite for their own applications.

In a real project, you can rename these task files to match your own scenario, for example person_det.h, helmet_detect.cc, or gesture_recog.h. Each task should define a task class that inherits from AIBase:

class YourTask : public AIBase

That means you reuse the common inference wrapper from AIBase and implement the task-specific logic yourself.

The task class is mainly responsible for four parts:

| Module | Need to implement | Description |
| --- | --- | --- |
| Preprocess | Yes | Convert the input image to the model format |
| Inference | Reuse AIBase | The common run path is already wrapped |
| Postprocess | Yes | Convert raw model outputs to usable results |
| Draw | Yes | Draw the results on OSD or on the image |

Assume the new task files are myapp.h and myapp.cc. A simplified header can follow the same structure as the existing sample tasks:

typedef struct ExampleResults
{
    // Define the task-specific result structure here.
} ExampleResults;

class MyApp : public AIBase
{
public:
    /**
     * @brief Constructor for video inference.
     * Loads the kmodel, initializes model inputs and outputs, and configures
     * application-specific parameters such as thresholds and preprocess behavior.
     */
    MyApp(char *kmodel_file, /* other task-specific parameters, */ FrameCHWSize image_size, int debug_mode);

    ~MyApp();

    void pre_process(runtime_tensor &input_tensor);
    void inference();
    void post_process(FrameCHWSize image_size, vector<ExampleResults> &results);
    void draw_result(cv::Mat &draw_frame, vector<ExampleResults> &results);

    std::unique_ptr<ai2d_builder> ai2d_builder_;
    runtime_tensor ai2d_out_tensor_;
    FrameCHWSize image_size_;
    FrameCHWSize input_size_;

    // Add task-specific members here when needed.
};

The concrete implementation in myapp.cc can follow the corresponding files under src/rtsmart/examples/ai/triple_camera_ai/src.

main.cc Changes

Flow Overview

main.cc contains the overall processing logic: it gets one frame from the selected camera, creates the input tensor, runs preprocessing, inference, and postprocessing, and draws the results.

The triple-camera task uses multi-threading, with one worker thread per application task. KPU inference is exclusive, so only one model can run on KPU at a time. Add synchronization locks to avoid concurrent KPU access.

The worker-thread logic is similar to the single-model and double-model examples. The main difference is that PipeLine is created in the main thread and passed into worker threads so they can fetch AI frames and insert OSD frames.
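The KPU locking described above can be sketched with a single shared mutex. This is a standalone illustration, not the sample's code: `kpu_mutex`, `guarded_inference`, and `run_workers` are hypothetical names, and the body of the critical section stands in for the real inference call of a task class.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::mutex kpu_mutex;            // hypothetical: one mutex shared by all workers
std::atomic<int> active_runs{0}; // workers currently inside the critical section
std::atomic<int> max_active{0};  // highest value active_runs ever reached

// Stand-in for the KPU-bound step of one worker iteration.
void guarded_inference()
{
    std::lock_guard<std::mutex> lock(kpu_mutex);
    const int now = ++active_runs;
    if (now > max_active.load())
    {
        max_active.store(now);
    }
    std::this_thread::yield(); // placeholder for the actual inference call
    --active_runs;
}

// Runs `workers` threads that each call guarded_inference `iterations` times.
// Returns the maximum number of workers observed inside the section at once.
int run_workers(int workers, int iterations)
{
    assert(workers > 0 && iterations > 0);
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w)
    {
        threads.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i)
            {
                guarded_inference();
            }
        });
    }
    for (std::thread &t : threads)
    {
        t.join();
    }
    return max_active.load();
}
```

Only the inference step needs to be inside the lock. Frame capture, preprocessing that does not touch the KPU, postprocessing, and drawing can run outside it so the three workers overlap as much as possible.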

The main file is responsible for:

  • get one frame from the specified camera

  • create the input tensor

  • call preprocess

  • call inference

  • call postprocess

  • draw the result

The main-thread skeleton is:

int main(int argc, char *argv[])
{
    cout << "case " << argv[0] << " built at " << __DATE__ << " " << __TIME__ << endl;

    if (argc != 11)
    {
        print_usage(argv[0]);
        return -1;
    }

    int debug_mode = atoi(argv[5]);

    PipeLine pl(debug_mode);
    pl.Create();

    std::thread t0(face_det_video_proc, std::ref(pl), argv, 0, 4);
    std::thread t1(hand_det_video_proc, std::ref(pl), argv, 1, 5);
    std::thread t2(yolov8_det_video_proc, std::ref(pl), argv, 2, 6);

    while (getchar() != 'q')
    {
        usleep(10000);
    }

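    // Signal every worker before joining any of them, so that no join()
    // can block on a thread whose stop flag has not been set yet.
    face_det_isp_stop.store(true);
    person_det_isp_stop.store(true);
    hand_det_isp_stop.store(true);
    t0.join();
    t1.join();
    t2.join();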

    pl.Destroy();
    cout << "exit success" << endl;
    return 0;
}

The loop-exit variables are typically defined as:

std::atomic<bool> face_det_isp_stop(false);
std::atomic<bool> person_det_isp_stop(false);
std::atomic<bool> hand_det_isp_stop(false);

When the user inputs q and presses Enter, all three flags are set to true, the worker threads leave their loops, and the program exits.
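The shutdown pattern can be tried in isolation with stand-in workers. The sketch below is a simplified illustration under the assumption that each worker polls its own stop flag once per loop iteration. `worker_loop` and `run_and_stop_workers` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Stand-in worker: loops until its stop flag is set.
void worker_loop(const std::atomic<bool> &stop)
{
    while (!stop.load())
    {
        // Placeholder for: get frame, preprocess, infer, postprocess, draw.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// Starts three workers, signals all of them, then joins them.
// Returns the number of threads that were joined.
int run_and_stop_workers()
{
    std::atomic<bool> stop0(false);
    std::atomic<bool> stop1(false);
    std::atomic<bool> stop2(false);

    std::thread t0(worker_loop, std::cref(stop0));
    std::thread t1(worker_loop, std::cref(stop1));
    std::thread t2(worker_loop, std::cref(stop2));

    std::this_thread::sleep_for(std::chrono::milliseconds(20));

    // Set every flag first so all workers wind down in parallel.
    stop0.store(true);
    stop1.store(true);
    stop2.store(true);

    int joined = 0;
    t0.join();
    ++joined;
    t1.join();
    ++joined;
    t2.join();
    ++joined;
    return joined;
}
```

Setting all flags before the first join keeps the shutdown time close to that of the slowest worker rather than the sum of all three.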

CMakeLists.txt and build_app.sh

At the source root:

add_subdirectory(src)

For the example subdirectory:

set(src main.cc face_detection.cc anchors_320.cc hand_detection.cc yolov8_detect.cc ai_base.cc ai_utils.cc video_pipeline.cc)
set(bin triple_cam_ai.elf)

The build script also needs to collect the generated elf and utility files into k230_bin, for example:

collect_outputs() {
    local elf_file="${BUILD_DIR}/bin/triple_cam_ai.elf"

    if [ -f "${elf_file}" ]; then
        echo "[INFO] Collecting ELF and utility files to ${K230_BIN_DIR}..."
        cp -u "${elf_file}" "${K230_BIN_DIR}/"
        cp -u utils/* "${K230_BIN_DIR}/" 2>/dev/null || true
    else
        echo "[WARN] ELF file not found: ${elf_file}"
    fi
}

Build

Select Board and Build Firmware

From the RTOS root:

make list-def
make ***_defconfig
make -j

After the build finishes, the firmware image is generated in the output directory.

Build Method 1

After finishing the code changes, enter src/rtsmart/examples/ai/triple_camera_ai and run:

./build_app.sh

The intermediate files are placed in build, and the deployment package is placed in k230_bin.

Build Method 2

From the RTOS SDK root, run make menuconfig and enable:

RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build Triple Camera AI Programs

Then run:

make -j

This builds the deployment files directly into:

/sdcard/app/examples/ai/triple_camera_ai

You can also enter the example directory and run:

make -j

This path also supports incremental build and places the collected outputs in k230_bin.

Board Deployment

Flash the firmware first. See:

how_to_flash

Then copy the generated elf, kmodel, and any additional files such as test images from k230_bin to:

CanMV/sdcard

Connect to the board through the serial console and run:

run.sh

Make sure the parameter order and file paths match the application code.

The result on the board is shown below:

[Figure: triple_cam_ai_deploy_res]
