# Triple-Camera AI Application Development Guide

## Overview

A triple-camera AI application runs AI inference on three camera streams simultaneously on the K230 board. The reference sample connects three `GC2093` cameras to the three MIPI camera interfaces and runs:

- face detection on one camera
- palm detection on one camera
- YOLO 80-class detection on one camera

## Development Guide

### Modules and Task Flow

**Involved Modules:**

1. **vicap (video input capture)**: Configures camera sensor properties and channel attributes including resolution, frame rate, and data format. Binds camera data to the display and provides AI inference frames from each camera.

1. **vo (video output)**: Configures display device and layer attributes including position, resolution, frame rate, and data format. Displays camera frames or other input in real time through video and OSD layers. The video layer supports YUV format only, while the OSD layer supports RGB format only.

1. **kpu**: Loads `kmodel`, configures input/output tensors, and performs model inference.

1. **ai2d**: Performs preprocessing on model input images. See [usage_ai2d](./usage_ai2d.md) for details.

**Processing Flow:**

The three-camera application is managed by `PipeLine`. Each camera is identified by a sensor ID. Each camera stream uses dual-channel processing:

- **Display Channel**: One image stream is bound to the `vo` video layer for **real-time display** of the original image
- **AI Channel**: Another image stream is sent to **AI models for inference** (face detection, palm detection, YOLO detection)

Inference results are drawn onto **OSD layers** and **merged with the live display**. This sample uses **three video layers** (one per camera) and **three OSD layers** to achieve split-screen display with detection results overlaid on each camera feed.

The overall process is shown below:

![triple_camera_ai_pipeline](https://www.kendryte.com/api/post/attachment?id=854)

### Code Structure

The sample code is located under `src/rtsmart/examples/ai/triple_camera_ai`. The three-camera configuration is already wrapped in a simplified form so users can follow the same pattern.

```text
triple_camera_ai
├── cmake
├── src
│    ├── ai_base.cc         # Model inference wrapper implementation
│    ├── ai_base.h          # Model inference header file
│    ├── ai_utils.cc        # Utility methods for model inference
│    ├── ai_utils.h         # Utility methods header file
│    ├── anchors_320.cc     # Anchors for 320-input face detection model
│    ├── face_detection.cc  # Face detection task: preprocess, inference, postprocess, drawing
│    ├── face_detection.h   # Face detection task header file
│    ├── hand_detection.cc  # Palm detection task: preprocess, inference, postprocess, drawing
│    ├── hand_detection.h   # Palm detection task header file
│    ├── yolov8_detect.cc   # YOLO 80-class detection task: preprocess, inference, postprocess, drawing
│    ├── yolov8_detect.h    # YOLO detection task header file
│    ├── main.cc            # Main function: coordinates all three tasks and multi-threaded execution
│    ├── scoped_timing.h    # Timing utility for debugging
│    ├── setting.h          # Configuration macros for display and AI frame resolution
│    ├── video_pipeline.cc  # Three-camera pipeline implementation
│    ├── video_pipeline.h   # Three-camera pipeline header file
│    └── CMakeLists.txt     # Build configuration for this task
├── utils                   # Pre-built kmodel and scripts
├── CMakeLists.txt          # Root CMakeLists
├── build_app.sh            # Compilation script
└── Makefile                # Alternative build method
```

### Code Responsibilities

The main file responsibilities are:

| File | Description |
|------|-------------|
| `ai_base.h` | Declares the common model-inference interfaces |
| `ai_base.cc` | Implements the common model-inference interfaces |
| `ai_utils.h` | Declares shared helper functions |
| `ai_utils.cc` | Implements shared helper functions |
| `scoped_timing.h` | Provides timing helpers for performance debugging |
| `setting.h` | Defines display and AI-frame configuration macros |
| `video_pipeline.h` | Declares the three-camera pipeline interface |
| `video_pipeline.cc` | Implements the three-camera pipeline operations |
| `face_detection.h/.cc` | Implements face detection: preprocess, inference, postprocess, and drawing |
| `hand_detection.h/.cc` | Implements the palm detection task with the same workflow |
| `yolov8_detect.h/.cc` | Implements the YOLO 80-class detection task with the same workflow |
| `anchors_320.cc` | Provides anchor data for the face-detection model |
| `main.cc` | Coordinates the entire application with multi-threaded execution |

**How to Use and Modify:**

- **`ai_base.*` and `scoped_timing.h`** - Implement the model inference wrapper and timing tools. These files typically do not require changes and are reused across projects.

- **`ai_utils.*`** - Provides common utility functions for data access and preprocessing. Extend these files only if the existing helpers are insufficient for your task requirements.

- **`setting.h` and `video_pipeline.*`** - Handle camera initialization, display device configuration, frame dump, and OSD insertion. They are already configured for three-camera dual-channel processing. Modify them only if you need to:
  - Add a new display type or resolution
  - Modify the number of camera channels or their layout
  - Change frame dimensions or formats

- **`face_detection.*`, `hand_detection.*`, `yolov8_detect.*`, and `main.cc`** - These are the files to focus on when developing your own application. Typically you will:
  - Modify the task-specific files to implement preprocessing, inference, postprocessing, and drawing for your models
  - Replace the reference tasks (face detection, palm detection, YOLO) with your own application logic
  - Update `main.cc` to orchestrate your own task classes
  - Synchronize KPU access across threads (the KPU is exclusive, so concurrent inference must be guarded by a mutex)
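
The KPU locking requirement in the last point can be sketched as follows. This is a minimal illustration, not the sample's actual locking code: `kpu_mutex`, `run_on_kpu`, and `peak_concurrent_kpu_users` are hypothetical names, and the callable stands in for whatever inference method your task class exposes.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical process-wide lock: only one thread may use the KPU at a time.
std::mutex kpu_mutex;

// Hypothetical helper: runs one inference call while holding the KPU lock.
// `inference` stands in for a task's own inference method.
void run_on_kpu(const std::function<void()> &inference)
{
    std::lock_guard<std::mutex> lock(kpu_mutex); // released on scope exit
    inference();
}

// Demonstration only: several workers share the lock. Returns the largest
// number of workers that were ever inside the guarded section at once.
int peak_concurrent_kpu_users(int workers, int iterations)
{
    assert(workers > 0 && iterations > 0);
    int active = 0; // only touched while kpu_mutex is held
    int peak = 0;   // only touched while kpu_mutex is held
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
    {
        pool.emplace_back([&] {
            for (int n = 0; n < iterations; ++n)
            {
                run_on_kpu([&] {
                    ++active;
                    if (active > peak)
                        peak = active;
                    --active;
                });
            }
        });
    }
    for (auto &t : pool)
        t.join();
    return peak;
}
```

With three workers the peak stays at one, which is the property the KPU requires. Keeping the lock scope limited to the inference call lets preprocessing and drawing for different cameras still overlap.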

## Code Details

### `setting.h` Configuration

The macros in `setting.h` configure camera output, display output, OSD size, and AI-frame resolution.

| Macro | Description |
| --- | --- |
| `ISP_WIDTH` | ISP output width |
| `ISP_HEIGHT` | ISP output height |
| `DISPLAY_MODE` | `0`: `1920 x 1080` LT9611, `1`: `800 x 480` ST7701 |
| `DISPLAY_WIDTH` | Display width |
| `DISPLAY_HEIGHT` | Display height |
| `DISPLAY_ROTATE` | `0`: no rotation, `1`: rotate 90 degrees |
| `AI_FRAME_WIDTH` | AI frame width |
| `AI_FRAME_HEIGHT` | AI frame height |
| `AI_FRAME_CHANNEL` | AI frame channels |
| `USE_OSD` | Whether to enable OSD |
| `OSD_WIDTH` | OSD width |
| `OSD_HEIGHT` | OSD height |
| `OSD_CHANNEL` | OSD channels |

Typical fragments are:

```cpp
#define ISP_WIDTH 1920
#define ISP_HEIGHT 1080
```

This is the source camera resolution. It is then split into a display branch and an AI branch.

```cpp
#define DISPLAY_MODE 1
#define DISPLAY_WIDTH 400
#define DISPLAY_HEIGHT 240
#define DISPLAY_ROTATE 1
```

For the triple-camera sample, `800 x 480` is split into `400 x 240` windows because multiple camera images are shown on one screen at the same time.

`ST7701` is physically `480 x 800` and requires a 90-degree rotation. That rotation is already handled inside the lower-level `vo` implementation, so the application can treat the panel as a landscape display.

```cpp
#define AI_FRAME_WIDTH 640
#define AI_FRAME_HEIGHT 360
#define AI_FRAME_CHANNEL 3
```

This is the AI-branch frame size before preprocessing. The sample uses `PIXEL_FORMAT_RGB_888_PLANAR` in `CHW` layout.
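
To make the `CHW` layout concrete, the sketch below computes the buffer size and the byte offset of one sample in a planar frame. It assumes one byte per sample and a plane order of R, G, B; the constant and function names are hypothetical and do not come from the sample code.

```cpp
#include <cassert>
#include <cstddef>

// Values mirror the AI-frame macros shown above.
constexpr std::size_t kAiWidth = 640;
constexpr std::size_t kAiHeight = 360;
constexpr std::size_t kAiChannels = 3;

// Total bytes of one planar frame, assuming one byte per sample.
constexpr std::size_t ai_frame_bytes()
{
    return kAiChannels * kAiHeight * kAiWidth;
}

// Byte offset of channel c at pixel (x, y) in CHW layout.
// Assumption: plane 0 is R, plane 1 is G, plane 2 is B.
inline std::size_t chw_offset(std::size_t c, std::size_t y, std::size_t x)
{
    assert(c < kAiChannels && y < kAiHeight && x < kAiWidth);
    return (c * kAiHeight + y) * kAiWidth + x;
}
```

In this layout each colour plane is stored contiguously, so the second plane starts `640 * 360` bytes after the first.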

> Note:
> The AI-frame resolution and the model-input resolution are not the same thing. The AI frame is the camera output before preprocessing. The model input is the tensor size after preprocessing. For example, the camera may output `640 x 360`, while the model may require `320 x 320`.
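
One common way to bridge the two resolutions is an aspect-preserving resize followed by padding (often called letterboxing). The sketch below computes those parameters under that assumption; the sample tasks may configure `ai2d` differently (for example, padding on one side only), and `LetterboxParams` and `compute_letterbox` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: how to fit a source frame into a model input.
struct LetterboxParams
{
    float scale;   // uniform resize ratio applied to both axes
    int resized_w; // frame width after resizing
    int resized_h; // frame height after resizing
    int pad_left;
    int pad_right;
    int pad_top;
    int pad_bottom;
};

// Computes an aspect-preserving resize plus centred padding.
inline LetterboxParams compute_letterbox(int src_w, int src_h, int dst_w, int dst_h)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    LetterboxParams p{};
    p.scale = std::min(static_cast<float>(dst_w) / src_w,
                       static_cast<float>(dst_h) / src_h);
    // Truncate toward zero so the resized frame never exceeds the target.
    p.resized_w = static_cast<int>(src_w * p.scale);
    p.resized_h = static_cast<int>(src_h * p.scale);
    const int pad_w = dst_w - p.resized_w;
    const int pad_h = dst_h - p.resized_h;
    p.pad_left = pad_w / 2;
    p.pad_right = pad_w - p.pad_left;
    p.pad_top = pad_h / 2;
    p.pad_bottom = pad_h - p.pad_top;
    return p;
}
```

For a `640 x 360` AI frame and a `320 x 320` model input, this gives a scale of `0.5`, a resized frame of `320 x 180`, and 70 rows of padding above and below.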

```cpp
#define USE_OSD 1
#define OSD_WIDTH 400
#define OSD_HEIGHT 240
#define OSD_CHANNEL 4
```

The OSD size must match the split display region. The OSD frame contains only drawn AI results, not the original image. The final visible output is produced by overlaying the OSD layer on the corresponding display layer.
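
Because the OSD frame and the AI frame have different sizes, detection boxes usually need to be rescaled before drawing. The sketch below assumes the postprocess step yields boxes in AI-frame pixel coordinates; the sample's own drawing code may perform this scaling elsewhere, and `Box` and `ai_to_osd` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical box type: top-left corner plus size, in pixels.
struct Box
{
    int x;
    int y;
    int w;
    int h;
};

// Scales a box from AI-frame coordinates to OSD coordinates.
// Each axis is scaled independently, using integer arithmetic.
inline Box ai_to_osd(const Box &b, int ai_w, int ai_h, int osd_w, int osd_h)
{
    assert(ai_w > 0 && ai_h > 0 && osd_w > 0 && osd_h > 0);
    Box out;
    out.x = b.x * osd_w / ai_w;
    out.y = b.y * osd_h / ai_h;
    out.w = b.w * osd_w / ai_w;
    out.h = b.h * osd_h / ai_h;
    return out;
}
```

With the values above, a box at `(320, 180)` of size `64 x 36` in the `640 x 360` AI frame maps to `(200, 120)` with size `40 x 24` on the `400 x 240` OSD.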

When configuring the OSD layer, use its `x` and `y` position to place each camera result in the intended split-screen location.
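
As an illustration of how such `x` and `y` offsets could be derived, the sketch below places windows in a row-major grid. This is one possible arrangement chosen for the example, not necessarily the layout the sample uses; `Point` and `window_origin` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical type: top-left corner of a window on the screen.
struct Point
{
    int x;
    int y;
};

// Returns the origin of window `index` when windows of size
// win_w x win_h are placed left to right, then top to bottom.
inline Point window_origin(int index, int win_w, int win_h, int screen_w)
{
    assert(index >= 0 && win_w > 0 && win_h > 0 && win_w <= screen_w);
    const int cols = screen_w / win_w; // windows that fit in one row
    return Point{(index % cols) * win_w, (index / cols) * win_h};
}
```

On an `800 x 480` screen with `400 x 240` windows, this grid puts camera 0 at `(0, 0)`, camera 1 at `(400, 0)`, and camera 2 at `(0, 240)`. The same origin would be used for both the video layer and its matching OSD layer.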

The display layout is illustrated here:

![triple_camera_display](https://www.kendryte.com/api/post/attachment?id=855)

### `AIBase` Notes

`AIBase` in `ai_base.h` is the common wrapper class for model inference. It covers model initialization, input/output shape query, tensor initialization, KPU execution, and output retrieval.

```cpp
/**
 * @brief AI base class, wraps nncase-related operations.
 * Later application development mainly needs to focus on preprocess and postprocess.
 */
class AIBase
{
public:
    /**
     * @brief Constructor. Loads the kmodel and initializes model inputs and outputs.
     * @param kmodel_file Path to the kmodel file
     * @param model_name  Model name
     * @param debug_mode  0: no debug, 1: timing only, 2: full debug logs
     */
    AIBase(const char *kmodel_file, const string model_name, const int debug_mode = 1);

    /**
     * @brief Destructor.
     */
    ~AIBase();

    runtime_tensor get_input_tensor(size_t idx);
    void set_input_tensor(size_t idx, runtime_tensor &input_tensor);

    /**
     * @brief Run kmodel inference.
     */
    void run();

    /**
     * @brief Get the kmodel outputs and store them in class members.
     */
    void get_output();

    runtime_tensor get_output_tensor(int idx);

protected:
    string model_name_;
    int debug_mode_;
    vector<float *> p_outputs_;
    vector<vector<int>> input_shapes_;
    vector<vector<int>> output_shapes_;

private:
    void set_input_init();
    void set_output_init();

    interpreter kmodel_interp_;
    vector<unsigned char> kmodel_vec_;
};
```

In application development, the most frequently reused members are the input/output tensor shapes and the raw output pointers:

- `input_shapes_`
- `output_shapes_`
- `p_outputs_`

For example, if you need the pointer to the first model output:

```cpp
float *output0 = p_outputs_[0];
```
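
The raw pointer carries no length, so it is usually read together with the matching shape. The helpers below are a minimal sketch with hypothetical names (`element_count`, `argmax`); inside a task class they could be called as `element_count(output_shapes_[0])` and `argmax(p_outputs_[0], count)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements in a tensor with the given shape.
inline std::size_t element_count(const std::vector<int> &shape)
{
    std::size_t n = 1;
    for (int d : shape)
    {
        assert(d > 0);
        n *= static_cast<std::size_t>(d);
    }
    return n;
}

// Index of the largest value in a flat float buffer.
inline std::size_t argmax(const float *data, std::size_t count)
{
    assert(data != nullptr && count > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i)
    {
        if (data[i] > data[best])
            best = i;
    }
    return best;
}
```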

### Task Files

`face_detection.h/.cc`, `hand_detection.h/.cc`, and `yolov8_detect.h/.cc` are the core files users usually rewrite for their own applications.

In a real project, you can rename these task files to match your own scenario, for example `person_det.h`, `helmet_detect.cc`, or `gesture_recog.h`. Each task should define a task class that inherits from `AIBase`:

```cpp
class YourTask : public AIBase
```

That means you reuse the common inference wrapper from `AIBase` and implement the task-specific logic yourself.

The task class is mainly responsible for four parts:

| Module | Need to implement | Description |
| --- | --- | --- |
| Preprocess | Yes | Convert the input image to the model format |
| Inference | Reuse `AIBase` | The common run path is already wrapped |
| Postprocess | Yes | Convert raw model outputs to usable results |
| Draw | Yes | Draw the results on OSD or on the image |

Assume the new task files are `myapp.h` and `myapp.cc`. A simplified header can follow the same structure as the existing sample tasks:

```cpp
typedef struct ExampleResults
{
    // Define the task-specific result structure here.
} ExampleResults;

class MyApp : public AIBase
{
public:
    /**
     * @brief Constructor for video inference.
     * Loads the kmodel, initializes model inputs and outputs, and configures
     * application-specific parameters such as thresholds and preprocess behavior.
     */
    MyApp(char *kmodel_file, /* task-specific parameters such as thresholds, */ FrameCHWSize image_size, int debug_mode);

    ~MyApp();

    void pre_process(runtime_tensor &input_tensor);
    void inference();
    void post_process(FrameCHWSize image_size, vector<ExampleResults> &results);
    void draw_result(cv::Mat &draw_frame, vector<ExampleResults> &results);

    std::unique_ptr<ai2d_builder> ai2d_builder_;
    runtime_tensor ai2d_out_tensor_;
    FrameCHWSize image_size_;
    FrameCHWSize input_size_;

    // Add task-specific members here when needed.
};
```

The concrete implementation in `myapp.cc` can follow the corresponding files under `src/rtsmart/examples/ai/triple_camera_ai/src`.
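
As an illustration of what a detection `post_process` step often involves, the sketch below filters candidates by score and applies greedy non-maximum suppression (NMS). It is a generic example, not the sample's actual postprocessing; `Detection`, `iou`, and `nms` are hypothetical names, and decoding raw model outputs into boxes is model-specific and omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical detection: box corners in pixels plus a confidence score.
struct Detection
{
    float x1;
    float y1;
    float x2;
    float y2;
    float score;
};

// Intersection over union of two boxes, in the range [0, 1].
inline float iou(const Detection &a, const Detection &b)
{
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) +
                      (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drops low-score candidates, then keeps the highest-scoring box from
// each group of boxes that overlap by more than iou_thresh.
inline std::vector<Detection> nms(std::vector<Detection> dets,
                                  float score_thresh, float iou_thresh)
{
    assert(iou_thresh >= 0.0f && iou_thresh <= 1.0f);
    dets.erase(std::remove_if(dets.begin(), dets.end(),
                              [score_thresh](const Detection &d) {
                                  return d.score < score_thresh;
                              }),
               dets.end());
    std::sort(dets.begin(), dets.end(),
              [](const Detection &a, const Detection &b) {
                  return a.score > b.score;
              });
    std::vector<Detection> kept;
    for (const Detection &d : dets)
    {
        bool suppressed = false;
        for (const Detection &k : kept)
        {
            if (iou(d, k) > iou_thresh)
            {
                suppressed = true;
                break;
            }
        }
        if (!suppressed)
            kept.push_back(d);
    }
    return kept;
}
```

The thresholds are the kind of task-specific parameters that the constructor would typically receive and store as members.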

### `main.cc` Changes

#### Flow Overview

`main.cc` contains the overall processing logic and ties the camera pipeline to the three AI tasks.

The triple-camera task uses multi-threading, with one worker thread per application task. KPU inference is exclusive, so only one model can run on KPU at a time. Add synchronization locks to avoid concurrent KPU access.

The worker-thread logic is similar to the single-model and double-model examples. The main difference is that `PipeLine` is created in the main thread and passed into worker threads so they can fetch AI frames and insert OSD frames.

For each frame, the code in `main.cc`:

- gets one frame from the specified camera
- creates the input tensor
- calls preprocess
- calls inference
- calls postprocess
- draws the result

The main-thread skeleton is:

```cpp
int main(int argc, char *argv[])
{
    cout << "case " << argv[0] << " built at " << __DATE__ << " " << __TIME__ << endl;

    if (argc != 11)
    {
        print_usage(argv[0]);
        return -1;
    }

    int debug_mode = atoi(argv[5]);

    PipeLine pl(debug_mode);
    pl.Create();

    std::thread t0(face_det_video_proc, std::ref(pl), argv, 0, 4);
    std::thread t1(hand_det_video_proc, std::ref(pl), argv, 1, 5);
    std::thread t2(yolov8_det_video_proc, std::ref(pl), argv, 2, 6);

    while (getchar() != 'q')
    {
        usleep(10000);
    }

    face_det_isp_stop.store(true);
    t0.join();
    person_det_isp_stop.store(true);
    t1.join();
    hand_det_isp_stop.store(true);
    t2.join();

    pl.Destroy();
    cout << "exit success" << endl;
    return 0;
}
```

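The loop-exit flags are typically defined as shown below. In the skeleton above, `person_det_isp_stop` is set before joining `t1` (the palm-detection thread) and `hand_det_isp_stop` before joining `t2` (the YOLO thread), so the flag names do not map one-to-one onto the task names: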

```cpp
std::atomic<bool> face_det_isp_stop(false);
std::atomic<bool> person_det_isp_stop(false);
std::atomic<bool> hand_det_isp_stop(false);
```

When the user types `q` and presses Enter, the main thread sets each stop flag to `true` and joins the corresponding worker thread in turn. Once all workers have left their loops, the pipeline is destroyed and the program exits.
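
A worker loop that honours such a flag might look like the sketch below. `run_worker_loop` and `process_one_frame` are hypothetical names; in the real worker functions the per-frame body would fetch an AI frame, run the task, and insert the OSD frame.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Runs `process_one_frame` repeatedly until `stop` becomes true.
// Returns the number of frames processed.
inline long run_worker_loop(std::atomic<bool> &stop,
                            const std::function<void()> &process_one_frame)
{
    assert(static_cast<bool>(process_one_frame)); // callable must be set
    long frames = 0;
    while (!stop.load())
    {
        process_one_frame(); // one iteration of the per-frame steps
        ++frames;
    }
    return frames;
}
```

Checking the flag once per frame means a worker finishes its current frame before exiting, so `join()` returns after at most one more iteration.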

### `CMakeLists.txt` and `build_app.sh`

At the source root:

```cmake
add_subdirectory(src)
```

For the example subdirectory:

```cmake
set(src main.cc face_detection.cc anchors_320.cc hand_detection.cc yolov8_detect.cc ai_base.cc ai_utils.cc video_pipeline.cc)
set(bin triple_cam_ai.elf)
```

The build script also needs to collect the generated `elf` and utility files into `k230_bin`, for example:

```bash
collect_outputs() {
    local elf_file="${BUILD_DIR}/bin/triple_cam_ai.elf"

    if [ -f "${elf_file}" ]; then
        echo "[INFO] Collecting ELF and utility files to ${K230_BIN_DIR}..."
        cp -u "${elf_file}" "${K230_BIN_DIR}/"
        cp -u utils/* "${K230_BIN_DIR}/" 2>/dev/null || true
    else
        echo "[WARN] ELF file not found: ${elf_file}"
    fi
}
```

## Build

### Select Board and Build Firmware

From the RTOS root:

```bash
make list-def
make ***_defconfig
make -j
```

After the build finishes, the firmware image is generated in the `output` directory.

### Build Method 1

After finishing the code changes, enter `src/rtsmart/examples/ai/triple_camera_ai` and run:

```bash
./build_app.sh
```

The intermediate files are placed in `build`, and the deployment package is placed in `k230_bin`.

### Build Method 2

From the RTOS SDK root, run `make menuconfig` and enable:

```text
RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build Triple Camera AI Programs
```

Then run:

```bash
make -j
```

This builds the deployment files directly into:

```text
/sdcard/app/examples/ai/triple_camera_ai
```

You can also enter the example directory and run:

```bash
make -j
```

This method also supports incremental builds and places the collected outputs in `k230_bin`.

## Board Deployment

Flash the firmware first. See:

[how_to_flash](../../userguide/how_to_flash.md)

Then copy the generated `elf`, `kmodel`, and any additional files such as test images from `k230_bin` to:

```text
CanMV/sdcard
```

Connect to the board through the serial console and run:

```bash
run.sh
```

Make sure the parameter order and file paths match the application code.

The result on the board is shown below:

![triple_cam_ai_deploy_res](https://www.kendryte.com/api/post/attachment?id=857)