# UVC + AI Application Development Guide

## Overview

UVC + AI is a common embedded application pattern that combines a UVC (USB Video Class) video stream with AI inference so the system can detect and recognize targets in live video. This document uses **face detection** as the reference example to explain how to develop a UVC-based AI application.

The model-inference part of this sample is the same as the single-model guide in [single_model_example.md](./single_model_example.md). The main difference is that the `PipeLine` data source is replaced with a UVC camera.

Unlike the MIPI-camera path, the current UVC camera path does not support multiple channels. This sample is therefore built around a **single-channel UVC camera** stream.

## Development Guide

### Modules and Task Flow

**Involved Modules:**

1. **uvc**: Acquires video frames from the UVC camera. The current sample supports `1920 x 1080` and `640 x 480` output resolutions.

1. **vo**: Configures the display device and shows the final image with detection results.

1. **vdec**: Decodes the JPEG video stream from the UVC camera into `YUV420` format.

1. **noai_2d**: Converts decoded `YUV420` frames into `RGB888` for AI preprocessing, and converts processed frames back to a display-compatible format.

1. **kpu**: Loads `kmodel`, configures input/output tensors, and performs model inference.

1. **ai2d**: Performs preprocessing on model input images. See [usage_ai2d](./usage_ai2d.md) for details.

**Processing Flow:**

Because the UVC path does not support multiple simultaneous output channels, UVC-based applications process every frame on a **single channel**, unlike the dual-channel MIPI-camera path. The workflow is:

1. **Capture**: Get one video frame from the UVC camera
1. **Decode**: Decode the JPEG stream to `YUV420` format using `vdec`
1. **Convert**: Use `noai_2d` to convert `YUV420` to `RGB888` for AI processing
1. **Preprocess**: Prepare the frame for model input using `ai2d`
1. **Infer**: Run model inference using `kpu`
1. **Postprocess**: Extract and interpret detection results
1. **Draw**: Overlay detection boxes and labels on the source image
1. **Display**: Convert image to display format and send to screen

The overall UVC processing logic is shown below:

![uvc_process](https://www.kendryte.com/api/post/attachment?id=652)

### Code Structure

Using the UVC + face-detection task as the example, the existing code structure is:

```text
uvc_face_detection
├── cmake
├── src
│    ├── ai_base.cc         # Model inference wrapper implementation
│    ├── ai_base.h          # Model inference header file
│    ├── ai_utils.cc        # Utility methods for model inference
│    ├── ai_utils.h         # Utility methods header file
│    ├── anchors_320.cc     # Anchors for 320-input face detection model
│    ├── anchors_640.cc     # Anchors for 640-input face detection model
│    ├── face_detection.cc  # Face detection task: preprocess, inference, postprocess, drawing
│    ├── face_detection.h   # Face detection task header file
│    ├── main.cc            # Main function: orchestrates UVC capture and AI inference
│    ├── scoped_timing.h    # Timing utility for debugging
│    ├── setting.h          # Configuration macros for UVC, display, and AI frame resolution
│    ├── uvc_pipeline.cc    # UVC pipeline implementation: capture, decode, format conversion, frame dump
│    ├── uvc_pipeline.h     # UVC pipeline header file
│    └── CMakeLists.txt     # Build configuration for this task
├── utils                   # Pre-built kmodel and scripts
├── CMakeLists.txt          # Root CMakeLists
├── build_app.sh            # Compilation script
└── Makefile                # Alternative build method
```

### Code Responsibilities

The main file responsibilities are:

| File | Description |
|------|-------------|
| `ai_base.h` | Declares the common model-inference interfaces |
| `ai_base.cc` | Implements the common model-inference interfaces |
| `ai_utils.h` | Declares shared helper functions |
| `ai_utils.cc` | Implements shared helper functions |
| `scoped_timing.h` | Provides timing helpers for performance debugging |
| `setting.h` | Defines UVC, display, and AI-frame configuration macros |
| `uvc_pipeline.h` | Declares the UVC pipeline interface |
| `uvc_pipeline.cc` | Implements UVC capture, JPEG decode, format conversion, and display insertion |
| `face_detection.h` | Declares preprocess, inference, postprocess, and drawing for face detection |
| `face_detection.cc` | Implements face detection task logic |
| `anchors_320.cc` | Anchor data for the 320-input detection model |
| `anchors_640.cc` | Anchor data for the 640-input detection model |
| `main.cc` | Organizes the complete application logic and frame processing |

**How to Use and Modify:**

- **`ai_base.*` and `scoped_timing.h`** - Implement the model inference wrapper and timing tools. These files typically do not require changes and are reused across projects.

- **`ai_utils.*`** - Provides common utility functions. Extend these only if existing helpers are insufficient.

- **`setting.h`, `uvc_pipeline.h`, and `uvc_pipeline.cc`** - Handle UVC capture, JPEG decoding, format conversion via `noai_2d`, frame dump, and display insertion. These typically only need changes if you:
  - Add a new display type or resolution
  - Change UVC input resolution or format
  - Modify frame dimensions or color space conversions

  The current implementation supports both `LT9611 HDMI 1920×1080` and `ST7701 LCD 800×480` displays.

- **`face_detection.*` and `main.cc`** - These are the main files users modify when developing a new UVC + AI application. Users typically:
  - Implement task-specific preprocessing, inference, postprocessing, and drawing logic
  - Modify `main.cc` to orchestrate the complete UVC-to-display pipeline
  - Replace face detection with their own detection or recognition tasks

## Code Details

### `setting.h` Configuration

The macros in `setting.h` configure UVC output, display output, and AI-frame resolution.

| Macro | Description |
| --- | --- |
| `UVC_WIDTH` | UVC output width |
| `UVC_HEIGHT` | UVC output height |
| `DISPLAY_MODE` | `0`: `1920 x 1080` LT9611, `1`: `800 x 480` ST7701 |
| `DISPLAY_WIDTH` | Display width |
| `DISPLAY_HEIGHT` | Display height |
| `DISPLAY_ROTATE` | `0`: no rotation, `1`: rotate 90 degrees |
| `AI_FRAME_WIDTH` | AI frame width |
| `AI_FRAME_HEIGHT` | AI frame height |
| `AI_FRAME_CHANNEL` | AI frame channels |

Typical configuration fragments are:

```cpp
#define UVC_WIDTH 640
#define UVC_HEIGHT 480
```

The current sample supports `1920 x 1080` and `640 x 480` UVC output. Both can be displayed on HDMI and LCD targets.

```cpp
#define DISPLAY_MODE 1
#define DISPLAY_WIDTH 640
#define DISPLAY_HEIGHT 480
#define DISPLAY_ROTATE 1
```

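These parameters select the display device (`DISPLAY_MODE`), set the display width and height, and control whether the image is rotated by 90 degrees (`DISPLAY_ROTATE`).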

```cpp
#define AI_FRAME_WIDTH 640
#define AI_FRAME_HEIGHT 480
#define AI_FRAME_CHANNEL 3
```

The AI-frame size must match the UVC output resolution. The source frame arrives as `JPEG`, is decoded into `YUV420`, then converted by `noai_2d` into `RGB888` in `HWC` layout so it can be used as model input.
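The size constraint and the buffer sizes implied by this format chain can be checked at compile time. The following is a minimal sketch, not code from the sample: the `k...` constants are hypothetical stand-ins for the `setting.h` macros, and the helper functions assume even frame dimensions.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the setting.h macros (UVC_WIDTH, AI_FRAME_WIDTH, ...).
constexpr std::size_t kUvcWidth = 640;
constexpr std::size_t kUvcHeight = 480;
constexpr std::size_t kAiFrameWidth = 640;
constexpr std::size_t kAiFrameHeight = 480;
constexpr std::size_t kAiFrameChannel = 3;

// The AI frame is produced from the decoded UVC frame, so both sizes must agree.
static_assert(kAiFrameWidth == kUvcWidth && kAiFrameHeight == kUvcHeight,
              "AI frame size must match the UVC output resolution");
static_assert(kAiFrameChannel == 3, "RGB888 has three channels");

// YUV420: one full-resolution Y plane plus U and V planes subsampled by 2
// in both directions, which gives width * height * 3 / 2 bytes.
constexpr std::size_t yuv420_bytes(std::size_t width, std::size_t height)
{
    return width * height * 3 / 2;
}

// RGB888: three bytes per pixel, independent of HWC or CHW ordering.
constexpr std::size_t rgb888_bytes(std::size_t width, std::size_t height)
{
    return width * height * 3;
}
```

With these helpers, a mismatch between the UVC and AI-frame settings fails the build instead of producing a corrupted frame at run time.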

### `AIBase` Notes

`AIBase` in `ai_base.h` is the common wrapper class used for model inference. It includes model initialization, input/output shape handling, tensor initialization, inference execution, and output retrieval.

```cpp
/**
 * @brief AI base class, wraps nncase-related operations.
 * Later application development mainly needs to focus on preprocess and postprocess.
 */
class AIBase
{
public:
    /**
     * @brief Constructor. Loads the kmodel and initializes model inputs and outputs.
     * @param kmodel_file Path to the kmodel file
     * @param model_name  Model name
     * @param debug_mode  0: no debug, 1: timing only, 2: full debug logs
     */
    AIBase(const char *kmodel_file, const string model_name, const int debug_mode = 1);

    ~AIBase();
    runtime_tensor get_input_tensor(size_t idx);
    void set_input_tensor(size_t idx, runtime_tensor &input_tensor);
    void run();
    void get_output();
    runtime_tensor get_output_tensor(int idx);

protected:
    string model_name_;
    int debug_mode_;
    vector<float *> p_outputs_;
    vector<vector<int>> input_shapes_;
    vector<vector<int>> output_shapes_;

private:
    void set_input_init();
    void set_output_init();

    interpreter kmodel_interp_;
    vector<unsigned char> kmodel_vec_;
};
```

The most commonly reused members in application development are:

- `input_shapes_`
- `output_shapes_`
- `p_outputs_`

For example, the pointer to the first output tensor can be accessed as:

```cpp
float *output0 = p_outputs_[0];
```
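A raw output pointer carries no length, so postprocess code normally derives the element count from the matching entry in `output_shapes_`. The sketch below shows that pattern with plain standard-library types. `element_count` and `argmax_output` are hypothetical helpers written for illustration and are not part of `AIBase`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements in a tensor with the given shape, e.g. {1, 2, 3} -> 6.
inline std::size_t element_count(const std::vector<int> &shape)
{
    std::size_t n = 1;
    for (int d : shape)
    {
        assert(d > 0 && "tensor dimensions are expected to be positive");
        n *= static_cast<std::size_t>(d);
    }
    return n;
}

// Index of the largest value in a float output buffer.
// `data` plays the role of p_outputs_[idx], `shape` the role of output_shapes_[idx].
inline std::size_t argmax_output(const float *data, const std::vector<int> &shape)
{
    assert(data != nullptr);
    const std::size_t n = element_count(shape);
    std::size_t best = 0;
    for (std::size_t i = 1; i < n; ++i)
    {
        if (data[i] > data[best])
        {
            best = i;
        }
    }
    return best;
}
```

Inside a task class this would be called as `argmax_output(p_outputs_[0], output_shapes_[0])`, assuming the first output is a flat score tensor.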

### Task Header and Source Files

`face_detection.h` and `face_detection.cc` are the task-specific files, and they are what you normally write yourself when building a new application on top of this sample. In a real project, rename them to match your task, for example `person_det.h`, `helmet_detect.cc`, or `gesture_recog.h`. The task class should inherit from `AIBase`:

```cpp
class YourTask : public AIBase { /* task-specific members go here */ };
```

That means you reuse the common inference wrapper and complete the task-specific logic yourself.

The task class is responsible for:

| Module | Need to implement | Description |
| --- | --- | --- |
| Preprocess | Yes | Convert the input image to the model format |
| Inference | Reuse `AIBase` | The common run path is already wrapped |
| Postprocess | Yes | Convert raw model outputs into usable results |
| Draw | Yes | Draw the results on the image |
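As an illustration of the preprocess row, the sketch below computes the parameters for an aspect-preserving resize that pads only the right and bottom edges. This policy is an assumption made for the example; the actual `ai2d` configuration used by the sample may differ, and `ResizePadParams`, `compute_resize_pad`, and `to_source_coord` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

struct ResizePadParams
{
    float scale;    // factor applied to both axes
    int resized_w;  // image width after scaling
    int resized_h;  // image height after scaling
    int pad_right;  // padding columns added on the right
    int pad_bottom; // padding rows added at the bottom
};

// Fit a src_w x src_h frame into a dst_w x dst_h model input without distortion.
inline ResizePadParams compute_resize_pad(int src_w, int src_h, int dst_w, int dst_h)
{
    const float scale = std::min(static_cast<float>(dst_w) / src_w,
                                 static_cast<float>(dst_h) / src_h);
    ResizePadParams p;
    p.scale = scale;
    p.resized_w = std::min(dst_w, static_cast<int>(std::lround(src_w * scale)));
    p.resized_h = std::min(dst_h, static_cast<int>(std::lround(src_h * scale)));
    p.pad_right = dst_w - p.resized_w;
    p.pad_bottom = dst_h - p.resized_h;
    return p;
}

// With padding only on the right and bottom, mapping a model-space coordinate
// back to the source frame is a single division by the scale.
inline float to_source_coord(float model_coord, const ResizePadParams &p)
{
    return model_coord / p.scale;
}
```

For a `640 x 480` frame and a `320 x 320` model input, this yields a scale of `0.5`, a `320 x 240` resized image, and 80 rows of bottom padding. Postprocess would use the same scale to map boxes back onto the source frame.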

A simplified task skeleton is:

```cpp
typedef struct ExampleResults
{
    // Define the task result structure here.
} ExampleResults;

class MyApp : public AIBase
{
public:
    /**
     * @brief Constructor for video inference.
     * Loads the kmodel, initializes model inputs and outputs, and configures
     * application-specific parameters such as thresholds and preprocess behavior.
     * The two thresholds below are example parameters; replace them with your own.
     */
    MyApp(const char *kmodel_file, float obj_thresh, float nms_thresh, FrameCHWSize image_size, int debug_mode);
    ~MyApp();

    void pre_process(runtime_tensor &input_tensor);
    void inference();
    void post_process(FrameCHWSize image_size, vector<ExampleResults> &results);
    void draw_result(cv::Mat &draw_frame, vector<ExampleResults> &results);

    std::unique_ptr<ai2d_builder> ai2d_builder_;
    runtime_tensor ai2d_out_tensor_;
    FrameCHWSize image_size_;
    FrameCHWSize input_size_;

    // Add task-specific members here when needed.
};
```

You can follow the implementation under `src/rtsmart/examples/ai/uvc_face_detection/src/face_detection.cc`.
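For detection tasks, postprocess usually ends with a confidence filter followed by non-maximum suppression (NMS). The sketch below is a generic greedy NMS written with standard-library types only. It is not taken from `face_detection.cc`, and `Box`, `iou`, and `nms` are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <vector>

struct Box
{
    float x1, y1, x2, y2; // top-left and bottom-right corners
    float score;          // detection confidence
};

// Intersection over union of two axis-aligned boxes.
inline float iou(const Box &a, const Box &b)
{
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) +
                      (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: drop low-score boxes, then keep each remaining box only if it
// does not overlap an already kept, higher-scoring box by more than iou_thresh.
inline std::vector<Box> nms(std::vector<Box> boxes, float score_thresh, float iou_thresh)
{
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                               [score_thresh](const Box &b) { return b.score < score_thresh; }),
                boxes.end());
    std::sort(boxes.begin(), boxes.end(),
              [](const Box &a, const Box &b) { return a.score > b.score; });

    std::vector<Box> kept;
    for (const Box &candidate : boxes)
    {
        bool suppressed = false;
        for (const Box &k : kept)
        {
            if (iou(candidate, k) > iou_thresh)
            {
                suppressed = true;
                break;
            }
        }
        if (!suppressed)
        {
            kept.push_back(candidate);
        }
    }
    return kept;
}
```

In a task class, the two thresholds would typically be the values passed to the constructor, and the kept boxes would be converted into the task's result structure before drawing.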

### `main.cc` Changes

#### Flow Overview

`main.cc` implements the complete task flow:

- get one frame from the UVC camera
- create the input tensor
- call preprocess
- call inference
- call postprocess
- draw the result
- release the frame

The video loop can be summarized as:

```cpp
FrameCHWSize image_size = {AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
dims_t in_shape {1, AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
runtime_tensor input_tensor = host_runtime_tensor::create(typecode_t::dt_uint8, in_shape, hrt::pool_shared).expect("cannot create input tensor");
auto input_buf = input_tensor.impl()->to_host().unwrap()->buffer().as_host().unwrap().map(map_access_::map_write).unwrap().buffer();

UVC_PipeLine pl(debug_mode);
pl.Create();

DumpRes dump_res;
MyApp my_app(argv[1], atof(argv[2]), atof(argv[3]), image_size, atoi(argv[5]));
vector<ExampleResults> results;

std::vector<uint8_t> chw_vec;
std::vector<cv::Mat> rgbChannels(3);
cv::Mat ori_img;
int ret = 0;

while (!isp_stop)
{
    ScopedTiming st("total time", 1);
    ret = pl.GetFrame(dump_res);
    if (ret)
    {
        printf("GetFrame fail\n");
        continue;
    }

    {
        ScopedTiming st("create tensor", debug_mode);
        void *vaddr = reinterpret_cast<void *>(dump_res.virt_addr);
        ori_img = cv::Mat(image_size.height, image_size.width, CV_8UC3, vaddr);

        chw_vec.clear();
        rgbChannels.clear();
        cv::split(ori_img, rgbChannels);
        for (auto i = 0; i < 3; i++)
        {
            std::vector<uint8_t> data = std::vector<uint8_t>(rgbChannels[i].reshape(1, 1));
            chw_vec.insert(chw_vec.end(), data.begin(), data.end());
        }
        memcpy(reinterpret_cast<char *>(input_buf.data()), chw_vec.data(), chw_vec.size());
        hrt::sync(input_tensor, sync_op_t::sync_write_back, true).expect("write back input failed");
    }

    results.clear();
    my_app.pre_process(input_tensor);
    my_app.inference();
    my_app.post_process(image_size, results);
    my_app.draw_result(ori_img, results);
    pl.ReleaseFrame(dump_res);
}

pl.Destroy();
```
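The "create tensor" step in the loop converts the interleaved `HWC` frame into planar `CHW` data by splitting channels and concatenating them. The same reordering can be written as a single pass without OpenCV. This is a sketch for illustration; `hwc_to_chw` is a hypothetical helper, not a function from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reorder an interleaved HWC buffer (R0 G0 B0 R1 G1 B1 ...) into planar CHW
// order (R0 R1 ... G0 G1 ... B0 B1 ...).
inline std::vector<std::uint8_t> hwc_to_chw(const std::uint8_t *hwc,
                                            std::size_t height,
                                            std::size_t width,
                                            std::size_t channels)
{
    assert(hwc != nullptr);
    const std::size_t plane = height * width; // pixels per channel plane
    std::vector<std::uint8_t> chw(plane * channels);
    for (std::size_t p = 0; p < plane; ++p)
    {
        for (std::size_t c = 0; c < channels; ++c)
        {
            chw[c * plane + p] = hwc[p * channels + c];
        }
    }
    return chw;
}
```

The result has the same byte count as the input, so it can be copied into the input tensor buffer in the same way as `chw_vec`.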

Compared with the MIPI + OSD flow, the UVC sample draws the result directly on the original image instead of drawing on a transparent OSD frame. This is one of the main architectural differences between the UVC path and the MIPI-camera path.
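Drawing directly on the original image means the pixels of the `RGB888` frame buffer are overwritten in place. The sketch below draws a one-pixel box outline into an `HWC` buffer to make that concrete. It is an illustration only: `draw_box_rgb888` is a hypothetical helper, and `draw_result` in the skeleton above takes a `cv::Mat` instead of a raw buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Draw a one-pixel rectangle outline into an RGB888 HWC image, in place.
// Coordinates are inclusive. A box that lies fully outside the image is
// skipped; a partially visible box is clamped to the image border.
inline void draw_box_rgb888(std::uint8_t *img, int width, int height,
                            int x1, int y1, int x2, int y2,
                            std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    if (img == nullptr || width <= 0 || height <= 0)
    {
        return;
    }
    if (x1 > x2) std::swap(x1, x2);
    if (y1 > y2) std::swap(y1, y2);
    if (x2 < 0 || y2 < 0 || x1 >= width || y1 >= height)
    {
        return;
    }
    x1 = std::max(x1, 0);
    y1 = std::max(y1, 0);
    x2 = std::min(x2, width - 1);
    y2 = std::min(y2, height - 1);

    auto put = [&](int x, int y) {
        std::uint8_t *px = img + (static_cast<std::size_t>(y) * width + x) * 3;
        px[0] = r;
        px[1] = g;
        px[2] = b;
    };
    for (int x = x1; x <= x2; ++x)
    {
        put(x, y1); // top edge
        put(x, y2); // bottom edge
    }
    for (int y = y1; y <= y2; ++y)
    {
        put(x1, y); // left edge
        put(x2, y); // right edge
    }
}
```

Because the frame itself is modified, the drawn image is also what gets converted and sent to the display, with no separate overlay layer.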

The video loop usually runs in a dedicated thread. When the user inputs `q`, set `isp_stop` to `true` so the thread can exit.
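The thread-and-flag pattern can be sketched as follows, assuming the stop flag is a `std::atomic<bool>`. This is not the sample's `main.cc`: the worker body is a placeholder for the frame loop, and the wait on a frame counter stands in for waiting on the user to enter `q`. `run_loop_until_stopped` is a hypothetical name.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Run a stand-in frame loop on a worker thread, request a stop once at least
// `min_frames` iterations have run, and return the number of iterations done.
inline int run_loop_until_stopped(int min_frames)
{
    std::atomic<bool> stop{false}; // plays the role of isp_stop
    std::atomic<int> frames{0};

    std::thread worker([&stop, &frames] {
        while (!stop.load())
        {
            // GetFrame, preprocess, inference, postprocess, and draw go here.
            frames.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    // The sample would block here on user input and react to 'q'.
    while (frames.load() < min_frames)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    stop.store(true); // ask the loop to finish its current iteration and exit
    worker.join();    // wait for the worker before releasing pipeline resources
    return frames.load();
}
```

Joining the worker before tearing down the pipeline matters: the loop may still hold a frame when the stop request arrives.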

### `CMakeLists.txt` and `build_app.sh`

At the source root:

```cmake
add_subdirectory(src)
```

For the task subdirectory:

```cmake
set(src main.cc face_detection.cc anchors_320.cc anchors_640.cc ai_base.cc ai_utils.cc uvc_pipeline.cc)
set(bin uvc_face_detection.elf)
```

The build script should collect the generated `elf` and utility files into `k230_bin`, for example:

```bash
collect_outputs() {
    local elf_file="${BUILD_DIR}/bin/uvc_face_detection.elf"

    if [ -f "${elf_file}" ]; then
        echo "[INFO] Collecting ELF and utility files to ${K230_BIN_DIR}..."
        cp -u "${elf_file}" "${K230_BIN_DIR}/"
        cp -u utils/* "${K230_BIN_DIR}/" 2>/dev/null || true
    else
        echo "[WARN] ELF file not found: ${elf_file}"
    fi
}
```

## Build

### Select Board and Build Firmware

From the RTOS SDK root:

```bash
make list-def          # list the available board configurations
make ***_defconfig     # replace *** with the name of your board configuration
make -j
```

After the firmware build finishes, the image is generated in the `output` directory.

### Build Method 1

After you finish the code changes, go to the directory containing `build_app.sh` and run:

```bash
./build_app.sh
```

The build intermediates are generated in `build`, and the deployment package is collected in `k230_bin`.

### Build Method 2

From the RTOS SDK root, run `make menuconfig` and enable:

```text
RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build UVC+AI Programs
```

Then run:

```bash
make -j
```

With this method, the build outputs are placed directly in the firmware image under:

```text
/sdcard/app/examples/ai/uvc_face_detection
```

You can also enter the example directory and run:

```bash
make -j
```

This approach also supports incremental builds and places the collected outputs in `k230_bin`.

## Board Deployment

Flash the firmware first. See:

[how_to_flash](../../userguide/how_to_flash.md)

After boot, a virtual disk named `CanMV` is visible. Copy the generated files from `k230_bin` to:

```text
CanMV/sdcard
```

This typically includes:

- the generated `elf`
- the required `kmodel`
- any extra files used by the sample

Then connect to the board through a serial console and run:

```bash
uvc_face_detect_isp.sh
```

Make sure the argument order and file paths match your code and deployment layout.

The result running on the board is shown below:

![uvc_deploy_res](https://www.kendryte.com/api/post/attachment?id=843)
