Note

This is the documentation for the latest development branch and may refer to features that are not available in released versions.

UVC + AI Application Development Guide#

Overview#

UVC + AI is a common embedded application pattern that combines a UVC video stream with AI inference so the system can detect and recognize targets in live video. This document uses face detection as the reference example to explain how to develop a UVC-based AI application.

The model-inference part of this sample is the same as the single-model guide in single_model_example.md. The main difference is that the PipeLine data source is replaced with a UVC camera.

Unlike the MIPI-camera path, the current UVC camera path does not support multiple output channels, so this sample is built around a single-channel UVC camera.

Development Guide#

Modules and Task Flow#

Involved Modules:

  1. uvc: Acquires video frames from the UVC camera. The current sample supports 1920 x 1080 and 640 x 480 output resolutions.

  2. vo: Configures the display device and shows the final image with detection results.

  3. vdec: Decodes the JPEG video stream from the UVC camera into YUV420 format.

  4. noai_2d: Performs format conversion - converts decoded YUV420 frames into RGB for AI preprocessing, and converts processed frames back to display-compatible formats.

  5. kpu: Loads kmodel, configures input/output tensors, and performs model inference.

  6. ai2d: Performs preprocessing on model input images. See usage_ai2d for details.

Processing Flow:

The MIPI-camera path processes two channels, but UVC does not support multiple simultaneous output channels, so every frame passes through a single processing chain. The workflow is:

  1. Capture: Get one video frame from the UVC camera

  2. Decode: Decode the JPEG stream to YUV420 format using vdec

  3. Convert: Use noai_2d to convert YUV420 to RGB888 for AI processing

  4. Preprocess: Prepare the frame for model input using ai2d

  5. Infer: Run model inference using kpu

  6. Postprocess: Extract and interpret detection results

  7. Draw: Overlay detection boxes and labels on the source image

  8. Display: Convert image to display format and send to screen

The overall UVC processing logic is shown below:

[Figure: uvc_process]

Code Structure#

Using the UVC + face-detection task as the example, the existing code structure is:

uvc_face_detection
├── cmake
├── src
│    ├── ai_base.cc         # Model inference wrapper implementation
│    ├── ai_base.h          # Model inference header file
│    ├── ai_utils.cc        # Utility methods for model inference
│    ├── ai_utils.h         # Utility methods header file
│    ├── anchors_320.cc     # Anchors for 320-input face detection model
│    ├── anchors_640.cc     # Anchors for 640-input face detection model
│    ├── face_detection.cc  # Face detection task: preprocess, inference, postprocess, drawing
│    ├── face_detection.h   # Face detection task header file
│    ├── main.cc            # Main function: orchestrates UVC capture and AI inference
│    ├── scoped_timing.h    # Timing utility for debugging
│    ├── setting.h          # Configuration macros for UVC, display, and AI frame resolution
│    ├── uvc_pipeline.cc    # UVC pipeline implementation: capture, decode, format conversion, frame dump
│    ├── uvc_pipeline.h     # UVC pipeline header file
│    └── CMakeLists.txt     # Build configuration for this task
├── utils                   # Pre-built kmodel and scripts
├── CMakeLists.txt          # Root CMakeLists
├── build_app.sh            # Compilation script
└── Makefile                # Alternative build method

Code Responsibilities#

The main file responsibilities are:

File               Description
-----------------  ---------------------------------------------------------------------------
ai_base.h          Declares the common model-inference interfaces
ai_base.cc         Implements the common model-inference interfaces
ai_utils.h         Declares shared helper functions
ai_utils.cc        Implements shared helper functions
scoped_timing.h    Provides timing helpers for performance debugging
setting.h          Defines UVC, display, and AI-frame configuration macros
uvc_pipeline.h     Declares the UVC pipeline interface
uvc_pipeline.cc    Implements UVC capture, JPEG decode, format conversion, and display insertion
face_detection.h   Declares preprocess, inference, postprocess, and drawing for face detection
face_detection.cc  Implements face detection task logic
anchors_320.cc     Anchor data for the 320-input detection model
anchors_640.cc     Anchor data for the 640-input detection model
main.cc            Organizes the complete application logic and frame processing

How to Use and Modify:

  • ai_base.* and scoped_timing.h - Implement the model inference wrapper and timing tools. These files typically do not require changes and are reused across projects.

  • ai_utils.* - Provides common utility functions. Extend these only if existing helpers are insufficient.

  • setting.h, uvc_pipeline.h, and uvc_pipeline.cc - Handle UVC capture, JPEG decoding, format conversion via noai_2d, frame dump, and display insertion. These typically only need changes if you:

    • Add a new display type or resolution

    • Change UVC input resolution or format

    • Modify frame dimensions or color space conversions

    The current implementation supports both LT9611 HDMI 1920×1080 and ST7701 LCD 800×480 displays.

  • face_detection.* and main.cc - These are the main files users modify when developing a new UVC + AI application. Users typically:

    • Implement task-specific preprocessing, inference, postprocessing, and drawing logic

    • Modify main.cc to orchestrate the complete UVC-to-display pipeline

    • Replace face detection with their own detection or recognition tasks

Code Details#

setting.h Configuration#

The macros in setting.h configure UVC output, display output, and AI-frame resolution.

Macro             Description
----------------  ------------------------------------------
UVC_WIDTH         UVC output width
UVC_HEIGHT        UVC output height
DISPLAY_MODE      0: 1920 x 1080 LT9611, 1: 800 x 480 ST7701
DISPLAY_WIDTH     Display width
DISPLAY_HEIGHT    Display height
DISPLAY_ROTATE    0: no rotation, 1: rotate 90 degrees
AI_FRAME_WIDTH    AI frame width
AI_FRAME_HEIGHT   AI frame height
AI_FRAME_CHANNEL  AI frame channels

Typical configuration fragments are:

#define UVC_WIDTH 640
#define UVC_HEIGHT 480

The current sample supports 1920 x 1080 and 640 x 480 UVC output. Both can be displayed on HDMI and LCD targets.

#define DISPLAY_MODE 1
#define DISPLAY_WIDTH 640
#define DISPLAY_HEIGHT 480
#define DISPLAY_ROTATE 1

These parameters select the display device (DISPLAY_MODE), set the display width and height, and control whether the image is rotated by 90 degrees.

#define AI_FRAME_WIDTH 640
#define AI_FRAME_HEIGHT 480
#define AI_FRAME_CHANNEL 3

The AI-frame size must match the UVC output resolution. The source frame arrives as JPEG, is decoded into YUV420, and is then converted by noai_2d into RGB888 in HWC layout. main.cc repacks that buffer into CHW order before it fills the model input tensor.
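
The matching requirement can be checked at build time rather than discovered on the board. The following is a minimal sketch, not part of the sample: the macro values are copied from the fragments above (in a real project they would come from setting.h), and the helpers ai_frame_bytes and hwc_offset are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Values copied from the configuration fragments above; in the real project
// these macros are provided by setting.h.
#define UVC_WIDTH 640
#define UVC_HEIGHT 480
#define AI_FRAME_WIDTH 640
#define AI_FRAME_HEIGHT 480
#define AI_FRAME_CHANNEL 3

// Fail the build early if the AI frame and the UVC output drift apart.
static_assert(AI_FRAME_WIDTH == UVC_WIDTH, "AI frame width must match UVC width");
static_assert(AI_FRAME_HEIGHT == UVC_HEIGHT, "AI frame height must match UVC height");
static_assert(AI_FRAME_CHANNEL == 3, "RGB888 frames have three channels");

// Hypothetical helper: size in bytes of one RGB888 frame (one byte per channel).
constexpr std::size_t ai_frame_bytes()
{
    return static_cast<std::size_t>(AI_FRAME_WIDTH) * AI_FRAME_HEIGHT * AI_FRAME_CHANNEL;
}

// Hypothetical helper: byte offset of channel c of pixel (x, y) in an
// interleaved HWC frame, where the three channel bytes of a pixel are adjacent.
inline std::size_t hwc_offset(int y, int x, int c)
{
    assert(y >= 0 && y < AI_FRAME_HEIGHT);
    assert(x >= 0 && x < AI_FRAME_WIDTH);
    assert(c >= 0 && c < AI_FRAME_CHANNEL);
    return (static_cast<std::size_t>(y) * AI_FRAME_WIDTH + x) * AI_FRAME_CHANNEL + c;
}
```

With the 640 x 480 configuration, one frame occupies 921600 bytes, which is also the number of bytes copied into the input tensor for each frame.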

AIBase Notes#

AIBase in ai_base.h is the common wrapper class used for model inference. It includes model initialization, input/output shape handling, tensor initialization, inference execution, and output retrieval.

/**
 * @brief AI base class, wraps nncase-related operations.
 * Later application development mainly needs to focus on preprocess and postprocess.
 */
class AIBase
{
public:
    /**
     * @brief Constructor. Loads the kmodel and initializes model inputs and outputs.
     * @param kmodel_file Path to the kmodel file
     * @param model_name  Model name
     * @param debug_mode  0: no debug, 1: timing only, 2: full debug logs
     */
    AIBase(const char *kmodel_file, const string model_name, const int debug_mode = 1);

    ~AIBase();
    runtime_tensor get_input_tensor(size_t idx);
    void set_input_tensor(size_t idx, runtime_tensor &input_tensor);
    void run();
    void get_output();
    runtime_tensor get_output_tensor(int idx);

protected:
    string model_name_;
    int debug_mode_;
    vector<float *> p_outputs_;
    vector<vector<int>> input_shapes_;
    vector<vector<int>> output_shapes_;

private:
    void set_input_init();
    void set_output_init();

    interpreter kmodel_interp_;
    vector<unsigned char> kmodel_vec_;
};

The most commonly reused members in application development are:

  • input_shapes_

  • output_shapes_

  • p_outputs_

For example, the pointer to the first output tensor can be accessed as:

float *output0 = p_outputs_[0];
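
A raw float pointer carries no length, so postprocess code usually derives the element count from the matching entry in output_shapes_. The sketch below shows one way to do that with plain standard-library code. The helper names element_count and copy_output are hypothetical and are not part of AIBase.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements described by one shape entry,
// for example output_shapes_[0]. An empty shape is treated as a scalar.
inline std::size_t element_count(const std::vector<int> &shape)
{
    std::size_t count = 1;
    for (int dim : shape)
    {
        assert(dim > 0); // dynamic or zero-sized dimensions are not handled here
        count *= static_cast<std::size_t>(dim);
    }
    return count;
}

// Hypothetical helper: copy one raw output buffer into a vector so it can be
// inspected or logged without touching the tensor memory again.
inline std::vector<float> copy_output(const float *data, const std::vector<int> &shape)
{
    assert(data != nullptr);
    return std::vector<float>(data, data + element_count(shape));
}
```

Inside a task class derived from AIBase, a call such as copy_output(p_outputs_[0], output_shapes_[0]) would then yield the first output as a sized container, assuming get_output() has already been called for the current frame.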

Task Header and Source Files#

face_detection.h and face_detection.cc are the task-specific files of this sample. For a new application, these are the main files you write yourself. In a real project, you can rename them to match your task, for example person_det.h, helmet_detect.cc, or gesture_recog.h. The task class should inherit from AIBase:

class YourTask : public AIBase

That means you reuse the common inference wrapper and complete the task-specific logic yourself.

The task class is responsible for:

Module       Need to implement  Description
-----------  -----------------  ----------------------------------------------
Preprocess   Yes                Convert the input image to the model format
Inference    Reuse AIBase       The common run path is already wrapped
Postprocess  Yes                Convert raw model outputs into usable results
Draw         Yes                Draw the results on the image

A simplified task skeleton is:

typedef struct ExampleResults
{
    // Define the task result structure here.
} ExampleResults;

class MyApp : public AIBase
{
public:
    /**
     * @brief Constructor for video inference.
     * Loads the kmodel, initializes model inputs and outputs, and configures
     * application-specific parameters such as thresholds and preprocess behavior.
     */
    MyApp(const char *kmodel_file, /* task-specific parameters, */ FrameCHWSize image_size, int debug_mode);
    ~MyApp();

    void pre_process(runtime_tensor &input_tensor);
    void inference();
    void post_process(FrameCHWSize image_size, vector<ExampleResults> &results);
    void draw_result(cv::Mat &draw_frame, vector<ExampleResults> &results);

    std::unique_ptr<ai2d_builder> ai2d_builder_;
    runtime_tensor ai2d_out_tensor_;
    FrameCHWSize image_size_;
    FrameCHWSize input_size_;

    // Add task-specific members here when needed.
};

You can follow the implementation under src/rtsmart/examples/ai/uvc_face_detection/src/face_detection.cc.
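
For detection tasks, the postprocess step typically ends with score filtering and non-maximum suppression (NMS). The sketch below illustrates that generic technique with standard-library code only. It is not the sample's implementation: the Box type and the filter_boxes function are hypothetical, and the anchor decoding that face_detection.cc performs before this stage is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical result type: a box in pixel coordinates plus a confidence score.
struct Box
{
    float x1, y1, x2, y2;
    float score;
};

// Intersection-over-union of two axis-aligned boxes.
inline float iou(const Box &a, const Box &b)
{
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drop boxes scoring below score_thresh, then apply greedy NMS: keep the
// highest-scoring box and discard any later box that overlaps a kept one
// by more than nms_thresh.
inline std::vector<Box> filter_boxes(std::vector<Box> boxes, float score_thresh, float nms_thresh)
{
    assert(nms_thresh >= 0.0f && nms_thresh <= 1.0f);
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                               [score_thresh](const Box &b) { return b.score < score_thresh; }),
                boxes.end());
    std::sort(boxes.begin(), boxes.end(),
              [](const Box &a, const Box &b) { return a.score > b.score; });

    std::vector<Box> kept;
    for (const Box &candidate : boxes)
    {
        bool overlaps = false;
        for (const Box &k : kept)
        {
            if (iou(candidate, k) > nms_thresh)
            {
                overlaps = true;
                break;
            }
        }
        if (!overlaps)
            kept.push_back(candidate);
    }
    return kept;
}
```

In a task class, logic like this would sit at the end of post_process, after the raw values in p_outputs_ have been decoded into boxes, and the two thresholds would be the task-specific constructor parameters.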

main.cc Changes#

Flow Overview#

main.cc implements the complete task flow:

  • get one frame from the UVC camera

  • create the input tensor

  • call preprocess

  • call inference

  • call postprocess

  • draw the result

  • release the frame

The video loop can be summarized as:

FrameCHWSize image_size = {AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
dims_t in_shape {1, AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
runtime_tensor input_tensor = host_runtime_tensor::create(typecode_t::dt_uint8, in_shape, hrt::pool_shared).expect("cannot create input tensor");
auto input_buf = input_tensor.impl()->to_host().unwrap()->buffer().as_host().unwrap().map(map_access_::map_write).unwrap().buffer();

UVC_PipeLine pl(debug_mode);
pl.Create();

DumpRes dump_res;
MyApp my_app(argv[1], atof(argv[2]), atof(argv[3]), image_size, atoi(argv[5]));
vector<ExampleResults> results;

std::vector<uint8_t> chw_vec;
std::vector<cv::Mat> rgbChannels(3);
cv::Mat ori_img;
int ret = 0;

while (!isp_stop)
{
    ScopedTiming st("total time", 1);
    ret = pl.GetFrame(dump_res);
    if (ret)
    {
        printf("GetFrame fail\n");
        continue;
    }

    {
        ScopedTiming st("create tensor", debug_mode);
        void *vaddr = reinterpret_cast<void *>(dump_res.virt_addr);
        ori_img = cv::Mat(image_size.height, image_size.width, CV_8UC3, vaddr);

        chw_vec.clear();
        rgbChannels.clear();
        cv::split(ori_img, rgbChannels);
        for (auto i = 0; i < 3; i++)
        {
            std::vector<uint8_t> data = std::vector<uint8_t>(rgbChannels[i].reshape(1, 1));
            chw_vec.insert(chw_vec.end(), data.begin(), data.end());
        }
        memcpy(reinterpret_cast<char *>(input_buf.data()), chw_vec.data(), chw_vec.size());
        hrt::sync(input_tensor, sync_op_t::sync_write_back, true).expect("write back input failed");
    }

    results.clear();
    my_app.pre_process(input_tensor);
    my_app.inference();
    my_app.post_process(image_size, results);
    my_app.draw_result(ori_img, results);
    pl.ReleaseFrame(dump_res);
}

pl.Destroy();
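
The "create tensor" step in the loop above repacks the interleaved HWC frame into planar CHW order by splitting the channels with OpenCV and concatenating them. The same reordering can be written without OpenCV, which may help when reading or testing the logic on a host machine. This is a sketch of the layout change only; the function name hwc_to_chw is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reorder an interleaved HWC buffer (R,G,B,R,G,B,...) into planar CHW
// (all R values, then all G values, then all B values).
inline std::vector<std::uint8_t> hwc_to_chw(const std::vector<std::uint8_t> &hwc,
                                            std::size_t height, std::size_t width,
                                            std::size_t channels)
{
    assert(hwc.size() == height * width * channels);
    std::vector<std::uint8_t> chw(hwc.size());
    const std::size_t plane = height * width; // number of pixels per channel plane
    for (std::size_t p = 0; p < plane; ++p)
    {
        for (std::size_t c = 0; c < channels; ++c)
        {
            chw[c * plane + p] = hwc[p * channels + c];
        }
    }
    return chw;
}
```

For a two-pixel frame with values {1, 2, 3, 4, 5, 6}, the result is {1, 4, 2, 5, 3, 6}: the first plane holds the first channel of both pixels, and so on.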

Compared with the MIPI + OSD flow, the UVC sample draws the result directly on the original image instead of drawing on a transparent OSD frame. This is one of the main architectural differences between the UVC path and the MIPI-camera path.
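
Drawing on the original image means writing pixels into the same RGB888 HWC buffer that is later sent to the display. The sample does this through OpenCV on a cv::Mat. The dependency-free sketch below shows the underlying idea under the assumption of a tightly packed three-channel frame; draw_box is a hypothetical helper, not a function from the sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: draw a 1-pixel rectangle outline into an RGB888 HWC
// frame. The box is clipped to the frame; a box entirely outside is ignored.
inline void draw_box(std::vector<std::uint8_t> &frame, int width, int height,
                     int x1, int y1, int x2, int y2,
                     std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    assert(width > 0 && height > 0);
    assert(frame.size() == static_cast<std::size_t>(width) * height * 3);

    if (x1 > x2) std::swap(x1, x2);
    if (y1 > y2) std::swap(y1, y2);
    x1 = std::max(x1, 0);
    y1 = std::max(y1, 0);
    x2 = std::min(x2, width - 1);
    y2 = std::min(y2, height - 1);
    if (x1 > x2 || y1 > y2)
        return; // nothing of the box is visible

    // Write one pixel; the three channel bytes of a pixel are adjacent in HWC.
    auto put = [&](int x, int y) {
        const std::size_t i = (static_cast<std::size_t>(y) * width + x) * 3;
        frame[i] = r;
        frame[i + 1] = g;
        frame[i + 2] = b;
    };
    for (int x = x1; x <= x2; ++x)
    {
        put(x, y1); // top edge
        put(x, y2); // bottom edge
    }
    for (int y = y1; y <= y2; ++y)
    {
        put(x1, y); // left edge
        put(x2, y); // right edge
    }
}
```

Because the detection overlay modifies the frame in place, the frame must not be released back to the pipeline until drawing and display submission are finished.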

The video loop usually runs in a dedicated thread. When the user enters q, the main thread sets isp_stop to true so the loop ends and the video thread can exit.
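
The stop mechanism can be sketched with std::thread and an atomic flag. This is an illustration under assumptions, not the sample's code: declaring isp_stop as std::atomic<bool> is a choice made here, and video_loop and run_until_quit are hypothetical stand-ins for the real loop and the real input handling.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <sstream>
#include <string>
#include <thread>

// Stop flag shared between the input loop and the video thread. The name
// follows the loop above; using std::atomic<bool> is an assumption.
static std::atomic<bool> isp_stop{false};

// Hypothetical stand-in for the video loop: counts iterations until stopped.
inline void video_loop(std::atomic<int> &frames)
{
    while (!isp_stop)
    {
        ++frames; // capture, inference, and drawing would happen here
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// Start the worker, read lines from `in` until one equals "q" (or the input
// ends), then raise the stop flag and wait for the worker to finish.
inline int run_until_quit(std::istream &in)
{
    isp_stop = false;
    std::atomic<int> frames{0};
    std::thread worker(video_loop, std::ref(frames));

    std::string line;
    while (std::getline(in, line))
    {
        if (line == "q")
            break;
    }

    isp_stop = true;
    worker.join();
    return frames.load();
}
```

On the board, run_until_quit would be called with std::cin. Joining the thread before the pipeline is destroyed ensures that no frame is still in use when resources are released.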

CMakeLists.txt and build_app.sh#

At the source root:

add_subdirectory(src)

For the task subdirectory:

set(src main.cc face_detection.cc anchors_320.cc anchors_640.cc ai_base.cc ai_utils.cc uvc_pipeline.cc)
set(bin uvc_face_detection.elf)

The build script should collect the generated elf and utility files into k230_bin, for example:

collect_outputs() {
    local elf_file="${BUILD_DIR}/bin/uvc_face_detection.elf"

    if [ -f "${elf_file}" ]; then
        echo "[INFO] Collecting ELF and utility files to ${K230_BIN_DIR}..."
        cp -u "${elf_file}" "${K230_BIN_DIR}/"
        cp -u utils/* "${K230_BIN_DIR}/" 2>/dev/null || true
    else
        echo "[WARN] ELF file not found: ${elf_file}"
    fi
}

Build#

Select Board and Build Firmware#

From the RTOS SDK root:

make list-def
make ***_defconfig
make -j

After the firmware build finishes, the image is generated in the output directory.

Build Method 1#

After you finish the code changes, go to the directory containing build_app.sh and run:

./build_app.sh

The build intermediates are generated in build, and the deployment package is collected in k230_bin.

Build Method 2#

From the RTOS SDK root, run make menuconfig and enable:

RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build UVC+AI Programs

Then run:

make -j

With this method, the files are built directly into:

/sdcard/app/examples/ai/uvc_face_detection

You can also enter the example directory and run:

make -j

This path also supports incremental build and places the collected outputs in k230_bin.

Board Deployment#

Flash the firmware first. See:

how_to_flash

After boot, a virtual disk named CanMV is visible. Copy the generated files from k230_bin to:

CanMV/sdcard

This typically includes:

  • the generated elf

  • the required kmodel

  • any extra files used by the sample

Then connect to the board through a serial console and run:

uvc_face_detect_isp.sh

Make sure the argument order and file paths match your code and deployment layout.
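
A small startup check can turn a wrong argument count or a missing kmodel into a clear message instead of a crash. The sketch below is hypothetical: check_startup and file_readable are not part of the sample, the expected argument count must be taken from your own main.cc, and treating argv[1] as the kmodel path follows the loop shown earlier.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: true if the file can be opened for reading.
inline bool file_readable(const std::string &path)
{
    std::ifstream f(path, std::ios::binary);
    return f.good();
}

// Hypothetical startup check. expected_argc counts the program name, so a
// program that takes five arguments passes 6. argv[1] is assumed to be the
// kmodel path.
inline bool check_startup(int argc, char **argv, int expected_argc)
{
    assert(expected_argc >= 2);
    if (argc != expected_argc)
    {
        std::fprintf(stderr, "expected %d arguments, got %d\n", expected_argc - 1, argc - 1);
        return false;
    }
    if (!file_readable(argv[1]))
    {
        std::fprintf(stderr, "cannot open kmodel: %s\n", argv[1]);
        return false;
    }
    return true;
}
```

Calling such a check at the top of main, before the pipeline is created, keeps a deployment mistake from leaving the camera or display in a half-initialized state.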

The result on the board is shown below:

[Figure: uvc_deploy_res]
