UVC + AI Application Development Guide#
Overview#
UVC + AI is a common embedded application pattern that combines a UVC video stream with AI inference so the system can detect and recognize targets in live video. This document uses face detection as the reference example to explain how to develop a UVC-based AI application.
The model-inference part of this sample is the same as the single-model guide in single_model_example.md. The main difference is that the PipeLine data source is replaced with a UVC camera.
Unlike the MIPI-camera path, the current UVC camera path does not support multiple output channels, so this sample is built around a single-channel UVC pipeline.
Development Guide#
Modules and Task Flow#
Involved Modules:
- `uvc`: Acquires video frames from the UVC camera. The current sample supports `1920 x 1080` and `640 x 480` output resolutions.
- `vo`: Configures the display device and shows the final image with detection results.
- `vdec`: Decodes the JPEG video stream from the UVC camera into `YUV420` format.
- `noai_2d`: Performs format conversion. It converts decoded `YUV420` frames into `RGB` for AI preprocessing, and converts processed frames back to display-compatible formats.
- `kpu`: Loads the `kmodel`, configures input/output tensors, and performs model inference.
- `ai2d`: Performs preprocessing on model input images. See usage_ai2d for details.
Processing Flow:
Unlike the MIPI-camera path with dual-channel processing, UVC-based applications use a single-channel processing approach since UVC does not support multiple simultaneous output channels. The workflow is:
1. Capture: Get one video frame from the UVC camera.
2. Decode: Decode the JPEG stream to `YUV420` format using `vdec`.
3. Convert: Use `noai_2d` to convert `YUV420` to `RGB888` for AI processing.
4. Preprocess: Prepare the frame for model input using `ai2d`.
5. Infer: Run model inference using `kpu`.
6. Postprocess: Extract and interpret detection results.
7. Draw: Overlay detection boxes and labels on the source image.
8. Display: Convert the image to the display format and send it to the screen.
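To make the Decode and Convert steps more concrete, the sketch below computes how many bytes one frame occupies in each format. It assumes tightly packed buffers with no stride or alignment padding, which the real `vdec` and `noai_2d` buffers may not satisfy. The helper names are hypothetical and are not part of the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Bytes for one YUV420 frame: a full-resolution Y plane plus U and V planes
// subsampled by 2 in both directions (12 bits per pixel on average).
constexpr std::size_t yuv420_bytes(std::size_t width, std::size_t height)
{
    return width * height + 2 * ((width / 2) * (height / 2));
}

// Bytes for one packed RGB888 frame (3 bytes per pixel).
constexpr std::size_t rgb888_bytes(std::size_t width, std::size_t height)
{
    return width * height * 3;
}

// Hypothetical helper: prints both sizes for one resolution.
inline void print_frame_sizes(std::size_t width, std::size_t height)
{
    // 4:2:0 subsampling in this sketch assumes even dimensions.
    assert(width % 2 == 0 && height % 2 == 0);
    std::printf("%zux%zu: YUV420 %zu bytes, RGB888 %zu bytes\n",
                width, height,
                yuv420_bytes(width, height), rgb888_bytes(width, height));
}
```

Under these assumptions a 640 x 480 frame takes 460800 bytes as YUV420 and 921600 bytes as RGB888, so the converted frame is twice the size of the decoded one.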
The overall UVC processing logic is shown below:
Code Structure#
Using the UVC + face-detection task as the example, the existing code structure is:
uvc_face_detection
├── cmake
├── src
│ ├── ai_base.cc # Model inference wrapper implementation
│ ├── ai_base.h # Model inference header file
│ ├── ai_utils.cc # Utility methods for model inference
│ ├── ai_utils.h # Utility methods header file
│ ├── anchors_320.cc # Anchors for 320-input face detection model
│ ├── anchors_640.cc # Anchors for 640-input face detection model
│ ├── face_detection.cc # Face detection task: preprocess, inference, postprocess, drawing
│ ├── face_detection.h # Face detection task header file
│ ├── main.cc # Main function: orchestrates UVC capture and AI inference
│ ├── scoped_timing.h # Timing utility for debugging
│ ├── setting.h # Configuration macros for UVC, display, and AI frame resolution
│ ├── uvc_pipeline.cc # UVC pipeline implementation: capture, decode, format conversion, frame dump
│ ├── uvc_pipeline.h # UVC pipeline header file
│ └── CMakeLists.txt # Build configuration for this task
├── utils # Pre-built kmodel and scripts
├── CMakeLists.txt # Root CMakeLists
├── build_app.sh # Compilation script
└── Makefile # Alternative build method
Code Responsibilities#
The main file responsibilities are:
| File | Description |
|---|---|
| `ai_base.h` | Declares the common model-inference interfaces |
| `ai_base.cc` | Implements the common model-inference interfaces |
| `ai_utils.h` | Declares shared helper functions |
| `ai_utils.cc` | Implements shared helper functions |
| `scoped_timing.h` | Provides timing helpers for performance debugging |
| `setting.h` | Defines UVC, display, and AI-frame configuration macros |
| `uvc_pipeline.h` | Declares the UVC pipeline interface |
| `uvc_pipeline.cc` | Implements UVC capture, JPEG decode, format conversion, and display insertion |
| `face_detection.h` | Declares preprocess, inference, postprocess, and drawing for face detection |
| `face_detection.cc` | Implements face detection task logic |
| `anchors_320.cc` | Anchor data for the 320-input detection model |
| `anchors_640.cc` | Anchor data for the 640-input detection model |
| `main.cc` | Organizes the complete application logic and frame processing |
How to Use and Modify:
- `ai_base.*` and `scoped_timing.h`: Implement the model inference wrapper and timing tools. These files typically do not require changes and are reused across projects.
- `ai_utils.*`: Provides common utility functions. Extend these only if the existing helpers are insufficient.
- `setting.h`, `uvc_pipeline.h`, and `uvc_pipeline.cc`: Handle UVC capture, JPEG decoding, format conversion via `noai_2d`, frame dump, and display insertion. These typically only need changes if you:
  - Add a new display type or resolution
  - Change the UVC input resolution or format
  - Modify frame dimensions or color space conversions

  The current implementation supports both `LT9611 HDMI 1920×1080` and `ST7701 LCD 800×480` displays.
- `face_detection.*` and `main.cc`: These are the main files users modify when developing a new UVC + AI application. Users typically:
  - Implement task-specific preprocessing, inference, postprocessing, and drawing logic
  - Modify `main.cc` to orchestrate the complete UVC-to-display pipeline
  - Replace face detection with their own detection or recognition tasks
Code Details#
setting.h Configuration#
The macros in setting.h configure UVC output, display output, and AI-frame resolution.
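| Macro | Description |
|---|---|
| `UVC_WIDTH` | UVC output width |
| `UVC_HEIGHT` | UVC output height |
| `DISPLAY_MODE` | Display mode |
| `DISPLAY_WIDTH` | Display width |
| `DISPLAY_HEIGHT` | Display height |
| `DISPLAY_ROTATE` | Display rotation |
| `AI_FRAME_WIDTH` | AI frame width |
| `AI_FRAME_HEIGHT` | AI frame height |
| `AI_FRAME_CHANNEL` | AI frame channels |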
Typical configuration fragments are:
#define UVC_WIDTH 640
#define UVC_HEIGHT 480
The current sample supports 1920 x 1080 and 640 x 480 UVC output. Both can be displayed on HDMI and LCD targets.
#define DISPLAY_MODE 1
#define DISPLAY_WIDTH 640
#define DISPLAY_HEIGHT 480
#define DISPLAY_ROTATE 1
These parameters configure the display resolution.
#define AI_FRAME_WIDTH 640
#define AI_FRAME_HEIGHT 480
#define AI_FRAME_CHANNEL 3
The AI-frame size must match the UVC output resolution. The source frame arrives as JPEG, is decoded into YUV420, then converted by noai_2d into RGB888 in HWC layout so it can be used as model input.
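Because the AI frame must match the UVC output, one option is to let the compiler enforce the rule. The sketch below repeats the macro values from the fragments above, adds `static_assert` checks, and shows how a byte is addressed in HWC layout. The checks and the `hwc_offset` helper are illustrative additions and are not part of the sample's setting.h as far as this guide shows.

```cpp
#include <cassert>
#include <cstddef>

// Values copied from the configuration fragments in this section.
#define UVC_WIDTH 640
#define UVC_HEIGHT 480
#define AI_FRAME_WIDTH 640
#define AI_FRAME_HEIGHT 480
#define AI_FRAME_CHANNEL 3

// Fail the build early if the AI frame drifts away from the UVC output.
static_assert(AI_FRAME_WIDTH == UVC_WIDTH, "AI frame width must match UVC output width");
static_assert(AI_FRAME_HEIGHT == UVC_HEIGHT, "AI frame height must match UVC output height");
static_assert(AI_FRAME_CHANNEL == 3, "RGB888 frames have three channels");

// Byte offset of channel c of pixel (x, y) in an interleaved HWC RGB888 frame.
// Hypothetical helper, assuming a tightly packed buffer.
inline std::size_t hwc_offset(std::size_t x, std::size_t y, std::size_t c)
{
    assert(x < AI_FRAME_WIDTH && y < AI_FRAME_HEIGHT && c < AI_FRAME_CHANNEL);
    return (y * AI_FRAME_WIDTH + x) * AI_FRAME_CHANNEL + c;
}
```

With these values, changing only `UVC_WIDTH` to 1920 would stop the build at the first `static_assert` instead of producing a mismatched frame at run time.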
AIBase Notes#
AIBase in ai_base.h is the common wrapper class used for model inference. It includes model initialization, input/output shape handling, tensor initialization, inference execution, and output retrieval.
/**
 * @brief AI base class, wraps nncase-related operations.
 * Later application development mainly needs to focus on preprocess and postprocess.
 */
class AIBase
{
public:
    /**
     * @brief Constructor. Loads the kmodel and initializes model inputs and outputs.
     * @param kmodel_file Path to the kmodel file
     * @param model_name Model name
     * @param debug_mode 0: no debug, 1: timing only, 2: full debug logs
     */
    AIBase(const char *kmodel_file, const string model_name, const int debug_mode = 1);
    ~AIBase();
    runtime_tensor get_input_tensor(size_t idx);
    void set_input_tensor(size_t idx, runtime_tensor &input_tensor);
    void run();
    void get_output();
    runtime_tensor get_output_tensor(int idx);

protected:
    string model_name_;
    int debug_mode_;
    vector<float *> p_outputs_;
    vector<vector<int>> input_shapes_;
    vector<vector<int>> output_shapes_;

private:
    void set_input_init();
    void set_output_init();
    interpreter kmodel_interp_;
    vector<unsigned char> kmodel_vec_;
};
The most commonly reused members in application development are:
- `input_shapes_`
- `output_shapes_`
- `p_outputs_`
For example, the pointer to the first output tensor can be accessed as:
float *output0 = p_outputs_[0];
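A raw `float *` carries no length, so postprocess code usually pairs it with the matching entry of `output_shapes_`. The sketch below shows one way to do that. `element_count` and `copy_output` are hypothetical helpers, not part of `AIBase`, and the sketch assumes every dimension is static and positive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of float elements in one output tensor, computed as the product
// of its shape dimensions.
inline std::size_t element_count(const std::vector<int> &shape)
{
    std::size_t count = 1;
    for (int dim : shape)
    {
        assert(dim > 0); // this sketch assumes static, positive dimensions
        count *= static_cast<std::size_t>(dim);
    }
    return count;
}

// Copies one raw output buffer into a vector so it can be inspected
// with bounds known up front.
inline std::vector<float> copy_output(const float *data, const std::vector<int> &shape)
{
    return std::vector<float>(data, data + element_count(shape));
}
```

Inside a class derived from `AIBase`, a call would look like `copy_output(p_outputs_[0], output_shapes_[0])`.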
Task Header and Source Files#
face_detection.h and face_detection.cc are the core files you implement yourself during secondary development. In a real project, you can rename them to match your task, for example person_det.h, helmet_detect.cc, or gesture_recog.h. The task class should inherit from AIBase:
class YourTask : public AIBase
That means you reuse the common inference wrapper and complete the task-specific logic yourself.
The task class is responsible for:
| Module | Need to implement | Description |
|---|---|---|
| Preprocess | Yes | Convert the input image to the model format |
| Inference | Reuse | The common run path is already wrapped |
| Postprocess | Yes | Convert raw model outputs into usable results |
| Draw | Yes | Draw the results on the image |
A simplified task skeleton is:
typedef struct ExampleResults
{
    // Define the task result structure here.
} ExampleResults;

class MyApp : public AIBase
{
public:
    /**
     * @brief Constructor for video inference.
     * Loads the kmodel, initializes model inputs and outputs, and configures
     * application-specific parameters such as thresholds and preprocess behavior.
     */
    MyApp(char *kmodel_file, other_params, FrameCHWSize image_size, int debug_mode);
    ~MyApp();
    void pre_process(runtime_tensor &input_tensor);
    void inference();
    void post_process(FrameCHWSize image_size, vector<ExampleResults> &results);
    void draw_result(cv::Mat &draw_frame, vector<ExampleResults> &results);

    std::unique_ptr<ai2d_builder> ai2d_builder_;
    runtime_tensor ai2d_out_tensor_;
    FrameCHWSize image_size_;
    FrameCHWSize input_size_;
    // Add task-specific members here when needed.
};
You can follow the implementation under src/rtsmart/examples/ai/uvc_face_detection/src/face_detection.cc.
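To show how the reused inference path and a task-specific postprocess fit together, here is a minimal sketch that compiles without nncase. `FakeBase` is a hypothetical stand-in that only mimics the `run()`, `get_output()`, `p_outputs_`, and `output_shapes_` members of `AIBase`. `ScoreTask`, `ScoreResult`, and the single-threshold filter are assumptions for illustration and do not reproduce the face-detection postprocess.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for AIBase. The real class runs the kmodel.
class FakeBase
{
public:
    virtual ~FakeBase() = default;
    void run() { ran_ = true; }
    void get_output()
    {
        assert(ran_); // outputs are only valid after run()
        p_outputs_ = {scores_.data()};
        output_shapes_ = {{1, static_cast<int>(scores_.size())}};
    }
    // Test hook: pretend the model produced these scores.
    void set_fake_scores(std::vector<float> scores) { scores_ = std::move(scores); }

protected:
    std::vector<float *> p_outputs_;
    std::vector<std::vector<int>> output_shapes_;

private:
    std::vector<float> scores_;
    bool ran_ = false;
};

struct ScoreResult
{
    int index;   // position of the candidate in the raw output
    float score; // confidence reported by the model
};

class ScoreTask : public FakeBase
{
public:
    explicit ScoreTask(float threshold) : threshold_(threshold) {}

    // Inference reuses the base-class run path unchanged.
    void inference()
    {
        run();
        get_output();
    }

    // Postprocess: keep only candidates whose score reaches the threshold.
    void post_process(std::vector<ScoreResult> &results)
    {
        results.clear();
        const float *scores = p_outputs_[0];
        const int count = output_shapes_[0][1];
        for (int i = 0; i < count; ++i)
        {
            if (scores[i] >= threshold_)
                results.push_back({i, scores[i]});
        }
    }

private:
    float threshold_;
};
```

The derived class never touches the interpreter directly. It calls the inherited run path and then reads the raw outputs through the protected members, which is the division of work described in the table above.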
main.cc Changes#
Flow Overview#
main.cc implements the complete task flow:
1. Get one frame from the UVC camera.
2. Create the input tensor.
3. Call preprocess.
4. Call inference.
5. Call postprocess.
6. Draw the result.
7. Release the frame.
The video loop can be summarized as:
FrameCHWSize image_size = {AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
dims_t in_shape {1, AI_FRAME_CHANNEL, AI_FRAME_HEIGHT, AI_FRAME_WIDTH};
runtime_tensor input_tensor = host_runtime_tensor::create(typecode_t::dt_uint8, in_shape, hrt::pool_shared).expect("cannot create input tensor");
auto input_buf = input_tensor.impl()->to_host().unwrap()->buffer().as_host().unwrap().map(map_access_::map_write).unwrap().buffer();

UVC_PipeLine pl(debug_mode);
pl.Create();
DumpRes dump_res;
MyApp my_app(argv[1], atof(argv[2]), atof(argv[3]), image_size, atoi(argv[5]));
vector<ExampleResults> results;
std::vector<uint8_t> chw_vec;
std::vector<cv::Mat> rgbChannels(3);
cv::Mat ori_img;
int ret = 0;

while (!isp_stop)
{
    ScopedTiming st("total time", 1);
    ret = pl.GetFrame(dump_res);
    if (ret)
    {
        printf("GetFrame fail\n");
        continue;
    }
    {
        ScopedTiming st("create tensor", debug_mode);
        // Wrap the dumped RGB888 HWC frame in a cv::Mat without copying it.
        void *vaddr = reinterpret_cast<void *>(dump_res.virt_addr);
        ori_img = cv::Mat(image_size.height, image_size.width, CV_8UC3, vaddr);
        // Split into three planes and concatenate them to get CHW layout.
        chw_vec.clear();
        rgbChannels.clear();
        cv::split(ori_img, rgbChannels);
        for (auto i = 0; i < 3; i++)
        {
            std::vector<uint8_t> data = std::vector<uint8_t>(rgbChannels[i].reshape(1, 1));
            chw_vec.insert(chw_vec.end(), data.begin(), data.end());
        }
        // Copy the CHW data into the input tensor and write it back from the cache.
        memcpy(reinterpret_cast<char *>(input_buf.data()), chw_vec.data(), chw_vec.size());
        hrt::sync(input_tensor, sync_op_t::sync_write_back, true).expect("write back input failed");
    }
    results.clear();
    my_app.pre_process(input_tensor);
    my_app.inference();
    my_app.post_process(image_size, results);
    // Results are drawn on ori_img, which shares memory with the dumped frame.
    my_app.draw_result(ori_img, results);
    pl.ReleaseFrame(dump_res);
}
pl.Destroy();
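The create-tensor block in the loop converts the HWC frame into CHW order with `cv::split`. The same rearrangement can be written without OpenCV, which may help when checking the layout by hand. `hwc_to_chw` is a hypothetical helper, not a function from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rearranges an interleaved HWC image (R,G,B,R,G,B,...) into planar CHW
// order (all R values, then all G values, then all B values).
inline std::vector<std::uint8_t> hwc_to_chw(const std::vector<std::uint8_t> &hwc,
                                            std::size_t height,
                                            std::size_t width,
                                            std::size_t channels)
{
    assert(hwc.size() == height * width * channels);
    std::vector<std::uint8_t> chw(hwc.size());
    for (std::size_t y = 0; y < height; ++y)
    {
        for (std::size_t x = 0; x < width; ++x)
        {
            for (std::size_t c = 0; c < channels; ++c)
            {
                chw[c * height * width + y * width + x] =
                    hwc[(y * width + x) * channels + c];
            }
        }
    }
    return chw;
}
```

For a 1 x 2 image with pixels (1, 2, 3) and (4, 5, 6), the HWC buffer `{1, 2, 3, 4, 5, 6}` becomes the CHW buffer `{1, 4, 2, 5, 3, 6}`.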
Compared with the MIPI + OSD flow, the UVC sample draws the result directly on the original image instead of drawing on a transparent OSD frame. This is one of the main architectural differences between the UVC path and the MIPI-camera path.
The video loop usually runs in a dedicated thread. When the user inputs q, set isp_stop to true so the thread can exit.
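The stop mechanism can be sketched with a worker thread and a shared flag. This example assumes `isp_stop` is a `std::atomic<bool>`; the guide does not show how the sample declares it. `video_loop` and `run_and_stop` are hypothetical stand-ins, and the stop request is issued programmatically where a real program would react to the user typing q.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Shared stop flag. std::atomic makes the write from the controlling thread
// visible to the worker without a data race.
std::atomic<bool> isp_stop{false};

// Stand-in for the video loop: counts iterations until the flag is set.
inline void video_loop(std::atomic<int> &frames)
{
    while (!isp_stop.load())
    {
        ++frames; // a real loop would capture, infer, draw, and release here
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// Starts the worker, waits for at least one iteration, then requests a stop
// and joins. Returns the number of iterations the worker completed.
inline int run_and_stop()
{
    isp_stop = false;
    std::atomic<int> frames{0};
    std::thread worker(video_loop, std::ref(frames));
    while (frames.load() == 0)
        std::this_thread::yield();
    isp_stop = true; // in the real program this follows the user typing q
    assert(worker.joinable());
    worker.join();
    return frames.load();
}
```

Joining the thread before cleanup matters here, because the pipeline should only be destroyed after the loop has released its last frame.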
CMakeLists.txt and build_app.sh#
At the source root:
add_subdirectory(src)
For the task subdirectory:
set(src main.cc face_detection.cc anchors_320.cc anchors_640.cc ai_base.cc ai_utils.cc uvc_pipeline.cc)
set(bin uvc_face_detection.elf)
The build script should collect the generated elf and utility files into k230_bin, for example:
collect_outputs() {
    local elf_file="${BUILD_DIR}/bin/uvc_face_detection.elf"
    if [ -f "${elf_file}" ]; then
        echo "[INFO] Collecting ELF and utility files to ${K230_BIN_DIR}..."
        cp -u "${elf_file}" "${K230_BIN_DIR}/"
        cp -u utils/* "${K230_BIN_DIR}/" 2>/dev/null || true
    else
        echo "[WARN] ELF file not found: ${elf_file}"
    fi
}
Build#
Select Board and Build Firmware#
From the RTOS SDK root:
make list-def
make ***_defconfig
make -j
After the firmware build finishes, the image is generated in the output directory.
Build Method 1#
After you finish the code changes, go to the directory containing build_app.sh and run:
./build_app.sh
The build intermediates are generated in build, and the deployment package is collected in k230_bin.
Build Method 2#
From the RTOS SDK root, run make menuconfig and enable:
RT-Smart UserSpace Examples Configuration
-> Enable build ai examples
-> Enable Build UVC+AI Programs
Then run:
make -j
With this method, the files are built directly into:
/sdcard/app/examples/ai/uvc_face_detection
You can also enter the example directory and run:
make -j
This path also supports incremental build and places the collected outputs in k230_bin.
Board Deployment#
Flash the firmware first. See:
After boot, a virtual disk named CanMV is visible. Copy the generated files from k230_bin to:
CanMV/sdcard
This typically includes:
- the generated `elf`
- the required `kmodel`
- any extra files used by the sample
Then connect to the board through a serial console and run:
uvc_face_detect_isp.sh
Make sure the argument order and file paths match your code and deployment layout.
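One way to catch an argument-order mistake early is to validate the command line before constructing the task. The sketch below follows only what the main.cc excerpt shows: `argv[1]` is passed as a path, `argv[2]` and `argv[3]` go through `atof`, and `argv[5]` goes through `atoi`. The struct, its field names, and the meaning of each value are assumptions, and `argv[4]` is kept verbatim because the excerpt does not show how it is used.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical argument bundle. Only the positions and conversions come from
// the main.cc excerpt; the field names are assumptions.
struct AppArgs
{
    std::string kmodel_path; // argv[1]
    double param_a = 0.0;    // argv[2], parsed with atof in the excerpt
    double param_b = 0.0;    // argv[3], parsed with atof in the excerpt
    std::string arg4;        // argv[4], not shown in the excerpt
    int debug_mode = 0;      // argv[5], parsed with atoi in the excerpt
};

// Fills out and returns true when at least five arguments follow the
// program name. Returns false and leaves out untouched otherwise.
inline bool parse_args(int argc, const char *const *argv, AppArgs &out)
{
    assert(argv != nullptr);
    if (argc < 6)
        return false;
    out.kmodel_path = argv[1];
    out.param_a = std::atof(argv[2]);
    out.param_b = std::atof(argv[3]);
    out.arg4 = argv[4];
    out.debug_mode = std::atoi(argv[5]);
    return true;
}
```

Checking `argc` first avoids reading past the end of `argv` when the launch script passes fewer values than the program expects.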
The deployment effect is shown below:
