5.4 YOLO Module API Manual

1. Overview

This manual guides developers in using the YOLO module to deploy models that were trained with YOLOv5, YOLOv8, or YOLO11 and then converted. The module supports classification, detection, and segmentation tasks; YOLOv8 and YOLO11 additionally accept the 'obb' task type. It helps users move quickly from the YOLO source code to running trained models on the K230. For YOLO usage examples, refer to the documentation: YOLO Battle.

2. API Introduction

2.1 YOLOv5 Class

2.1.1 Constructor

Description

Constructor of the YOLOv5 wrapper class. It initializes and returns a YOLOv5 instance.

Syntax

from libs.YOLO import YOLOv5

yolo=YOLOv5(task_type="detect",mode="image",kmodel_path="yolov5_det.kmodel",labels=["apple","banana","orange"],rgb888p_size=[1280,720],model_input_size=[320,320],conf_thresh=0.5,nms_thresh=0.25,max_boxes_num=50,debug_mode=0)

Parameters

| Parameter Name | Description | Explanation | Type |
|---|---|---|---|
| task_type | Task type | Supports three types of tasks, with options 'classify'/'detect'/'segment'. | str |
| mode | Inference mode | Supports two inference modes, with options 'image'/'video'. 'image' indicates image inference, and 'video' indicates real-time video stream inference from the camera. | str |
| kmodel_path | kmodel path | The path of the kmodel copied to the development board. | str |
| labels | Class label list | The label names of different classes. | list[str] |
| rgb888p_size | Inference frame resolution | The resolution of the current inference frame, such as [1920,1080], [1280,720], [640,640]. | list[int] |
| model_input_size | Model input resolution | The input resolution during YOLOv5 model training, such as [224,224], [320,320], [640,640]. | list[int] |
| display_size | Display resolution | Set when the inference mode is 'video', supporting hdmi ([1920,1080]) and lcd ([800,480]). | list[int] |
| conf_thresh | Confidence threshold | The class confidence threshold for classification tasks and the target confidence threshold for detection and segmentation tasks, such as 0.5. | float [0~1] |
| nms_thresh | NMS threshold | The non-maximum suppression threshold, required for detection and segmentation tasks. | float [0~1] |
| mask_thresh | Mask threshold | The binary threshold for segmenting objects in the detection box in segmentation tasks. | float [0~1] |
| max_boxes_num | Maximum number of detection boxes | The maximum number of detection boxes allowed to be returned in one frame of image. | int |
| debug_mode | Debug mode | Whether the timing function is enabled, with options 0/1. 0 means no timing, and 1 means timing. | int [0/1] |

Return Value

| Return Value | Description |
|---|---|
| YOLOv5 | YOLOv5 instance |
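Because the constructor takes many loosely typed arguments, it can help to sanity-check them before loading a model on the board. The following is a minimal sketch in plain Python, not part of the YOLO module: the function name `check_yolo_args` and its default values are hypothetical, and the rules simply restate the constraints listed in the parameter table.

```python
# Hypothetical pre-flight check; restates the constraints from the parameter table.
VALID_TASKS = ("classify", "detect", "segment")  # YOLOv5 options listed in the table
VALID_MODES = ("image", "video")


def check_yolo_args(task_type, mode, rgb888p_size, model_input_size,
                    conf_thresh=0.5, nms_thresh=0.45, display_size=None):
    """Return a list of problems found; an empty list means the arguments look sane."""
    problems = []
    if task_type not in VALID_TASKS:
        problems.append("task_type must be one of %s" % (VALID_TASKS,))
    if mode not in VALID_MODES:
        problems.append("mode must be one of %s" % (VALID_MODES,))
    for name, size in (("rgb888p_size", rgb888p_size),
                       ("model_input_size", model_input_size)):
        if len(size) != 2 or not all(isinstance(v, int) and v > 0 for v in size):
            problems.append("%s must be [width, height] with positive ints" % name)
    for name, value in (("conf_thresh", conf_thresh), ("nms_thresh", nms_thresh)):
        if not 0.0 <= value <= 1.0:
            problems.append("%s must lie in [0, 1]" % name)
    if mode == "video" and display_size is None:
        problems.append("display_size should be set when mode is 'video'")
    return problems


print(check_yolo_args("detect", "image", [1280, 720], [320, 320]))
```

An empty list means the arguments are consistent with the table; any returned strings can be printed before the constructor is called.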

2.1.2 config_preprocess

Description

This is the YOLOv5 pre-processing configuration function.

Syntax

yolo.config_preprocess()

Parameters

| Parameter Name | Description | Input/Output | Explanation |
|---|---|---|---|
| None | | | |

Return Value

| Return Value | Description |
|---|---|
| None | |

2.1.3 run

Description

Infer one frame of image and return the inference result for use in the draw_result method. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices.

Syntax

res=yolo.run(img)

Parameters

| Parameter Name | Description | Input/Output | Explanation |
|---|---|---|---|
| img | The image to be inferred in the format of ulab.numpy.ndarray, or one frame of image obtained from the video stream through get_frame. | Input | |

Return Value

| Return Value | Description |
|---|---|
| res | The post-processing result of the model. The return values vary for different tasks. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices. |

2.1.4 draw_result

Description

Draw the YOLOv5 inference result on the screen or image.

Syntax

yolo.draw_result(res,img_ori)

Parameters

| Parameter Name | Description | Input/Output | Explanation |
|---|---|---|---|
| res | The inference result of YOLOv5 | Input | |
| img_ori | The Image instance to be drawn on | Input | From read_image or pl.osd_img |

Return Value

| Return Value | Description |
|---|---|
| None | |
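In video mode, inference runs on a frame of size rgb888p_size while results are drawn on pl.osd_img, which has the display resolution. The two sizes usually differ, so box coordinates have to be rescaled somewhere along the way; draw_result presumably does this itself. The sketch below only illustrates the arithmetic. The helper name `scale_box` and the (x, y, w, h) box layout are assumptions, not part of the module.

```python
def scale_box(box, src_size, dst_size):
    """Map an (x, y, w, h) box from src_size [width, height] to dst_size [width, height].

    Integer arithmetic is used so the result does not depend on float rounding.
    """
    x, y, w, h = box
    return (x * dst_size[0] // src_size[0],
            y * dst_size[1] // src_size[1],
            w * dst_size[0] // src_size[0],
            h * dst_size[1] // src_size[1])


rgb888p_size = [640, 360]   # inference frame resolution
display_size = [800, 480]   # LCD resolution from the parameter table
print(scale_box((64, 36, 128, 72), rgb888p_size, display_size))  # (80, 48, 160, 96)
```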

2.1.5 Example Programs

The following is an example program for the YOLOv5 detection task:

from libs.YOLO import YOLOv5
from libs.Utils import *
import os, sys, gc
import ulab.numpy as np
import image

if __name__ == "__main__":
    # The following are examples only. Please modify them according to your own test image, model path, label names, and model input size.
    img_path = "/sdcard/examples/utils/test_fruit.jpg"
    kmodel_path = "/sdcard/examples/kmodel/fruit_det_yolov5n_320.kmodel"
    labels = ["apple", "banana", "orange"]
    model_input_size = [320, 320]

    confidence_threshold = 0.5
    nms_threshold = 0.45
    img, img_ori = read_image(img_path)
    rgb888p_size = [img.shape[2], img.shape[1]]
    # Initialize YOLOv5 instance
    yolo = YOLOv5(task_type="detect", mode="image", kmodel_path=kmodel_path, labels=labels, rgb888p_size=rgb888p_size, model_input_size=model_input_size, conf_thresh=confidence_threshold, nms_thresh=nms_threshold, max_boxes_num=50, debug_mode=0)
    yolo.config_preprocess()
    res = yolo.run(img)
    yolo.draw_result(res, img_ori)
    yolo.deinit()
    gc.collect()

The above code shows how to use YOLOv5 for image inference.
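The line `rgb888p_size = [img.shape[2], img.shape[1]]` in the listing above suggests that read_image returns a channel-first array, so that index 1 is the height and index 2 is the width. This layout is inferred from the indexing, not stated in the manual. A small desktop Python sketch of the same conversion, using a hypothetical helper name:

```python
def frame_size_from_chw(shape):
    """Return [width, height] for an array shape given as (channels, height, width)."""
    channels, height, width = shape
    return [width, height]


# A 1280x720 RGB image stored channel-first has shape (3, 720, 1280).
print(frame_size_from_chw((3, 720, 1280)))  # [1280, 720]
```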

from libs.PipeLine import PipeLine
from libs.YOLO import YOLOv5
from libs.Utils import *
import os, sys, gc
import ulab.numpy as np
import image

if __name__ == "__main__":
    # The following are examples only. Please modify them according to your own model path, label names, and model input size.
    kmodel_path = "/sdcard/examples/kmodel/fruit_det_yolov5n_320.kmodel"
    labels = ["apple", "banana", "orange"]
    model_input_size = [320, 320]

    # Add display mode, default is hdmi, options are hdmi/lcd/lt9611/st7701/hx8399.
    # hdmi defaults to lt9611 with resolution 1920*1080; lcd defaults to st7701 with resolution 800*480.
    display_mode = "lcd"
    rgb888p_size = [640, 360]
    confidence_threshold = 0.8
    nms_threshold = 0.45
    pl = PipeLine(rgb888p_size=rgb888p_size, display_mode=display_mode)
    pl.create()
    display_size = pl.get_display_size()
    # Initialize YOLOv5 instance
    yolo = YOLOv5(task_type="detect", mode="video", kmodel_path=kmodel_path, labels=labels, rgb888p_size=rgb888p_size, model_input_size=model_input_size, display_size=display_size, conf_thresh=confidence_threshold, nms_thresh=nms_threshold, max_boxes_num=50, debug_mode=0)
    yolo.config_preprocess()
    while True:
        with ScopedTiming("total", 1):
            img = pl.get_frame()
            res = yolo.run(img)
            yolo.draw_result(res, pl.osd_img)
            pl.show_image()
            gc.collect()
    yolo.deinit()
    pl.destroy()

The above code shows how to use YOLOv5 for video inference.
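As written, the `while True:` loop in the video listing has no exit, so the `yolo.deinit()` and `pl.destroy()` calls after it are never reached. One common way to make sure cleanup runs when the script is stopped is a try/finally block. The sketch below shows the pattern in plain Python with stand-in objects: `StubResource` and `run_loop` are hypothetical names, and on the board `release()` would correspond to `yolo.deinit()` and `pl.destroy()`.

```python
class StubResource:
    """Hypothetical stand-in for a PipeLine or YOLO object; records whether it was released."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


def run_loop(process_frame, resources, max_frames=None):
    """Call process_frame repeatedly and always release the resources afterwards."""
    frames = 0
    try:
        while max_frames is None or frames < max_frames:
            process_frame(frames)
            frames += 1
    except KeyboardInterrupt:
        pass  # treat a manual stop as a normal exit
    finally:
        for r in resources:
            r.release()
    return frames


resources = [StubResource(), StubResource()]
print(run_loop(lambda i: None, resources, max_frames=5))  # 5
```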

2.2 YOLOv8 Class

2.2.1 Constructor

Description

Constructor of the YOLOv8 wrapper class. It initializes and returns a YOLOv8 instance.

Syntax

from libs.YOLO import YOLOv8

yolo=YOLOv8(task_type="detect",mode="image",kmodel_path="yolov8_det.kmodel",labels=["apple","banana","orange"],rgb888p_size=[1280,720],model_input_size=[320,320],conf_thresh=0.5,nms_thresh=0.25,max_boxes_num=50,debug_mode=0)

Parameters

| Parameter Name | Description | Explanation | Type |
|---|---|---|---|
| task_type | Task type | Supports four types of tasks, with options 'classify'/'detect'/'segment'/'obb'. | str |
| mode | Inference mode | Supports two inference modes, with options 'image'/'video'. 'image' indicates image inference, and 'video' indicates real-time video stream inference from the camera. | str |
| kmodel_path | kmodel path | The path of the kmodel copied to the development board. | str |
| labels | Class label list | The label names of different classes. | list[str] |
| rgb888p_size | Inference frame resolution | The resolution of the current inference frame, such as [1920,1080], [1280,720], [640,640]. | list[int] |
| model_input_size | Model input resolution | The input resolution during YOLOv8 model training, such as [224,224], [320,320], [640,640]. | list[int] |
| display_size | Display resolution | Set when the inference mode is 'video', supporting hdmi ([1920,1080]) and lcd ([800,480]). | list[int] |
| conf_thresh | Confidence threshold | The class confidence threshold for classification tasks and the target confidence threshold for detection and segmentation tasks, such as 0.5. | float [0~1] |
| nms_thresh | NMS threshold | The non-maximum suppression threshold, required for detection and segmentation tasks. | float [0~1] |
| mask_thresh | Mask threshold | The binary threshold for segmenting objects in the detection box in segmentation tasks. | float [0~1] |
| max_boxes_num | Maximum number of detection boxes | The maximum number of detection boxes allowed to be returned in one frame of image. | int |
| debug_mode | Debug mode | Whether the timing function is enabled, with options 0/1. 0 means no timing, and 1 means timing. | int [0/1] |

Return Value

| Return Value | Description |
|---|---|
| YOLOv8 | YOLOv8 instance |
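To make the roles of conf_thresh, nms_thresh, and max_boxes_num more concrete, here is a minimal sketch of generic greedy non-maximum suppression in plain Python. It is not the module's post-processing code: the function names, the (x1, y1, x2, y2) box layout, and the class-agnostic behaviour are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(boxes, scores, conf_thresh=0.5, nms_thresh=0.45, max_boxes_num=50):
    """Return indices of kept boxes, highest score first.

    1. Drop boxes whose score is below conf_thresh.
    2. Visit the rest in descending score order and keep a box only if its
       IoU with every already kept box is at most nms_thresh.
    3. Stop once max_boxes_num boxes have been kept.
    """
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if len(kept) >= max_boxes_num:
            break
        if all(iou(boxes[i], boxes[k]) <= nms_thresh for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_boxes(boxes, scores))  # [0, 2]
```

In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed, and the last box falls below the confidence threshold. Raising nms_thresh keeps more overlapping boxes; raising conf_thresh discards more low-score candidates.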

2.2.2 config_preprocess

Description

This is the YOLOv8 pre-processing configuration function.

Syntax

yolo.config_preprocess()

Parameters

| Parameter Name | Description | Input/Output | Explanation |
|---|---|---|---|
| None | | | |

Return Value

| Return Value | Description |
|---|---|
| None | |

2.2.3 run

Description

Infer one frame of image and return the inference result for use in the draw_result method. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices.

Syntax

res=yolo.run(img)

Parameters

| Parameter Name | Description | Input/Output | Explanation |
|---|---|---|---|
| img | The image to be inferred in the format of ulab.numpy.ndarray, or one frame of image obtained from the video stream through get_frame. | Input | |

Return Value

| Return Value | Description |
|---|---|
| res | The post-processing result of the model. The return values vary for different tasks. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices. |

2.2.4 draw_result

Description

Draw the YOLOv8 inference result on the screen or image.

Syntax

yolo.draw_result(res,img_ori)

Parameters

| Parameter Name | Description | Input/Output | Explanation |
|---|---|---|---|
| res | The inference result of YOLOv8 | Input | |
| img_ori | The Image instance to be drawn on | Input | From read_image or pl.osd_img |

Return Value

| Return Value | Description |
|---|---|
| None | |

2.2.5 Example Programs

The following is an example program for the YOLOv8 detection task:

from libs.YOLO import YOLOv8
from libs.Utils import *
import os, sys, gc
import ulab.numpy as np
import image

if __name__ == "__main__":
    # The following are examples only. Please modify them according to your own test image, model path, label names, and model input size.
    img_path = "/sdcard/examples/utils/test_fruit.jpg"
    kmodel_path = "/sdcard/examples/kmodel/fruit_det_yolov8n_320.kmodel"
    labels = ["apple", "banana", "orange"]
    model_input_size = [320, 320]

    confidence_threshold = 0.5
    nms_threshold = 0.45
    img, img_ori = read_image(img_path)
    rgb888p_size = [img.shape[2], img.shape[1]]
    # Initialize YOLOv8 instance
    yolo = YOLOv8(task_type="detect", mode="image", kmodel_path=kmodel_path, labels=labels, rgb888p_size=rgb888p_size, model_input_size=model_input_size, conf_thresh=confidence_threshold, nms_thresh=nms_threshold, max_boxes_num=50, debug_mode=0)
    yolo.config_preprocess()
    res = yolo.run(img)
    yolo.draw_result(res, img_ori)
    yolo.deinit()
    gc.collect()

The above code shows how to use YOLOv8 for image inference.

from libs.PipeLine import PipeLine
from libs.YOLO import YOLOv8
from libs.Utils import *
import os, sys, gc
import ulab.numpy as np
import image

if __name__ == "__main__":
    # The following are examples only. Please modify them according to your own model path, label names, and model input size.
    kmodel_path = "/sdcard/examples/kmodel/fruit_det_yolov8n_320.kmodel"
    labels = ["apple", "banana", "orange"]
    model_input_size = [320, 320]

    # Add display mode, default is hdmi, options are hdmi/lcd/lt9611/st7701/hx8399.
    # hdmi defaults to lt9611 with resolution 1920*1080; lcd defaults to st7701 with resolution 800*480.
    display_mode = "lcd"
    rgb888p_size = [640, 360]
    confidence_threshold = 0.5
    nms_threshold = 0.45
    # Initialize PipeLine
    pl = PipeLine(rgb888p_size=rgb888p_size, display_mode=display_mode)
    pl.create()
    display_size = pl.get_display_size()
    # Initialize YOLOv8 instance
    yolo = YOLOv8(task_type="detect", mode="video", kmodel_path=kmodel_path, labels=labels, rgb888p_size=rgb888p_size, model_input_size=model_input_size, display_size=display_size, conf_thresh=confidence_threshold, nms_thresh=nms_threshold, max_boxes_num=50, debug_mode=0)
    yolo.config_preprocess()
    while True:
        with ScopedTiming("total", 1):
            # Process each frame
            img = pl.get_frame()
            res = yolo.run(img)
            yolo.draw_result(res, pl.osd_img)
            pl.show_image()
            gc.collect()
    yolo.deinit()
    pl.destroy()

The above code shows how to use YOLOv8 for video inference.
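The video listings wrap each frame in `ScopedTiming("total", 1)`, and the constructor's debug_mode flag switches timing on or off in a similar way. The sketch below shows how such a scoped timer can be built as a context manager. It is a hypothetical stand-in written for desktop Python, not the libs.Utils implementation: the class name `Timer` is invented, the second argument is assumed to be an enable flag, and `time.perf_counter` may not be available in MicroPython, which typically offers `time.ticks_ms` instead.

```python
import time


class Timer:
    """Hypothetical stand-in for ScopedTiming: times a `with` block when enabled is truthy."""

    def __init__(self, label, enabled=1):
        self.label = label
        self.enabled = enabled
        self.elapsed_ms = None

    def __enter__(self):
        if self.enabled:
            self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.enabled:
            self.elapsed_ms = (time.perf_counter() - self._start) * 1000
            print("%s took %.2f ms" % (self.label, self.elapsed_ms))
        return False  # never swallow exceptions raised inside the block


with Timer("total", 1):
    sum(range(100000))  # placeholder for get_frame / run / draw_result
```

Passing 0 as the second argument skips both the measurement and the print, which mirrors the "0 means no timing" behaviour described for debug_mode.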

2.3 YOLO11 Class

2.3.1 Constructor

Description

Constructor of the YOLO11 wrapper class. It initializes and returns a YOLO11 instance.

Syntax

from libs.YOLO import YOLO11

yolo=YOLO11(task_type="segment",mode="image",kmodel_path="yolo11_det.kmodel",labels=["apple","banana","orange"],rgb888p_size=[1280,720],model_input_size=[320,320],conf_thresh=0.5,nms_thresh=0.25,mask_thresh=0.5,max_boxes_num=50,debug_mode=0)

Parameters

| Parameter Name | Description | Explanation | Type |
|---|---|---|---|
| task_type | Task type | Supports four types of tasks, with available options: 'classify', 'detect', 'segment', 'obb'. | str |
| mode | Inference mode | Supports two inference modes, with available options: 'image' and 'video'. 'image' means inferring images, and 'video' means inferring real-time video streams captured by the camera. | str |
| kmodel_path | kmodel path | The path of the kmodel copied to the development board. | str |
| labels | Class label list | The label names of different classes. | list[str] |
| rgb888p_size | Inference frame resolution | The resolution of the current inference frame, such as [1920, 1080], [1280, 720], [640, 640]. | list[int] |
| model_input_size | Model input resolution | The input resolution during the training of the YOLO11 model, such as [224, 224], [320, 320], [640, 640]. | list[int] |
| display_size | Display resolution | Set when the inference mode is 'video', supporting HDMI ([1920, 1080]) and LCD ([800, 480]). | list[int] |
| conf_thresh | Confidence threshold | The class confidence threshold for classification tasks and the target confidence threshold for detection and segmentation tasks, e.g., 0.5. | float [0~1] |
| nms_thresh | NMS threshold | The non-maximum suppression threshold, required for detection and segmentation tasks. | float [0~1] |
| mask_thresh | Mask threshold | The binary threshold for segmenting objects within the detection box in segmentation tasks. | float [0~1] |
| max_boxes_num | Maximum number of detection boxes | The maximum number of detection boxes allowed to be returned in one frame of an image. | int |
| debug_mode | Debug mode | Whether the timing function is enabled. Options are 0 or 1, where 0 means no timing and 1 means timing. | int [0/1] |

Return Value

| Return Value | Description |
|---|---|
| YOLO11 | YOLO11 instance |
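The mask_thresh parameter is described as a binary threshold for segmentation masks. As a rough illustration of what such a threshold does, the sketch below turns a small grid of per-pixel mask probabilities into 0/1 values. The function name `binarize_mask` is hypothetical, and whether the module compares with `>` or `>=` is not stated in this manual, so the strict comparison here is an assumption.

```python
def binarize_mask(prob_mask, mask_thresh=0.5):
    """Turn a 2-D grid of mask probabilities into 0/1 values using mask_thresh."""
    return [[1 if p > mask_thresh else 0 for p in row] for row in prob_mask]


probs = [[0.1, 0.6],
         [0.5, 0.9]]
print(binarize_mask(probs, 0.5))  # [[0, 1], [0, 1]]
```

A lower threshold produces larger, more inclusive masks; a higher threshold keeps only the pixels the model is most confident about.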

2.3.2 config_preprocess

Description

This is the YOLO11 pre-processing configuration function.

Syntax

yolo.config_preprocess()

Parameters

| Parameter Name | Description | Input/Output | Explanation |
|---|---|---|---|
| None | | | |

Return Value

| Return Value | Description |
|---|---|
| None | |

2.3.3 run

Description

Infer one frame of an image and return the inference result for use in the draw_result method. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices.

Syntax

res=yolo.run(img)

Parameters

| Parameter Name | Description | Input/Output | Explanation |
|---|---|---|---|
| img | The image to be inferred in the format of ulab.numpy.ndarray, or one frame of an image obtained from the video stream through get_frame. | Input | |

Return Value

| Return Value | Description |
|---|---|
| res | The post-processing result of the model. The return values vary for different tasks. For classification tasks, it returns the class index and score. For detection tasks, it returns a list of detection box positions, scores, and class indices. For segmentation tasks, it returns the mask result and a list of detection box positions, scores, and class indices. |
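For classification, the result is described as a class index and a score, and the index refers to a position in the labels list passed to the constructor. The sketch below shows that lookup in plain Python. It assumes `res` can be unpacked as a (class_index, score) pair; the manual does not specify the exact container, so it is worth printing `res` on the board to confirm before relying on this. The helper name is hypothetical.

```python
def describe_classification(res, labels):
    """Format an assumed (class_index, score) pair as 'label: score'."""
    cls_idx, score = res
    return "%s: %.2f" % (labels[cls_idx], score)


labels = ["apple", "banana", "orange"]
print(describe_classification((1, 0.87), labels))  # banana: 0.87
```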

2.3.4 draw_result

Description

Draw the YOLO11 inference result on the screen or an image.

Syntax

yolo.draw_result(res,img_ori)

Parameters

| Parameter Name | Description | Input/Output | Explanation |
|---|---|---|---|
| res | The inference result of YOLO11 | Input | |
| img_ori | The Image instance to be drawn on | Input | From read_image or pl.osd_img |

Return Value

| Return Value | Description |
|---|---|
| None | |

2.3.5 Example Programs

The following is an example program for the YOLO11 detection task:

from libs.YOLO import YOLO11
from libs.Utils import *
import os, sys, gc
import ulab.numpy as np
import image

if __name__ == "__main__":
    # The following are examples only. Please modify them according to your own test image, model path, label names, and model input size.
    img_path = "/sdcard/examples/utils/test_fruit.jpg"
    kmodel_path = "/sdcard/examples/kmodel/fruit_det_yolo11n_320.kmodel"
    labels = ["apple", "banana", "orange"]
    model_input_size = [320, 320]

    confidence_threshold = 0.5
    nms_threshold = 0.45
    img, img_ori = read_image(img_path)
    rgb888p_size = [img.shape[2], img.shape[1]]
    # Initialize YOLO11 instance
    yolo = YOLO11(task_type="detect", mode="image", kmodel_path=kmodel_path, labels=labels, rgb888p_size=rgb888p_size, model_input_size=model_input_size, conf_thresh=confidence_threshold, nms_thresh=nms_threshold, max_boxes_num=50, debug_mode=0)
    yolo.config_preprocess()
    res = yolo.run(img)
    yolo.draw_result(res, img_ori)
    yolo.deinit()
    gc.collect()

The above code shows how to use YOLO11 for image inference.

from libs.PipeLine import PipeLine
from libs.YOLO import YOLO11
from libs.Utils import *
import os, sys, gc
import ulab.numpy as np
import image

if __name__ == "__main__":
    # The following are examples only. Please modify them according to your own model path, label names, and model input size.
    kmodel_path = "/sdcard/examples/kmodel/fruit_det_yolo11n_320.kmodel"
    labels = ["apple", "banana", "orange"]
    model_input_size = [320, 320]

    # Add display mode, default is hdmi, options are hdmi/lcd/lt9611/st7701/hx8399.
    # hdmi defaults to lt9611 with resolution 1920*1080; lcd defaults to st7701 with resolution 800*480.
    display_mode = "lcd"
    rgb888p_size = [640, 360]
    confidence_threshold = 0.5
    nms_threshold = 0.45
    # Initialize PipeLine
    pl = PipeLine(rgb888p_size=rgb888p_size, display_mode=display_mode)
    pl.create()
    display_size = pl.get_display_size()
    # Initialize YOLO11 instance
    yolo = YOLO11(task_type="detect", mode="video", kmodel_path=kmodel_path, labels=labels, rgb888p_size=rgb888p_size, model_input_size=model_input_size, display_size=display_size, conf_thresh=confidence_threshold, nms_thresh=nms_threshold, max_boxes_num=50, debug_mode=0)
    yolo.config_preprocess()
    while True:
        with ScopedTiming("total", 1):
            # Process each frame
            img = pl.get_frame()
            res = yolo.run(img)
            yolo.draw_result(res, pl.osd_img)
            pl.show_image()
            gc.collect()
    yolo.deinit()
    pl.destroy()

The above code shows how to use YOLO11 for video inference.
