nncase Model Compilation API Manual

Overview

nncase is a neural network compiler designed for AI accelerators. This document describes the Python API used to convert a trained TFLite or ONNX model into kmodel, the model format that the KPU can accelerate. The compilation APIs currently support deep learning models in the TFLite and ONNX formats. The API is used to compile a kmodel on a local PC; it is not code that runs on the K230. To learn more about nncase, refer to the nncase GitHub repo.

API introduction

CompileOptions

Description

CompileOptions class, used to configure nncase compilation options. Each attribute is described as follows:

| Property name | Type | Required | Description |
| --- | --- | --- | --- |
| target | string | yes | Specifies the compilation target, such as 'cpu' or 'k230' |
| dump_ir | bool | no | Specifies whether to dump IR; the default is False |
| dump_asm | bool | no | Specifies whether to dump asm assembly files; the default is False |
| dump_dir | string | no | The dump directory used once dump_ir or the other dump switches are enabled; the default is "" |
| input_file | string | no | Path of the parameter file, used when the ONNX model exceeds 2GB; the default is "" |
| preprocess | bool | no | Whether to enable pre-processing; the default is False. The following parameters only take effect when preprocess=True |
| input_type | string | no | The input data type when pre-processing is enabled; the default is "float". When preprocess is True, it must be specified as "uint8" or "float32" |
| input_shape | list[int] | no | The shape of the input data when pre-processing is enabled; the default is []. Must be specified when preprocess is True |
| input_range | list[float] | no | The floating-point range of the input data after dequantization when pre-processing is enabled; the default is []. Must be specified when preprocess is True and input_type is uint8 |
| input_layout | string | no | The layout of the input data; the default is "" |
| swapRB | bool | no | Whether to reverse the data in the channel dimension; the default is False |
| mean | list[float] | no | The mean values used for pre-processing normalization; the default is [0,0,0] |
| std | list[float] | no | The std values used for pre-processing normalization; the default is [1,1,1] |
| letterbox_value | float | no | The fill value of the pre-processing letterbox; the default is 0 |
| output_layout | string | no | The layout of the output data; the default is "" |
| shape_bucket_enable | bool | yes | Whether to enable the ShapeBucket function; the default is False. Effective when dump_ir=True |
| shape_bucket_range_info | Dict[str, [int, int]] | yes | The range of each variable in the input shape dimension information; the minimum value must be greater than or equal to 1 |
| shape_bucket_segments_count | int | yes | The number of segments into which the range of the input variables is divided |
| shape_bucket_fix_var_map | Dict[str, int] | no | Fixes variables in the shape dimension information to specific values |

Pre-processing process description

A custom pre-processing order is currently not supported. Select the pre-processing parameters you need and configure them according to the following flow chart.

graph TD
    NewInput("NewInput (shape = input_shape, dtype = input_type)") --> a("input_layout != ' '")
    a -.Y.-> Transpose1["Transpose"] -.-> b("SwapRB == True")
    b -.Y.-> SwapRB["SwapRB"] -.-> c("input_type != float32")
    c -.Y.-> Dequantize["Dequantize"] -.-> d("input_HW != model_HW")
    d -.Y.-> LetterBox["LetterBox"] -.-> e("std not empty, mean not empty")
    e -.Y.-> Normalization["Normalization"] -.-> OldInput
    OldInput --> Model_body --> OldOutput --> f("output_layout != ' '")
    f -.Y.-> Transpose2["Transpose"] -.-> NewOutput
    a --N--> b
    b --N--> c
    c --N--> d
    d --N--> e
    e --N--> OldInput
    f --N--> NewOutput
    subgraph origin_model
        OldInput
        Model_body
        OldOutput
    end
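
The decision order in the flow chart can be restated as a small function. This is only a sketch for reasoning about which operators a given configuration would trigger: preprocess_steps and its parameters are hypothetical names, not part of nncase, and an empty string is assumed to mean "no layout configured".

```python
def preprocess_steps(input_layout, swap_rb, input_type, input_hw, model_hw, mean, std):
    """Return the operators the flow chart would insert, in order.

    Hypothetical helper for illustration; it only mirrors the chart's
    conditions and performs no actual tensor processing.
    """
    steps = []
    if input_layout != "":                   # a layout was configured
        steps.append("Transpose")
    if swap_rb:                              # SwapRB == True
        steps.append("SwapRB")
    if input_type != "float32":              # fixed-point input needs dequantization
        steps.append("Dequantize")
    if tuple(input_hw) != tuple(model_hw):   # input H/W differs from the model's H/W
        steps.append("LetterBox")
    if mean and std:                         # both lists are non-empty
        steps.append("Normalization")
    return steps


if __name__ == "__main__":
    print(preprocess_steps("NHWC", False, "uint8", (224, 320), (224, 224),
                           [0, 0, 0], [1, 1, 1]))
```

With a uint8 NHWC input whose height and width differ from the model's, the sketch reports Transpose, Dequantize, LetterBox and Normalization, in that order.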

Parameter description:

  1. input_range is the floating-point range of the input data after dequantization, used when the input data type is fixed point.

a. If the input data type is uint8 with range [0,255] and input_range is [0,255], dequantization is only a type conversion from uint8 to float32. The mean and std parameters are still specified relative to data in [0,255].

b. If the input data type is uint8 with range [0,255] and input_range is [0,1], dequantization converts the fixed-point values into floating-point values in [0,1]. The mean and std parameters must then be specified relative to data in [0,1].

graph TD
    NewInput_uint8("NewInput_uint8 [input_type:uint8]") --input_range:0,255--> dequantize_0["Dequantize"] --float range:0,255--> OldInput_float32
    NewInput_uint81("NewInput_uint8 [input_type:uint8]") --input_range:0,1--> dequantize_1["Dequantize"] --float range:0,1--> OldInput_float32

  2. input_shape is the shape of the input data, laid out according to input_layout. input_layout can be given either as a string ("NHWC", "NCHW") or as an index list, and non-4D data is supported. When input_layout is a string, it states the layout of the input data; when it is an index list, the input data is transposed according to that list, that is, input_layout is the perm parameter of Transpose.

graph TD
    subgraph B
        NewInput1("NewInput: 1,4,10") --"input_layout: 0,2,1"--> Transpose2("Transpose perm: 0,2,1") --> OldInput2("OldInput: 1,10,4")
    end
    subgraph A
        NewInput --"input_layout: NHWC"--> Transpose0("Transpose: NHWC2NCHW") --> OldInput
        NewInput("NewInput: 1,224,224,3 (NHWC)") --"input_layout: 0,3,1,2"--> Transpose1("Transpose perm: 0,3,1,2") --> OldInput("OldInput: 1,3,224,224 (NCHW)")
    end

The same goes for output_layout, as shown in the figure below.

graph TD
    subgraph B
        OldOutput1("OldOutput: 1,10,4,5,2") --"output_layout: 0,2,3,1,4"--> Transpose5("Transpose perm: 0,2,3,1,4") --> NewOutput1("NewOutput: 1,4,5,10,2")
    end
    subgraph A
        OldOutput --"output_layout: NHWC"--> Transpose3("Transpose: NCHW2NHWC") --> NewOutput("NewOutput NHWC")
        OldOutput("OldOutput: (NCHW)") --"output_layout: 0,2,3,1"--> Transpose4("Transpose perm: 0,2,3,1") --> NewOutput("NewOutput NHWC")
    end
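
Both parameter notes can be checked with a few lines of arithmetic. The sketch below assumes that dequantization maps uint8 values linearly from [0,255] onto input_range, which matches cases a and b above but is not taken from the nncase implementation; dequantize_uint8, perm_from_index_layout and apply_perm are hypothetical helper names.

```python
def dequantize_uint8(value, input_range):
    """Map a uint8 value in [0, 255] linearly onto input_range = [low, high].

    Assumed formula for illustration only.
    """
    low, high = input_range
    return low + (value / 255.0) * (high - low)


def perm_from_index_layout(layout):
    """Parse an index-form layout such as "0,3,1,2" into a perm tuple."""
    return tuple(int(axis) for axis in layout.split(","))


def apply_perm(shape, perm):
    """Return the shape produced by Transpose with the given perm."""
    return [shape[axis] for axis in perm]


if __name__ == "__main__":
    # Case a: input_range [0, 255] leaves the value unchanged (type conversion only).
    print(dequantize_uint8(255, [0, 255]))   # 255.0
    # Case b: input_range [0, 1] rescales the value into [0, 1].
    print(dequantize_uint8(255, [0, 1]))     # 1.0
    # Index-form layouts act as the perm of Transpose, as in the figures.
    print(apply_perm([1, 224, 224, 3], perm_from_index_layout("0,3,1,2")))  # [1, 3, 224, 224]
    print(apply_perm([1, 4, 10], perm_from_index_layout("0,2,1")))          # [1, 10, 4]
```

This also shows why mean and std have to follow input_range: after case b the data lies in [0,1], so normalization constants chosen for [0,255] would no longer fit.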

Dynamic shape parameter description

ShapeBucket is a solution for dynamic shapes: it optimizes the model based on the range of the input lengths and the specified number of segments. The feature is disabled by default and takes effect only when the corresponding option is turned on. Apart from specifying the corresponding fields, the process is no different from compiling a static model.

  • ONNX

Some dimensions in the model's shapes are variable names. Take the inputs of an ONNX model as an example:

tokens: int64[batch_size, tgt_seq_len]
step: float32[seq_len, batch_size]

The shape dimension information contains three variables: seq_len, tgt_seq_len, and batch_size. batch_size is a variable, but it is fixed to 3 in the actual application, so adding batch_size = 3 to fix_var_map fixes this dimension to 3 at run time. seq_len and tgt_seq_len really do change, so their actual ranges must be configured in range_info. segments_count is the number of segments: each range is divided into that many equal parts, and the compilation time grows by a corresponding multiple.

The following are examples of corresponding compilation parameters:

compile_options = nncase.CompileOptions()
compile_options.shape_bucket_enable = True
compile_options.shape_bucket_range_info = {"seq_len": [1, 100], "tgt_seq_len": [1, 100]}
compile_options.shape_bucket_segments_count = 2
compile_options.shape_bucket_fix_var_map = {"batch_size": 3}
  • TFLite

TFLite models differ from ONNX models: dimension names are not marked on the shape. Currently, only one dynamic dimension in the input is supported, and its name is uniformly configured as -1. The configuration is as follows:

compile_options = nncase.CompileOptions()
compile_options.shape_bucket_enable = True
compile_options.shape_bucket_range_info = {"-1":[1, 100]}
compile_options.shape_bucket_segments_count = 2
compile_options.shape_bucket_fix_var_map = {"batch_size" : 3}

After configuring these options, the entire compilation process is consistent with the static shape.
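
To get a feel for what range_info and segments_count mean together, the following sketch splits a range into equal parts and picks the segment a concrete length falls into. The exact boundaries nncase chooses are not documented here, so the integer split below is an assumption; split_range and pick_bucket are hypothetical names.

```python
import bisect


def split_range(low, high, segments_count):
    """Split the inclusive range [low, high] into equal parts.

    Returns the upper bound of each segment. The rounding used here is
    an assumption made for illustration.
    """
    if low < 1:
        raise ValueError("the minimum value must be greater than or equal to 1")
    if segments_count < 1:
        raise ValueError("segments_count must be at least 1")
    span = high - low
    return [low + (span * (i + 1)) // segments_count for i in range(segments_count)]


def pick_bucket(length, bounds):
    """Return the smallest segment upper bound that can hold `length`."""
    index = bisect.bisect_left(bounds, length)
    if index == len(bounds):
        raise ValueError("length exceeds the configured range")
    return bounds[index]


if __name__ == "__main__":
    bounds = split_range(1, 100, 2)
    print(bounds)                   # [50, 100]
    print(pick_bucket(30, bounds))  # 50
    print(pick_bucket(51, bounds))  # 100
```

More segments give tighter bounds for each input length, at the cost of the longer compilation time mentioned above.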

Parameter configuration example

Instantiate CompileOptions and configure the value of each attribute.

compile_options = nncase.CompileOptions()

compile_options.target = "cpu" #"k230"
compile_options.dump_ir = True  # if False, will not dump the compile-time result.
compile_options.dump_asm = True
compile_options.dump_dir = "dump_path"
compile_options.input_file = ""

# preprocess args
compile_options.preprocess = False
if compile_options.preprocess:
    compile_options.input_type = "uint8"  # "uint8" "float32"
    compile_options.input_shape = [1,224,320,3]
    compile_options.input_range = [0,1]
    compile_options.input_layout = "NHWC" # "NHWC" "NCHW"
    compile_options.swapRB = False
    compile_options.mean = [0,0,0]
    compile_options.std = [1,1,1]
    compile_options.letterbox_value = 0
    compile_options.output_layout = "NHWC" # "NHWC" "NCHW"

# Dynamic shape args
compile_options.shape_bucket_enable = False
if compile_options.shape_bucket_enable:
    compile_options.shape_bucket_range_info = {"seq_len": [1, 100], "tgt_seq_len": [1, 100]}
    compile_options.shape_bucket_segments_count = 2
    compile_options.shape_bucket_fix_var_map = {"batch_size": 3}
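
The constraints listed in the CompileOptions table can be checked before compiling. The sketch below is a hypothetical helper, not part of nncase: it works on any object exposing the same attribute names and uses types.SimpleNamespace as a stand-in, so it runs without nncase installed.

```python
from types import SimpleNamespace


def check_preprocess_options(opts):
    """Return a list of problems in the pre-processing attributes of `opts`.

    Encodes only the rules stated in the table above.
    """
    problems = []
    if not opts.preprocess:
        return problems  # the remaining attributes only take effect when preprocess=True
    if opts.input_type not in ("uint8", "float32"):
        problems.append('input_type must be "uint8" or "float32"')
    if not opts.input_shape:
        problems.append("input_shape must be specified")
    if opts.input_type == "uint8" and not opts.input_range:
        problems.append("input_range must be specified when input_type is uint8")
    return problems


if __name__ == "__main__":
    # Stand-in for nncase.CompileOptions with the same attribute names.
    opts = SimpleNamespace(preprocess=True, input_type="uint8",
                           input_shape=[1, 224, 320, 3], input_range=[])
    print(check_preprocess_options(opts))
```

Running such a check first turns a misconfiguration into a readable message instead of a failure later in the compilation.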

ImportOptions

Description

ImportOptions class, used to configure nncase import options.

Definition

class ImportOptions:
    def __init__(self) -> None:
        pass

Example

Instantiate ImportOptions. As the definition shows, the class currently has no attributes to configure.

#import_options
import_options = nncase.ImportOptions()

PTQTensorOptions

Description

PTQTensorOptions class, used to configure nncase PTQ options.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| samples_count | int | no | The number of calibration samples used for quantization |
| calibrate_method | string | no | The quantization method, one of 'NoClip' or 'Kld'; the default is 'Kld' |
| finetune_weights_method | string | no | Whether to fine-tune the weights, one of 'NoFineTuneWeights' or 'UseSquant'; the default is 'NoFineTuneWeights' |
| quant_type | string | no | The data quantization type, one of 'uint8', 'int8', or 'int16'. quant_type and w_quant_type cannot both be 'int16' |
| w_quant_type | string | no | The weight quantization type, one of 'uint8', 'int8', or 'int16'. quant_type and w_quant_type cannot both be 'int16' |
| quant_scheme | string | no | Path of the quantization parameter configuration file to import |
| quant_scheme_strict_mode | bool | no | Whether to perform quantization strictly according to quant_scheme |
| export_quant_scheme | bool | no | Whether to export the quantization parameter configuration file |
| export_weight_range_by_channel | bool | no | Whether to export the weight quantization parameters per channel (by-channel). Setting this parameter to True is recommended |

For the specific usage of mixed quantization, see MixQuantInstructions.

Example

# ptq_options
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 6
ptq_options.finetune_weights_method = "NoFineTuneWeights"
ptq_options.quant_type = "uint8"
ptq_options.w_quant_type = "uint8"
ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))

ptq_options.quant_scheme = ""
ptq_options.quant_scheme_strict_mode = False
ptq_options.export_quant_scheme = True
ptq_options.export_weight_range_by_channel = True

compiler.use_ptq(ptq_options)
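
The value restrictions in the table, in particular that quant_type and w_quant_type cannot both be 'int16', are easy to verify up front. The function below is a hypothetical pre-check written for this manual, not an nncase API.

```python
ALLOWED_QUANT_TYPES = ("uint8", "int8", "int16")
ALLOWED_CALIBRATE_METHODS = ("NoClip", "Kld")


def check_ptq_settings(quant_type, w_quant_type, calibrate_method="Kld"):
    """Raise ValueError if the settings violate the rules in the table above."""
    for name, value in (("quant_type", quant_type), ("w_quant_type", w_quant_type)):
        if value not in ALLOWED_QUANT_TYPES:
            raise ValueError(f"{name} must be one of {ALLOWED_QUANT_TYPES}, got {value!r}")
    if quant_type == "int16" and w_quant_type == "int16":
        raise ValueError("quant_type and w_quant_type cannot both be 'int16'")
    if calibrate_method not in ALLOWED_CALIBRATE_METHODS:
        raise ValueError(f"calibrate_method must be one of {ALLOWED_CALIBRATE_METHODS}")


if __name__ == "__main__":
    check_ptq_settings("uint8", "uint8")      # passes silently
    check_ptq_settings("int16", "uint8")      # a single int16 side is allowed
    try:
        check_ptq_settings("int16", "int16")  # rejected combination
    except ValueError as err:
        print(err)
```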

set_tensor_data

Description

Sets the tensor data, that is, the calibration data used during model conversion.

Definition

    def set_tensor_data(self, data: List[List[np.ndarray]]) -> None:
        reshape_data = list(map(list, zip(*data)))
        self.cali_data = [RuntimeTensor.from_numpy(
            d) for d in itertools.chain.from_iterable(reshape_data)]

Parameters

| Name | Type | Description |
| --- | --- | --- |
| data | List[List[np.ndarray]] | The calibration data that was read |

Return Value

None.

Example

# ptq_options
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 6
ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
compiler.use_ptq(ptq_options)
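
The two statements in the definition reorder the calibration data before it is stored. The sketch below replays them with plain strings in place of np.ndarray and RuntimeTensor objects so that the resulting order is visible; it assumes data is indexed as data[sample][input], which is how generate_data in the examples builds it. flatten_calibration_data is a hypothetical name.

```python
import itertools


def flatten_calibration_data(data):
    """Replay the reordering done in set_tensor_data, without the tensor conversion."""
    # Transpose from [sample][input] to [input][sample].
    reshape_data = list(map(list, zip(*data)))
    # Flatten: all samples of input 0, then all samples of input 1, and so on.
    return list(itertools.chain.from_iterable(reshape_data))


if __name__ == "__main__":
    # Three calibration samples for a model with two inputs.
    samples = [
        ["s0_in0", "s0_in1"],
        ["s1_in0", "s1_in1"],
        ["s2_in0", "s2_in1"],
    ]
    print(flatten_calibration_data(samples))
    # ['s0_in0', 's1_in0', 's2_in0', 's0_in1', 's1_in1', 's2_in1']
```

For a single-input model, each inner list holds one array and the flattened list is simply the samples in their original order.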

Compiler

Description

Compiler class, used to compile neural network models.

Definition

class Compiler:
    _target: _nncase.Target
    _session: _nncase.CompileSession
    _compiler: _nncase.Compiler
    _compile_options: _nncase.CompileOptions
    _quantize_options: _nncase.QuantizeOptions
    _module: IRModule

import_tflite

Description

Import the TFLite model.

Definition

def import_tflite(self, model_content: bytes, options: ImportOptions) -> None:
    self._compile_options.input_format = "tflite"
    self._import_module(model_content)

Parameters

| Name | Type | Description |
| --- | --- | --- |
| model_content | bytes | The model content that was read |
| options | ImportOptions | Import options |

Return Value

None.

Example

model_content = read_model_file(model)
compiler.import_tflite(model_content, import_options)

import_onnx

Description

Import the ONNX model.

Definition

def import_onnx(self, model_content: bytes, options: ImportOptions) -> None:
    self._compile_options.input_format = "onnx"
    self._import_module(model_content)

Parameters

| Name | Type | Description |
| --- | --- | --- |
| model_content | bytes | The model content that was read |
| options | ImportOptions | Import options |

Return Value

None.

Example

model_content = read_model_file(model)
compiler.import_onnx(model_content, import_options)
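
Since import_tflite and import_onnx take the same arguments, a script that handles both formats can pick the method from the file extension. import_model below is a hypothetical convenience wrapper, not part of nncase; it only calls the two import methods documented above and accepts any compiler object that provides them.

```python
import os


def import_model(compiler, model_path, import_options):
    """Read `model_path` and import it with the method matching its extension.

    Returns the detected extension. Hypothetical wrapper for illustration.
    """
    ext = os.path.splitext(model_path)[1].lower()
    if ext not in (".tflite", ".onnx"):
        raise ValueError(f"unsupported model format: {ext!r}")
    with open(model_path, "rb") as f:
        model_content = f.read()
    if ext == ".tflite":
        compiler.import_tflite(model_content, import_options)
    else:
        compiler.import_onnx(model_content, import_options)
    return ext
```

A call such as import_model(compiler, args.model, nncase.ImportOptions()) would then replace the explicit read_model_file and import_tflite or import_onnx pair in the scripts.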

use_ptq

Description

Set PTQ configuration options.

  • By default, quantization must be used for K230.

Definition

use_ptq(ptq_options)

Parameters

| Name | Type | Description |
| --- | --- | --- |
| ptq_options | PTQTensorOptions | PTQ configuration options |

Return Value

None.

Example

compiler.use_ptq(ptq_options)

compile

Description

Compile the neural network model.

Definition

compile()

Parameters

None.

Return Value

None.

Example

compiler.compile()

gencode_tobytes

Description

Generate the kmodel byte stream.

Definition

gencode_tobytes()

Parameters

None.

Return Value

bytes

Example

kmodel = compiler.gencode_tobytes()
with open(os.path.join(infer_dir, 'test.kmodel'), 'wb') as f:
    f.write(kmodel)

Example

The models and Python compilation scripts used in the following examples:

  • The original model files are located in the src/rtsmart/libs/nncase/examples/models directory

  • The Python compilation scripts are located in the src/rtsmart/libs/nncase/examples/scripts directory

Compile TFLite model

The mbv2_tflite.py script is as follows:

import os
import argparse
import numpy as np
from PIL import Image
import nncase

def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content

def generate_data(shape, batch, calib_dir):
    img_paths = [os.path.join(calib_dir, p) for p in os.listdir(calib_dir)]
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))
        data.append([img_data[np.newaxis, ...]])
    return data

def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", type=str, help='target to run')
    parser.add_argument("--model", type=str, help='model file')
    parser.add_argument("--dataset", type=str, help='calibration_dataset')
    args = parser.parse_args()

    input_shape = [1, 3, 224, 224]
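    dump_dir = 'tmp/mbv2_tflite'
    # Create the output directory up front so that writing test.kmodel cannot fail.
    os.makedirs(dump_dir, exist_ok=True)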

    # compile_options
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target
    compile_options.preprocess = True
    compile_options.swapRB = False
    compile_options.input_shape = input_shape
    compile_options.input_type = 'uint8'
    compile_options.input_range = [0, 255]
    compile_options.mean = [127.5, 127.5, 127.5]
    compile_options.std = [127.5, 127.5, 127.5]
    compile_options.input_layout = 'NCHW'
    compile_options.dump_ir = True
    compile_options.dump_asm = True
    compile_options.dump_dir = dump_dir

    # compiler
    compiler = nncase.Compiler(compile_options)

    # import
    model_content = read_model_file(args.model)
    import_options = nncase.ImportOptions()
    compiler.import_tflite(model_content, import_options)

    # ptq_options
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 6
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
    compiler.use_ptq(ptq_options)

    # compile
    compiler.compile()

    # kmodel
    kmodel = compiler.gencode_tobytes()
    with open(os.path.join(dump_dir, 'test.kmodel'), 'wb') as f:
        f.write(kmodel)

if __name__ == '__main__':
    main()

Run the following commands to compile the MobileNetV2 TFLite model with k230 as the target.

root@c285a41a7243:/mnt/# cd rtos_sdk/src/rtsmart/libs/nncase/examples
root@c285a41a7243:/mnt/rtos_sdk/src/rtsmart/libs/nncase/examples# python3 ./scripts/mbv2_tflite.py --target k230 --model models/mbv2.tflite --dataset calibration_dataset

Compile ONNX model

For ONNX models, it is recommended to simplify the model with ONNX Simplifier first and then compile it with nncase.

The yolov5s_onnx.py script is as follows:

import os
import argparse
import numpy as np
from PIL import Image
import onnxsim
import onnx
import nncase

def parse_model_input_output(model_file):
    onnx_model = onnx.load(model_file)
    input_all = [node.name for node in onnx_model.graph.input]
    input_initializer = [node.name for node in onnx_model.graph.initializer]
    input_names = list(set(input_all) - set(input_initializer))
    input_tensors = [
        node for node in onnx_model.graph.input if node.name in input_names]

    # input
    inputs = []
    for _, e in enumerate(input_tensors):
        onnx_type = e.type.tensor_type
        input_dict = {}
        input_dict['name'] = e.name
        input_dict['dtype'] = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_type.elem_type]
        input_dict['shape'] = [(i.dim_value if i.dim_value != 0 else d) for i, d in zip(
            onnx_type.shape.dim, [1, 3, 224, 224])]
        inputs.append(input_dict)

    return onnx_model, inputs


def onnx_simplify(model_file, dump_dir):
    onnx_model, inputs = parse_model_input_output(model_file)
    onnx_model = onnx.shape_inference.infer_shapes(onnx_model)
    input_shapes = {}
    for input in inputs:
        input_shapes[input['name']] = input['shape']

    onnx_model, check = onnxsim.simplify(onnx_model, input_shapes=input_shapes)
    assert check, "Simplified ONNX model could not be validated"

    model_file = os.path.join(dump_dir, 'simplified.onnx')
    onnx.save_model(onnx_model, model_file)
    return model_file


def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content

def generate_data_random(shape, batch):
    # Alternative to generate_data: random uint8 calibration samples (not called in main()).
    data = []
    for _ in range(batch):
        data.append([np.random.randint(0, 256, shape).astype(np.uint8)])
    return data


def generate_data(shape, batch, calib_dir):
    img_paths = [os.path.join(calib_dir, p) for p in os.listdir(calib_dir)]
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))
        data.append([img_data[np.newaxis, ...]])
    return data

def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", type=str, help='target to run')
    parser.add_argument("--model", type=str, help='model file')
    parser.add_argument("--dataset", type=str, help='calibration_dataset')

    args = parser.parse_args()

    input_shape = [1, 3, 320, 320]

    dump_dir = 'tmp/yolov5s_onnx'
    if not os.path.exists(dump_dir):
        os.makedirs(dump_dir)

    # onnx simplify
    model_file = onnx_simplify(args.model, dump_dir)

    # compile_options
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target
    compile_options.preprocess = True
    compile_options.swapRB = False
    compile_options.input_shape = input_shape
    compile_options.input_type = 'uint8'
    compile_options.input_range = [0, 255]
    compile_options.mean = [0, 0, 0]
    compile_options.std = [255, 255, 255]
    compile_options.input_layout = 'NCHW'
    compile_options.output_layout = 'NCHW'
    compile_options.dump_ir = True
    compile_options.dump_asm = True
    compile_options.dump_dir = dump_dir

    # compiler
    compiler = nncase.Compiler(compile_options)

    # import
    model_content = read_model_file(model_file)
    import_options = nncase.ImportOptions()
    compiler.import_onnx(model_content, import_options)

    # ptq_options
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 6
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
    compiler.use_ptq(ptq_options)

    # compile
    compiler.compile()

    # kmodel
    kmodel = compiler.gencode_tobytes()
    with open(os.path.join(dump_dir, 'test.kmodel'), 'wb') as f:
        f.write(kmodel)

if __name__ == '__main__':
    main()

Run the following commands to compile the ONNX model with k230 as the target.

root@c285a41a7243:/mnt/# cd rtos_sdk/src/rtsmart/libs/nncase/examples
root@c285a41a7243:/mnt/rtos_sdk/src/rtsmart/libs/nncase/examples# python3 ./scripts/yolov5s_onnx.py --target k230 --model models/yolov5s.onnx --dataset calibration_dataset