`nncase` Model Compilation API Manual#

Overview#

nncase is a neural network compiler designed for AI accelerators. This document describes the Python API used to convert a trained TFLite or ONNX model into kmodel, the model format that the KPU can accelerate. The compiler currently supports deep learning models in TFLite and ONNX formats. The API described here compiles a kmodel on a local PC; it is not code that runs on the K230. To learn more about nncase, refer to the nncase GitHub repo.

API introduction#

CompileOptions#

Description

CompileOptions class, used to configure nncase compilation options. Each attribute is described as follows:

Property name	type	Required	Description
target	string	yes	Compilation target, such as ‘cpu’ or ‘k230’
dump_ir	bool	no	Whether to dump IR. The default is False
dump_asm	bool	no	Whether to dump asm assembly files. The default is False
dump_dir	string	no	Directory that receives the dumps enabled by dump_ir and the other dump switches. The default is “”
input_file	string	no	Path to the parameter file, used when the ONNX model exceeds 2GB. The default is “”
preprocess	bool	no	Whether to enable pre-processing. The default is False. The following parameters only take effect when `preprocess=True`
input_type	string	no	Input data type when pre-processing is enabled. The default is “float”. When `preprocess` is `True`, it must be set to “uint8” or “float32”
input_shape	list[int]	no	Shape of the input data when pre-processing is enabled. The default is []. Must be specified when `preprocess` is `True`
input_range	list[float]	no	Floating-point range of the input data after dequantization when pre-processing is enabled. The default is []. Must be specified when `preprocess` is `True` and `input_type` is `uint8`
input_layout	string	no	Layout of the input data. The default is “”
swapRB	bool	no	Whether to reverse the data in the `channel` dimension. The default is False
mean	list[float]	no	Mean values used for normalization in pre-processing. The default is [0,0,0]
std	list[float]	no	Standard deviation values used for normalization in pre-processing. The default is [1,1,1]
letterbox_value	float	no	Padding value used by the pre-processing letterbox. The default is 0
output_layout	string	no	Layout of the output data. The default is “”
shape_bucket_enable	bool	yes	Whether to enable the ShapeBucket function. The default is False. Takes effect when `dump_ir=True`
shape_bucket_range_info	Dict[str, [int, int]]	yes	Range of each variable in the input shape dimension information. The minimum value must be greater than or equal to 1
shape_bucket_segments_count	int	yes	Number of segments that the range of each input variable is divided into
shape_bucket_fix_var_map	Dict[str, int]	no	Variables in the shape dimension information that are fixed to specific values
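
The dependencies between the pre-processing attributes in the table can be checked before compiling. The following is a minimal sketch, assuming the settings are held in a plain dict; `check_preprocess_options` is a hypothetical helper and not part of nncase.

```python
def check_preprocess_options(opts):
    """Return a list of problems found in a dict of CompileOptions-style settings.

    Hypothetical helper, not part of nncase: it only mirrors the rules
    stated in the attribute table.
    """
    problems = []
    if not opts.get("target"):
        problems.append("target is required")
    if opts.get("preprocess", False):
        # "float" is the documented default, but it is not accepted
        # once pre-processing is enabled.
        input_type = opts.get("input_type", "float")
        if input_type not in ("uint8", "float32"):
            problems.append("input_type must be 'uint8' or 'float32' when preprocess is True")
        if not opts.get("input_shape"):
            problems.append("input_shape must be specified when preprocess is True")
        if input_type == "uint8" and not opts.get("input_range"):
            problems.append("input_range must be specified when input_type is 'uint8'")
    return problems


ok = check_preprocess_options({
    "target": "k230",
    "preprocess": True,
    "input_type": "uint8",
    "input_shape": [1, 3, 224, 224],
    "input_range": [0, 255],
})
bad = check_preprocess_options({"target": "k230", "preprocess": True})
print(ok)
print(bad)
```

A check like this could run just before the values are copied onto a real `nncase.CompileOptions()` instance.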

Pre-processing process description#

A custom pre-processing order is not currently supported. Select and configure the pre-processing parameters you need according to the following flow diagram.

graph TD;
    NewInput("NewInput (shape = input_shape, dtype = input_type)") --> a("input_layout != ' '");
    a -.Y.-> Transpose1["Transpose"] -.-> b("SwapRB == True");
    b -.Y.-> SwapRB["SwapRB"] -.-> c("input_type != float32");
    c -.Y.-> Dequantize["Dequantize"] -.-> d("input_HW != model_HW");
    d -.Y.-> LetterBox["LetterBox"] -.-> e("std not empty, mean not empty");
    e -.Y.-> Normalization["Normalization"] -.-> OldInput;
    OldInput --> Model_body --> OldOutput --> f("output_layout != ' '");
    f -.Y.-> Transpose2["Transpose"] -.-> NewOutput;
    a --N--> b --N--> c --N--> d --N--> e --N--> OldInput;
    f --N--> NewOutput;
    subgraph origin_model
        OldInput;
        Model_body;
        OldOutput;
    end
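
The input side of the flow above can be read as a fixed sequence of optional steps. The sketch below lists which steps would be active for a given configuration. It is an illustration only: `active_preprocess_steps` is a hypothetical function, and it assumes that the layout condition in the diagram means "input_layout is not the empty string".

```python
def active_preprocess_steps(input_layout, swap_rb, input_type, input_hw, model_hw, mean, std):
    """List the input pre-processing steps the diagram would apply, in order."""
    steps = []
    if input_layout != "":
        steps.append("Transpose")
    if swap_rb:
        steps.append("SwapRB")
    if input_type != "float32":
        steps.append("Dequantize")
    if tuple(input_hw) != tuple(model_hw):
        steps.append("LetterBox")
    if mean and std:  # both lists are non-empty
        steps.append("Normalization")
    return steps


steps = active_preprocess_steps(
    input_layout="NHWC",
    swap_rb=False,
    input_type="uint8",
    input_hw=(224, 320),
    model_hw=(224, 224),
    mean=[0, 0, 0],
    std=[1, 1, 1],
)
print(steps)
```

The order of the returned list follows the diagram and cannot be rearranged, which matches the statement that a custom order is not supported.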

Parameter description:

input_range is the floating-point range of the input data after dequantization. It applies when the input data type is fixed point.

a. If the input data type is uint8 with range [0,255] and input_range is [0,255], dequantization only performs a type conversion from uint8 to float32. Specify mean and std against data in [0,255].

b. If the input data type is uint8 with range [0,255] and input_range is [0,1], dequantization converts the fixed-point values into floating-point values in [0,1]. Specify mean and std against data in [0,1].

graph TD;
    NewInput_uint8("NewInput_uint8 [input_type:uint8]") --input_range:0,255--> dequantize_0["Dequantize"] --float range:0,255--> OldInput_float32;
    NewInput_uint81("NewInput_uint8 [input_type:uint8]") --input_range:0,1--> dequantize_1["Dequantize"] --float range:0,1--> OldInput_float32;
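
The two input_range cases can be reproduced with NumPy. This is a sketch under the assumption that dequantization is a linear mapping from [0,255] onto input_range and that normalization computes (x - mean) / std per channel; the function names are illustrative and are not nncase APIs.

```python
import numpy as np


def dequantize_uint8(x, input_range):
    """Linearly map uint8 values in [0, 255] onto [lo, hi] as float32 (assumed behavior)."""
    lo, hi = input_range
    return x.astype(np.float32) * ((hi - lo) / 255.0) + lo


def normalize(x, mean, std):
    """Apply (x - mean) / std per channel to an NCHW tensor."""
    mean = np.asarray(mean, dtype=np.float32).reshape(1, -1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(1, -1, 1, 1)
    return (x - mean) / std


pixels = np.array([0, 128, 255], dtype=np.uint8)
case_a = dequantize_uint8(pixels, [0, 255])  # type conversion only
case_b = dequantize_uint8(pixels, [0, 1])    # rescaled into [0, 1]

# The same white image normalized to [-1, 1] under both conventions:
# mean and std have to be expressed in the range chosen by input_range.
image = np.full((1, 3, 2, 2), 255, dtype=np.uint8)
normalized_a = normalize(dequantize_uint8(image, [0, 255]), [127.5] * 3, [127.5] * 3)
normalized_b = normalize(dequantize_uint8(image, [0, 1]), [0.5] * 3, [0.5] * 3)
print(case_a, case_b)
```

Both normalized results agree up to floating-point rounding, which is why mean and std must be given in the same scale as input_range.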

input_shape is the shape of the input data, laid out according to input_layout. input_layout accepts either a layout string ("NHWC" or "NCHW") or an index form, and non-4D data is supported. In string form, input_layout states the layout of the input data. In index form, the input data is transposed according to input_layout; that is, input_layout is the perm parameter of Transpose.

graph TD;
    subgraph B
        NewInput1("NewInput: 1,4,10") --"input_layout: '0,2,1'"--> Transpose2("Transpose perm: 0,2,1") --> OldInput2("OldInput: 1,10,4");
    end
    subgraph A
        NewInput --"input_layout: 'NHWC'"--> Transpose0("Transpose: NHWC2NCHW") --> OldInput;
        NewInput("NewInput: 1,224,224,3 (NHWC)") --"input_layout: '0,3,1,2'"--> Transpose1("Transpose perm: 0,3,1,2") --> OldInput("OldInput: 1,3,224,224 (NCHW)");
    end

The same goes for output_layout, as shown in the figure below.

graph TD;
    subgraph B
        OldOutput1("OldOutput: 1,10,4,5,2") --"output_layout: '0,2,3,1,4'"--> Transpose5("Transpose perm: 0,2,3,1,4") --> NewOutput1("NewOutput: 1,4,5,10,2");
    end
    subgraph A
        OldOutput --"output_layout: 'NHWC'"--> Transpose3("Transpose: NCHW2NHWC") --> NewOutput("NewOutput NHWC");
        OldOutput("OldOutput: (NCHW)") --"output_layout: '0,2,3,1'"--> Transpose4("Transpose perm: 0,2,3,1") --> NewOutput("NewOutput NHWC");
    end
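
The index form of input_layout and output_layout behaves like the perm argument of a transpose. The shapes shown in the two diagrams can be checked with NumPy; `parse_index_layout` is a hypothetical helper that assumes the index form is written as a comma-separated string, as in the diagrams.

```python
import numpy as np


def parse_index_layout(layout):
    """Turn an index-form layout such as "0,2,1" into a perm tuple."""
    return tuple(int(i) for i in layout.split(","))


# Input side: NHWC data transposed to the NCHW layout of the original model.
nhwc_input = np.zeros((1, 224, 224, 3), dtype=np.uint8)
as_nchw = np.transpose(nhwc_input, parse_index_layout("0,3,1,2"))

# Input side, non-4D data.
input_3d = np.zeros((1, 4, 10), dtype=np.float32)
swapped = np.transpose(input_3d, parse_index_layout("0,2,1"))

# Output side: NCHW output transposed to NHWC.
nchw_output = np.zeros((1, 3, 224, 224), dtype=np.float32)
as_nhwc = np.transpose(nchw_output, parse_index_layout("0,2,3,1"))

# Output side, 5-D data.
output_5d = np.zeros((1, 10, 4, 5, 2), dtype=np.float32)
reordered = np.transpose(output_5d, parse_index_layout("0,2,3,1,4"))

print(as_nchw.shape, swapped.shape, as_nhwc.shape, reordered.shape)
```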

Dynamic shape parameter description#

ShapeBucket is a solution for dynamic shapes: it optimizes a dynamic-shape model based on the range of the input lengths and the specified number of segments. The feature is disabled by default and takes effect only after the corresponding option is enabled. Apart from specifying the corresponding fields, the process is the same as compiling a static model.

ONNX

Some dimensions in the model's shapes are variable names. Take the inputs of an ONNX model as an example.

tokens: int64[batch_size, tgt_seq_len]
step: float32[seq_len, batch_size]

The shape dimension information contains three variables: seq_len, tgt_seq_len, and batch_size. batch_size is a variable, but it is fixed to 3 in the actual application, so adding batch_size = 3 to fix_var_map fixes this dimension to 3 at run time. seq_len and tgt_seq_len do change, so their actual ranges must be configured in range_info. segments_count is the number of segments: each range is divided into that many equal parts, and compilation time grows by roughly the same factor.

The following are examples of corresponding compilation parameters:

compile_options = nncase.CompileOptions()
compile_options.shape_bucket_enable = True
compile_options.shape_bucket_range_info = {"seq_len": [1, 100], "tgt_seq_len": [1, 100]}
compile_options.shape_bucket_segments_count = 2
compile_options.shape_bucket_fix_var_map = {"batch_size": 3}
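
To make the idea of dividing a range into equal segments concrete, the sketch below splits a range such as [1, 100] into segments_count parts and picks the segment that covers a given runtime length. It only illustrates the concept: the exact boundaries nncase chooses are not specified in this document, and both functions are hypothetical.

```python
from bisect import bisect_left


def bucket_upper_bounds(lo, hi, segments):
    """Split the inclusive range [lo, hi] into roughly equal parts; return each part's upper bound."""
    count = hi - lo + 1
    return [lo - 1 + count * (i + 1) // segments for i in range(segments)]


def pick_bucket(value, bounds):
    """Return the smallest upper bound that is >= value (value must not exceed the range)."""
    return bounds[bisect_left(bounds, value)]


bounds = bucket_upper_bounds(1, 100, 2)
print(bounds)
print(pick_bucket(37, bounds), pick_bucket(51, bounds))
```

Under this reading, more segments give tighter bounds for each runtime length at the cost of compiling more variants, which is consistent with the note that compilation time grows with segments_count.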

TFLite

A TFLite model differs from an ONNX model: its shapes do not carry dimension names. Currently only one dynamic dimension in the input is supported, and its name is uniformly configured as -1. The configuration is as follows:

compile_options = nncase.CompileOptions()
compile_options.shape_bucket_enable = True
compile_options.shape_bucket_range_info = {"-1":[1, 100]}
compile_options.shape_bucket_segments_count = 2
compile_options.shape_bucket_fix_var_map = {"batch_size" : 3}

After configuring these options, the entire compilation process is consistent with the static shape.

Parameter configuration example#

Instantiate CompileOptions and configure the values of each attribute.

compile_options = nncase.CompileOptions()

compile_options.target = "cpu" #"k230"
compile_options.dump_ir = True  # if False, will not dump the compile-time result.
compile_options.dump_asm = True
compile_options.dump_dir = "dump_path"
compile_options.input_file = ""

# preprocess args
compile_options.preprocess = False
if compile_options.preprocess:
    compile_options.input_type = "uint8"  # "uint8" "float32"
    compile_options.input_shape = [1,224,320,3]
    compile_options.input_range = [0,1]
    compile_options.input_layout = "NHWC" # "NHWC" "NCHW"
    compile_options.swapRB = False
    compile_options.mean = [0,0,0]
    compile_options.std = [1,1,1]
    compile_options.letterbox_value = 0
    compile_options.output_layout = "NHWC" # "NHWC" "NCHW"

# Dynamic shape args
compile_options.shape_bucket_enable = False
if compile_options.shape_bucket_enable:
    compile_options.shape_bucket_range_info = {"seq_len": [1, 100], "tgt_seq_len": [1, 100]}
    compile_options.shape_bucket_segments_count = 2
    compile_options.shape_bucket_fix_var_map = {"batch_size": 3}

ImportOptions#

Description

ImportOptions class, used to configure nncase import options.

Definition

class ImportOptions:
    def __init__(self) -> None:
        pass

Example

Instantiate ImportOptions and configure the values of each attribute.

#import_options
import_options = nncase.ImportOptions()

PTQTensorOptions#

Description

PTQTensorOptions class, used to configure nncase PTQ options.

name	type	Required	Description
samples_count	int	no	Number of calibration samples used for quantization
calibrate_method	string	no	Quantization method, either ‘NoClip’ or ‘Kld’. The default is ‘Kld’
finetune_weights_method	string	no	Whether to fine-tune the weights, either ‘NoFineTuneWeights’ or ‘UseSquant’. The default is ‘NoFineTuneWeights’
quant_type	string	no	Data quantization type: ‘uint8’, ‘int8’, or ‘int16’. `quant_type` and `w_quant_type` cannot both be ‘int16’
w_quant_type	string	no	Weight quantization type: ‘uint8’, ‘int8’, or ‘int16’. `quant_type` and `w_quant_type` cannot both be ‘int16’
quant_scheme	string	no	Path of the quantization parameter configuration file to import
quant_scheme_strict_mode	bool	no	Whether to quantize strictly according to quant_scheme
export_quant_scheme	bool	no	Whether to export the quantization parameter configuration file
export_weight_range_by_channel	bool	no	Whether to export the weight quantization parameters in `bychannel` form. Setting this parameter to `True` is recommended

For details on using mixed quantization, see MixQuantInstructions.
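
The allowed values and the int16 restriction in the table can be validated before calling `use_ptq`. The following is a minimal sketch; `check_ptq_options` is a hypothetical helper and not part of nncase, and it takes the two quantization types as required arguments because their defaults are not stated here.

```python
VALID_CALIBRATE_METHODS = {"NoClip", "Kld"}
VALID_FINETUNE_METHODS = {"NoFineTuneWeights", "UseSquant"}
VALID_QUANT_TYPES = {"uint8", "int8", "int16"}


def check_ptq_options(quant_type, w_quant_type,
                      calibrate_method="Kld",
                      finetune_weights_method="NoFineTuneWeights"):
    """Return a list of problems with a PTQ configuration (hypothetical helper)."""
    problems = []
    if calibrate_method not in VALID_CALIBRATE_METHODS:
        problems.append(f"unknown calibrate_method: {calibrate_method}")
    if finetune_weights_method not in VALID_FINETUNE_METHODS:
        problems.append(f"unknown finetune_weights_method: {finetune_weights_method}")
    for name, value in (("quant_type", quant_type), ("w_quant_type", w_quant_type)):
        if value not in VALID_QUANT_TYPES:
            problems.append(f"unknown {name}: {value}")
    if quant_type == "int16" and w_quant_type == "int16":
        problems.append("quant_type and w_quant_type cannot both be int16")
    return problems


print(check_ptq_options("uint8", "uint8"))
print(check_ptq_options("int16", "int16"))
```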

Example

# ptq_options
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 6
ptq_options.finetune_weights_method = "NoFineTuneWeights"
ptq_options.quant_type = "uint8"
ptq_options.w_quant_type = "uint8"
ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))

ptq_options.quant_scheme = ""
ptq_options.quant_scheme_strict_mode = False
ptq_options.export_quant_scheme = True
ptq_options.export_weight_range_by_channel = True

compiler.use_ptq(ptq_options)

set_tensor_data#

Description

Sets the tensor data that is used as calibration data during model conversion.

Definition

    def set_tensor_data(self, data: List[List[np.ndarray]]) -> None:
        reshape_data = list(map(list, zip(*data)))
        self.cali_data = [RuntimeTensor.from_numpy(
            d) for d in itertools.chain.from_iterable(reshape_data)]

Parameters

name	type	Description
data	List[List[np.ndarray]]	Read calibration data

Return Value

None.

Example

# ptq_options
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 6
ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
compiler.use_ptq(ptq_options)
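
The definition of `set_tensor_data` first transposes the nested list and then flattens it. The sketch below reproduces that reordering with plain NumPy arrays, leaving out `RuntimeTensor`, under the assumption that the outer list runs over samples and the inner list over model inputs (which is how `generate_data` in the example scripts builds it). The model with two inputs is hypothetical.

```python
import itertools

import numpy as np


def flatten_calibration_data(data):
    """Reorder data[sample][input] into one flat list grouped by input."""
    per_input = list(map(list, zip(*data)))  # per_input[input][sample]
    return list(itertools.chain.from_iterable(per_input))


# Two samples for a hypothetical model with two inputs.
data = [
    [np.full((1, 3), 10), np.full((1, 2), 11)],  # sample 0: input 0, input 1
    [np.full((1, 3), 20), np.full((1, 2), 21)],  # sample 1: input 0, input 1
]
flat = flatten_calibration_data(data)
order = [int(a.flat[0]) for a in flat]
print(order)
```

All samples of the first input come first, followed by all samples of the second input.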

Compiler#

Description

Compiler class, used to compile neural network models.

Definition

class Compiler:
    _target: _nncase.Target
    _session: _nncase.CompileSession
    _compiler: _nncase.Compiler
    _compile_options: _nncase.CompileOptions
    _quantize_options: _nncase.QuantizeOptions
    _module: IRModule

import_tflite#

Description

Import TFLite model.

Definition

def import_tflite(self, model_content: bytes, options: ImportOptions) -> None:
    self._compile_options.input_format = "tflite"
    self._import_module(model_content)

Parameters

name	type	Description
model_content	bytes	Content of the model file
options	ImportOptions	Import options

Return Value

None.

Example

model_content = read_model_file(model)
compiler.import_tflite(model_content, import_options)

import_onnx#

Description

Import the ONNX model.

Definition

def import_onnx(self, model_content: bytes, options: ImportOptions) -> None:
    self._compile_options.input_format = "onnx"
    self._import_module(model_content)

Parameters

name	type	Description
model_content	bytes	Content of the model file
options	ImportOptions	Import options

Return Value

None.

Example

model_content = read_model_file(model)
compiler.import_onnx(model_content, import_options)
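
Because `import_tflite` and `import_onnx` take the same arguments, a script that accepts either format can choose the importer from the file extension. This is a sketch only: `import_model` is a hypothetical helper, and `RecordingCompiler` is a stand-in that lets the example run without nncase installed.

```python
import os


def import_model(compiler, model_path, model_content, import_options):
    """Call the importer that matches the model file extension."""
    ext = os.path.splitext(model_path)[1].lower()
    if ext == ".tflite":
        compiler.import_tflite(model_content, import_options)
    elif ext == ".onnx":
        compiler.import_onnx(model_content, import_options)
    else:
        raise ValueError(f"unsupported model format: {ext!r}")
    return ext


class RecordingCompiler:
    """Stand-in for nncase.Compiler that only records which importer was called."""

    def __init__(self):
        self.calls = []

    def import_tflite(self, model_content, options):
        self.calls.append(("tflite", len(model_content)))

    def import_onnx(self, model_content, options):
        self.calls.append(("onnx", len(model_content)))


stub = RecordingCompiler()
import_model(stub, "models/mbv2.tflite", b"\x00\x01", None)
import_model(stub, "models/yolov5s.onnx", b"\x00\x01\x02", None)
print(stub.calls)
```

In a real script the stub would be replaced by `nncase.Compiler(compile_options)` and the last argument by an `nncase.ImportOptions()` instance.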

use_ptq#

Description

Set PTQ configuration options.

K230 must use quantization by default.

Definition

use_ptq(ptq_options)

Parameters

name	type	Description
ptq_options	PTQTensorOptions	PTQ configuration options

Return Value

None.

Example

compiler.use_ptq(ptq_options)

compile#

Description

Compile the neural network model.

Definition

compile()

Parameters

None.

Return Value

None.

Example

compiler.compile()

gencode_tobytes#

Description

Generate kmodel byte stream.

Definition

gencode_tobytes()

Parameters

None.

Return Value

bytes

Example

kmodel = compiler.gencode_tobytes()
with open(os.path.join(infer_dir, 'test.kmodel'), 'wb') as f:
    f.write(kmodel)
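
Writing the byte stream fails if the output directory does not exist yet. The sketch below wraps the write in a small helper that creates the directory first; `save_kmodel` is a hypothetical function, and the placeholder bytes stand in for the result of `compiler.gencode_tobytes()`.

```python
import tempfile
from pathlib import Path


def save_kmodel(kmodel, out_dir, name="test.kmodel"):
    """Write the kmodel bytes to out_dir/name, creating out_dir if needed."""
    out_path = Path(out_dir) / name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(kmodel)
    return out_path


# Demo with placeholder bytes; a real script would pass compiler.gencode_tobytes().
with tempfile.TemporaryDirectory() as tmp:
    saved = save_kmodel(b"placeholder", Path(tmp) / "tmp" / "mbv2_tflite")
    size = saved.stat().st_size
    print(saved.name, size)
```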

Example#

The model and python compilation script used in the following examples:

The original model file is located in the src/rtsmart/libs/nncase/examples/models directory
The python compilation script is located in the src/rtsmart/libs/nncase/examples/scripts directory

Compile TFLite model#

The mbv2_tflite.py script is as follows:

import os
import argparse
import numpy as np
from PIL import Image
import nncase

def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content

def generate_data(shape, batch, calib_dir):
    img_paths = [os.path.join(calib_dir, p) for p in os.listdir(calib_dir)]
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))
        data.append([img_data[np.newaxis, ...]])
    return data

def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", type=str, help='target to run')
    parser.add_argument("--model", type=str, help='model file')
    parser.add_argument("--dataset", type=str, help='calibration_dataset')
    args = parser.parse_args()

    input_shape = [1, 3, 224, 224]
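    dump_dir = 'tmp/mbv2_tflite'
    # Create the output directory up front so the kmodel can be written to it.
    os.makedirs(dump_dir, exist_ok=True)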

    # compile_options
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target
    compile_options.preprocess = True
    compile_options.swapRB = False
    compile_options.input_shape = input_shape
    compile_options.input_type = 'uint8'
    compile_options.input_range = [0, 255]
    compile_options.mean = [127.5, 127.5, 127.5]
    compile_options.std = [127.5, 127.5, 127.5]
    compile_options.input_layout = 'NCHW'
    compile_options.dump_ir = True
    compile_options.dump_asm = True
    compile_options.dump_dir = dump_dir

    # compiler
    compiler = nncase.Compiler(compile_options)

    # import
    model_content = read_model_file(args.model)
    import_options = nncase.ImportOptions()
    compiler.import_tflite(model_content, import_options)

    # ptq_options
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 6
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
    compiler.use_ptq(ptq_options)

    # compile
    compiler.compile()

    # kmodel
    kmodel = compiler.gencode_tobytes()
    with open(os.path.join(dump_dir, 'test.kmodel'), 'wb') as f:
        f.write(kmodel)

if __name__ == '__main__':
    main()

Run the following commands to compile the MobileNetV2 TFLite model for the k230 target.

root@c285a41a7243:/mnt/# cd rtos_sdk/src/rtsmart/libs/nncase/examples
root@c285a41a7243:/mnt/rtos_sdk/src/rtsmart/libs/nncase/examples# python3 ./scripts/mbv2_tflite.py --target k230 --model models/mbv2.tflite --dataset calibration_dataset

Compile ONNX model#

For the ONNX model, it is recommended to use ONNX Simplifier to simplify it first, and then use nncase to compile.

The yolov5s_onnx.py script is as follows:

import os
import argparse
import numpy as np
from PIL import Image
import onnxsim
import onnx
import nncase

def parse_model_input_output(model_file):
    onnx_model = onnx.load(model_file)
    input_all = [node.name for node in onnx_model.graph.input]
    input_initializer = [node.name for node in onnx_model.graph.initializer]
    input_names = list(set(input_all) - set(input_initializer))
    input_tensors = [
        node for node in onnx_model.graph.input if node.name in input_names]

    # input
    inputs = []
    for _, e in enumerate(input_tensors):
        onnx_type = e.type.tensor_type
        input_dict = {}
        input_dict['name'] = e.name
        input_dict['dtype'] = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_type.elem_type]
        input_dict['shape'] = [(i.dim_value if i.dim_value != 0 else d) for i, d in zip(
            onnx_type.shape.dim, [1, 3, 224, 224])]
        inputs.append(input_dict)

    return onnx_model, inputs


def onnx_simplify(model_file, dump_dir):
    onnx_model, inputs = parse_model_input_output(model_file)
    onnx_model = onnx.shape_inference.infer_shapes(onnx_model)
    input_shapes = {}
    for input in inputs:
        input_shapes[input['name']] = input['shape']

    onnx_model, check = onnxsim.simplify(onnx_model, input_shapes=input_shapes)
    assert check, "Simplified ONNX model could not be validated"

    model_file = os.path.join(dump_dir, 'simplified.onnx')
    onnx.save_model(onnx_model, model_file)
    return model_file


def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content

def generate_data_random(shape, batch):
    data = []
    for i in range(batch):
        data.append([np.random.randint(0, 256, shape).astype(np.uint8)])
    return data


def generate_data(shape, batch, calib_dir):
    img_paths = [os.path.join(calib_dir, p) for p in os.listdir(calib_dir)]
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))
        data.append([img_data[np.newaxis, ...]])
    return data

def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", type=str, help='target to run')
    parser.add_argument("--model", type=str, help='model file')
    parser.add_argument("--dataset", type=str, help='calibration_dataset')

    args = parser.parse_args()

    input_shape = [1, 3, 320, 320]

    dump_dir = 'tmp/yolov5s_onnx'
    if not os.path.exists(dump_dir):
        os.makedirs(dump_dir)

    # onnx simplify
    model_file = onnx_simplify(args.model, dump_dir)

    # compile_options
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target
    compile_options.preprocess = True
    compile_options.swapRB = False
    compile_options.input_shape = input_shape
    compile_options.input_type = 'uint8'
    compile_options.input_range = [0, 255]
    compile_options.mean = [0, 0, 0]
    compile_options.std = [255, 255, 255]
    compile_options.input_layout = 'NCHW'
    compile_options.output_layout = 'NCHW'
    compile_options.dump_ir = True
    compile_options.dump_asm = True
    compile_options.dump_dir = dump_dir

    # compiler
    compiler = nncase.Compiler(compile_options)

    # import
    model_content = read_model_file(model_file)
    import_options = nncase.ImportOptions()
    compiler.import_onnx(model_content, import_options)

    # ptq_options
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 6
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
    compiler.use_ptq(ptq_options)

    # compile
    compiler.compile()

    # kmodel
    kmodel = compiler.gencode_tobytes()
    with open(os.path.join(dump_dir, 'test.kmodel'), 'wb') as f:
        f.write(kmodel)

if __name__ == '__main__':
    main()

Run the following commands to compile the ONNX model for the k230 target.

root@c285a41a7243:/mnt/# cd rtos_sdk/src/rtsmart/libs/nncase/examples
root@c285a41a7243:/mnt/rtos_sdk/src/rtsmart/libs/nncase/examples# python3 ./scripts/yolov5s_onnx.py --target k230 --model models/yolov5s.onnx --dataset calibration_dataset
