nncase Model Compilation API Manual#
Overview#
nncase is a neural network compiler designed for AI accelerators. This document describes the Python API that converts a trained TFLite or ONNX model into kmodel, the model format that the KPU can accelerate. The compilation API currently supports deep learning models in formats such as TFLite and ONNX. The API is used to compile a kmodel on a local PC; it is not code that runs on the K230. To learn more about nncase, refer to the nncase GitHub repo.
API introduction#
CompileOptions#
Description
CompileOptions class, used to configure nncase compilation options. Each attribute is described as follows:
| Property name | Type | Required | Description |
|---|---|---|---|
| target | string | yes | Specify the compilation target, such as 'cpu', 'k230' |
| dump_ir | bool | no | Specify whether to dump IR, the default is False |
| dump_asm | bool | no | Specify whether to dump asm assembly files, the default is False |
| dump_dir | string | no | The dump directory used once dump_ir or the other dump switches are enabled. The default is "" |
| input_file | string | no | When the ONNX model exceeds 2GB, it is used to specify the parameter file path. The default is "" |
| preprocess | bool | no | Whether to enable pre-processing, the default is False. The following parameters only take effect when |
| input_type | string | no | Specify the input data type when turning on pre-processing, the default is "float". When |
| input_shape | list[int] | no | Specify the shape of the input data when turning on pre-processing, the default is []. When |
| input_range | list[float] | no | Specify the floating-point number range after dequantization of the input data when pre-processing is enabled. The default is []. Must be specified when |
| input_layout | string | no | Specify the layout of the input data, the default is "" |
| swapRB | bool | no | Whether to reverse the data in the |
| mean | list[float] | no | The mean value of the pre-processing normalization parameters, the default is [0,0,0] |
| std | list[float] | no | The std value of the pre-processing normalization parameters, the default is [1,1,1] |
| letterbox_value | float | no | Specify the fill value of the pre-processing letterbox, the default is 0 |
| output_layout | string | no | Specify the layout of the output data, the default is "" |
| shape_bucket_enable | bool | yes | Whether to enable the ShapeBucket function, the default is False. Effective at |
| shape_bucket_range_info | Dict[str, [int, int]] | yes | The range of each variable in the input shape dimension information. The minimum value must be greater than or equal to 1 |
| shape_bucket_segments_count | int | yes | The number of segments into which the range of the input variables is divided |
| shape_bucket_fix_var_map | Dict[str, int] | no | Fixes variables in the shape dimension information to specific values |
Pre-processing process description#
Custom pre-processing sequences are not currently supported. Select the pre-processing parameters you need according to the following flowchart.
[Flowchart: NewInput (shape = input_shape, dtype = input_type) → Transpose (if input_layout != '') → SwapRB (if SwapRB == True) → Dequantize (if input_type != float32) → LetterBox (if input_HW != model_HW) → Normalization (if std and mean are not empty) → OldInput → Model_body → OldOutput → Transpose (if output_layout != '') → NewOutput. A step is skipped when its condition is false. OldInput, Model_body, and OldOutput make up the original model.]
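The decision order in the flowchart can be sketched in plain Python. This is only an illustration of the conditions named in the diagram, not nncase code: the function name `planned_preprocess_steps`, the `opts` dict, and the `input_hw` key are hypothetical stand-ins for the corresponding CompileOptions attributes.

```python
def planned_preprocess_steps(opts, model_hw):
    """Return the preprocessing steps, in order, that the flowchart would apply.

    `opts` is a plain dict that mirrors CompileOptions attribute names, and
    `model_hw` is the (height, width) the original model expects. Both are
    stand-ins for illustration and are not nncase objects.
    """
    steps = []
    if opts.get("input_layout", "") != "":
        steps.append("Transpose")
    if opts.get("swapRB", False):
        steps.append("SwapRB")
    if opts.get("input_type", "float32") not in ("float", "float32"):
        steps.append("Dequantize")
    if tuple(opts.get("input_hw", model_hw)) != tuple(model_hw):
        steps.append("LetterBox")
    if opts.get("mean") and opts.get("std"):
        steps.append("Normalization")
    return steps


opts = {
    "input_layout": "NHWC",
    "swapRB": False,
    "input_type": "uint8",
    "input_hw": (224, 320),   # hypothetical key: height and width of the new input
    "mean": [0, 0, 0],
    "std": [1, 1, 1],
}
print(planned_preprocess_steps(opts, model_hw=(224, 224)))
```

Because the sequence is fixed, the only way to influence it is to enable or disable individual steps through their parameters.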
Parameter description:
input_range is the range of floating-point numbers after dequantization when the input data type is fixed point.
a. If the input data type is uint8 with range [0,255] and input_range is [0,255], dequantization only performs a type conversion from uint8 to float32. The mean and std parameters are still specified on the [0,255] scale.
b. If the input data type is uint8 with range [0,255] and input_range is [0,1], dequantization converts the fixed-point numbers into floating-point numbers in [0,1]. The mean and std parameters must be specified on the 0~1 scale.
[Figure: NewInput_uint8 (input_type: uint8) → Dequantize with input_range 0,255 → OldInput_float32 with float range 0,255; NewInput_uint8 (input_type: uint8) → Dequantize with input_range 0,1 → OldInput_float32 with float range 0,1.]
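The two cases can be checked with a few lines of arithmetic. The sketch below assumes that dequantization maps uint8 values linearly onto input_range and that normalization is (x - mean) / std; both formulas are assumptions made for illustration, and the helper names are hypothetical rather than part of nncase.

```python
def dequantize_uint8(value, input_range):
    """Map a uint8 value in [0, 255] linearly onto input_range = [low, high]."""
    low, high = input_range
    return low + (value / 255.0) * (high - low)


def normalize(value, mean, std):
    """Apply (x - mean) / std to a single channel value."""
    return (value - mean) / std


# Case a: input_range = [0, 255], so dequantization is only a type conversion
# and mean/std are given on the 0..255 scale.
x_a = dequantize_uint8(255, [0, 255])
y_a = normalize(x_a, mean=127.5, std=127.5)

# Case b: input_range = [0, 1], so mean/std are given on the 0..1 scale.
x_b = dequantize_uint8(255, [0, 1])
y_b = normalize(x_b, mean=0.5, std=0.5)

print(x_a, y_a)
print(x_b, y_b)
```

Under these assumptions both configurations normalize the brightest pixel to the same value, which is why mean and std have to follow the scale chosen by input_range.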
input_shape is the shape of the input data, and its layout is input_layout. input_layout can be given either as a string ("NHWC", "NCHW") or in index form, and non-4D data is supported. When input_layout is configured as a string, it indicates the layout of the input data; when it is configured in index form, the input data is transposed according to the configured input_layout, that is, input_layout is the perm parameter of Transpose.
The same goes for output_layout, as shown in the figure below.
[Figure: OldOutput (NCHW) → Transpose with perm 0,2,3,1 (output_layout: "0,2,3,1") → NewOutput (NHWC).]
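The index form can be illustrated by applying a perm to a shape. This is a minimal sketch of standard Transpose semantics (output axis i takes input axis perm[i]); the helper names `parse_layout` and `permute_shape` are hypothetical, and the way nncase parses the layout string internally is not described in this document.

```python
def parse_layout(layout):
    """Return a perm list for an index-style layout such as "0,2,3,1", else None."""
    parts = [p.strip() for p in layout.split(",")]
    if layout and all(p.isdigit() for p in parts):
        return [int(p) for p in parts]
    return None


def permute_shape(shape, perm):
    """Shape after a Transpose with the given perm: out[i] = shape[perm[i]]."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError(f"perm {perm} is not a permutation of {len(shape)} axes")
    return [shape[axis] for axis in perm]


nchw_shape = [1, 3, 224, 224]
perm = parse_layout("0,2,3,1")
print(perm)                             # [0, 2, 3, 1]
print(permute_shape(nchw_shape, perm))  # [1, 224, 224, 3]
print(parse_layout("NHWC"))             # None: a string layout, not an index list
```

With perm 0,2,3,1 an NCHW tensor becomes NHWC, which matches the output_layout case shown in the figure.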
Dynamic shape parameter description#
ShapeBucket is a solution for dynamic shapes: it optimizes a dynamic-shape model based on the range of the input lengths and the specified number of segments. The feature is disabled by default and takes effect only when the corresponding option is turned on. Apart from specifying the corresponding fields, the process is the same as compiling a static model.
ONNX
Some dimensions in the model's input shapes are variable names. Take the inputs of an ONNX model as an example:
tokens: int64[batch_size, tgt_seq_len]
step: float32[seq_len, batch_size]
The shape dimension information contains three variables: seq_len, tgt_seq_len, and batch_size.
batch_size is a variable, but it is fixed to 3 in the actual application. Adding batch_size = 3 to fix_var_map therefore fixes this dimension to 3 at run time.
seq_len and tgt_seq_len do change at run time, so you need to configure the actual range of each of these variables, which is the range_info information. segments_count is the number of segments: the range is divided into that many equal parts, and the compilation time increases several times over accordingly.
The following are examples of corresponding compilation parameters:
compile_options = nncase.CompileOptions()
compile_options.shape_bucket_enable = True
compile_options.shape_bucket_range_info = {"seq_len": [1, 100], "tgt_seq_len": [1, 100]}
compile_options.shape_bucket_segments_count = 2
compile_options.shape_bucket_fix_var_map = {"batch_size": 3}
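To make the segment idea concrete, here is a minimal sketch that splits a range into segments_count equal parts. It assumes inclusive integer bounds and equal-width cuts; the function name `segment_bounds` is hypothetical, and how nncase actually chooses its bucket boundaries is not specified in this document.

```python
def segment_bounds(value_range, segments_count):
    """Split the inclusive range [low, high] into equal-width (start, end) parts.

    Illustration only: it shows what "divided into several equal parts"
    could look like, not the boundaries nncase really uses.
    """
    low, high = value_range
    total = high - low + 1
    if low < 1 or high < low or not 1 <= segments_count <= total:
        raise ValueError("need 1 <= low <= high and 1 <= segments_count <= range size")
    bounds = []
    start = low
    for i in range(1, segments_count + 1):
        end = low + (total * i) // segments_count - 1
        bounds.append((start, end))
        start = end + 1
    return bounds


range_info = {"seq_len": [1, 100], "tgt_seq_len": [1, 100]}
for name, value_range in range_info.items():
    print(name, segment_bounds(value_range, 2))
```

More segments mean more buckets to compile, which is consistent with the note above that compilation time grows with segments_count.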
TFLite
TFLite models differ from ONNX models in that dimension names are not marked on the shape. Currently, only one dynamic dimension in the input is supported, and its name is uniformly configured as -1. The configuration is as follows:
compile_options = nncase.CompileOptions()
compile_options.shape_bucket_enable = True
compile_options.shape_bucket_range_info = {"-1":[1, 100]}
compile_options.shape_bucket_segments_count = 2
compile_options.shape_bucket_fix_var_map = {"batch_size" : 3}
After configuring these options, the entire compilation process is consistent with the static shape.
Parameter configuration example#
Instantiate CompileOptions and configure the values of each attribute.
compile_options = nncase.CompileOptions()
compile_options.target = "cpu" #"k230"
compile_options.dump_ir = True # if False, will not dump the compile-time result.
compile_options.dump_asm = True
compile_options.dump_dir = "dump_path"
compile_options.input_file = ""
# preprocess args
compile_options.preprocess = False
if compile_options.preprocess:
    compile_options.input_type = "uint8" # "uint8" "float32"
    compile_options.input_shape = [1,224,320,3]
    compile_options.input_range = [0,1]
    compile_options.input_layout = "NHWC" # "NHWC" "NCHW"
    compile_options.swapRB = False
    compile_options.mean = [0,0,0]
    compile_options.std = [1,1,1]
    compile_options.letterbox_value = 0
    compile_options.output_layout = "NHWC" # "NHWC" "NCHW"
# Dynamic shape args
compile_options.shape_bucket_enable = False
if compile_options.shape_bucket_enable:
    compile_options.shape_bucket_range_info = {"seq_len": [1, 100], "tgt_seq_len": [1, 100]}
    compile_options.shape_bucket_segments_count = 2
    compile_options.shape_bucket_fix_var_map = {"batch_size": 3}
ImportOptions#
Description
ImportOptions class, used to configure nncase import options.
definition
class ImportOptions:
    def __init__(self) -> None:
        pass
Example
Instantiate ImportOptions and configure the values of each attribute.
#import_options
import_options = nncase.ImportOptions()
PTQTensorOptions#
Description
PTQTensorOptions class, used to configure nncase PTQ options.
| Name | Type | Required | Description |
|---|---|---|---|
| samples_count | int | no | Specify the number of calibration samples used for quantization |
| calibrate_method | string | no | Specify the quantization method, optional 'NoClip', 'Kld', the default is 'Kld' |
| finetune_weights_method | string | no | Specify whether to fine-tune the weights, optional 'NoFineTuneWeights', 'UseSquant', the default is 'NoFineTuneWeights' |
| quant_type | string | no | Specify the data quantization type, optional 'uint8', 'int8', 'int16', |
| w_quant_type | string | no | Specify the weight quantization type, optional 'uint8', 'int8', 'int16', |
| quant_scheme | string | no | Path of the quantization parameter configuration file to import |
| quant_scheme_strict_mode | bool | no | Whether to perform quantization strictly according to quant_scheme |
| export_quant_scheme | bool | no | Whether to export the quantization parameter configuration file |
| export_weight_range_by_channel | bool | no | Whether to export the weights quantization parameter in the form of |
For details on how to use mixed quantization, see MixQuantInstructions.
Example
# ptq_options
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 6
ptq_options.finetune_weights_method = "NoFineTuneWeights"
ptq_options.quant_type = "uint8"
ptq_options.w_quant_type = "uint8"
ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
ptq_options.quant_scheme = ""
ptq_options.quant_scheme_strict_mode = False
ptq_options.export_quant_scheme = True
ptq_options.export_weight_range_by_channel = True
compiler.use_ptq(ptq_options)
set_tensor_data#
Description
Sets the tensor data, that is, the calibration data used during model conversion.
definition
def set_tensor_data(self, data: List[List[np.ndarray]]) -> None:
    reshape_data = list(map(list, zip(*data)))
    self.cali_data = [RuntimeTensor.from_numpy(
        d) for d in itertools.chain.from_iterable(reshape_data)]
Parameters
| Name | Type | Description |
|---|---|---|
| data | List[List[np.ndarray]] | Read calibration data |
Return Value
None.
Example
# ptq_options
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 6
ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
compiler.use_ptq(ptq_options)
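The reordering performed inside set_tensor_data can be traced with plain lists. The sketch below reuses the two expressions from the definition above and substitutes strings for the np.ndarray samples, so it runs without nncase or NumPy; the sample labels are made up for illustration.

```python
import itertools

# Two calibration samples for a model with two inputs.
# Outer list: samples. Inner list: one entry per model input.
data = [
    ["sample0_inputA", "sample0_inputB"],
    ["sample1_inputA", "sample1_inputB"],
]

# Step 1 (from set_tensor_data): transpose to [input][sample].
reshape_data = list(map(list, zip(*data)))

# Step 2 (from set_tensor_data): flatten into one list, grouped by input.
flat = list(itertools.chain.from_iterable(reshape_data))

print(reshape_data)
print(flat)
```

The data passed in is therefore indexed as data[sample][input], which is why generate_data in the example scripts appends a one-element list per calibration image for a single-input model.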
Compiler#
Description
Compiler class, used to compile neural network models.
definition
class Compiler:
    _target: _nncase.Target
    _session: _nncase.CompileSession
    _compiler: _nncase.Compiler
    _compile_options: _nncase.CompileOptions
    _quantize_options: _nncase.QuantizeOptions
    _module: IRModule
import_tflite#
Description
Import TFLite model.
definition
def import_tflite(self, model_content: bytes, options: ImportOptions) -> None:
    self._compile_options.input_format = "tflite"
    self._import_module(model_content)
Parameters
| Name | Type | Description |
|---|---|---|
| model_content | bytes | Read model content |
| options | ImportOptions | Import options |
Return Value
None.
Example
model_content = read_model_file(model)
compiler.import_tflite(model_content, import_options)
import_onnx#
Description
Import the ONNX model.
definition
def import_onnx(self, model_content: bytes, options: ImportOptions) -> None:
    self._compile_options.input_format = "onnx"
    self._import_module(model_content)
Parameters
| Name | Type | Description |
|---|---|---|
| model_content | bytes | Read model content |
| options | ImportOptions | Import options |
Return Value
None.
Example
model_content = read_model_file(model)
compiler.import_onnx(model_content, import_options)
use_ptq#
Description
Set PTQ configuration options.
K230 must use quantization by default.
definition
use_ptq(ptq_options)
Parameters
| Name | Type | Description |
|---|---|---|
| ptq_options | PTQTensorOptions | PTQ configuration options |
Return Value
None.
Example
compiler.use_ptq(ptq_options)
compile#
Description
Compile the neural network model.
definition
compile()
Parameters
None.
Return Value
None.
Example
compiler.compile()
gencode_tobytes#
Description
Generate kmodel byte stream.
definition
gencode_tobytes()
Parameters
None.
Return Value
bytes
Example
kmodel = compiler.gencode_tobytes()
with open(os.path.join(infer_dir, 'test.kmodel'), 'wb') as f:
    f.write(kmodel)
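The write step fails if the output directory does not exist yet. A small helper along the following lines could guard against that; `save_kmodel` is a hypothetical name, and the placeholder bytes stand in for the value returned by gencode_tobytes() so that the sketch runs without nncase.

```python
import tempfile
from pathlib import Path


def save_kmodel(kmodel, out_dir, name="test.kmodel"):
    """Write kmodel bytes to out_dir/name, creating out_dir if needed."""
    if not isinstance(kmodel, (bytes, bytearray)) or len(kmodel) == 0:
        raise ValueError("expected a non-empty bytes object")
    out_path = Path(out_dir) / name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(kmodel)
    return out_path


# Demo with placeholder bytes; in a real script this would be
# the result of compiler.gencode_tobytes().
with tempfile.TemporaryDirectory() as tmp:
    path = save_kmodel(b"\x00placeholder", Path(tmp) / "tmp" / "mbv2_tflite")
    print(path.name, path.stat().st_size)
```

Returning the final path makes it easy to log where the kmodel was written.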
Example#
The model and python compilation script used in the following examples:
The original model file is located in the src/rtsmart/libs/nncase/examples/models directory
The python compilation script is located in the src/rtsmart/libs/nncase/examples/scripts directory
Compile TFLite model#
The mbv2_tflite.py script is as follows:
import os
import argparse
import numpy as np
from PIL import Image
import nncase

def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content


def generate_data(shape, batch, calib_dir):
    # Build calibration data as data[sample][input] in NCHW layout.
    img_paths = [os.path.join(calib_dir, p) for p in os.listdir(calib_dir)]
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))
        data.append([img_data[np.newaxis, ...]])
    return data


def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", type=str, help='target to run')
    parser.add_argument("--model", type=str, help='model file')
    parser.add_argument("--dataset", type=str, help='calibration_dataset')
    args = parser.parse_args()

    input_shape = [1, 3, 224, 224]
    dump_dir = 'tmp/mbv2_tflite'
    # Make sure the output directory exists before the kmodel is written.
    os.makedirs(dump_dir, exist_ok=True)

    # compile_options
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target
    compile_options.preprocess = True
    compile_options.swapRB = False
    compile_options.input_shape = input_shape
    compile_options.input_type = 'uint8'
    compile_options.input_range = [0, 255]
    compile_options.mean = [127.5, 127.5, 127.5]
    compile_options.std = [127.5, 127.5, 127.5]
    compile_options.input_layout = 'NCHW'
    compile_options.dump_ir = True
    compile_options.dump_asm = True
    compile_options.dump_dir = dump_dir

    # compiler
    compiler = nncase.Compiler(compile_options)

    # import
    model_content = read_model_file(args.model)
    import_options = nncase.ImportOptions()
    compiler.import_tflite(model_content, import_options)

    # ptq_options
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 6
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
    compiler.use_ptq(ptq_options)

    # compile
    compiler.compile()

    # kmodel
    kmodel = compiler.gencode_tobytes()
    with open(os.path.join(dump_dir, 'test.kmodel'), 'wb') as f:
        f.write(kmodel)


if __name__ == '__main__':
    main()
Execute the following command to compile the TFLite model of mobilenetv2, and the target is k230.
root@c285a41a7243:/mnt/# cd rtos_sdk/src/rtsmart/libs/nncase/examples
root@c285a41a7243:/mnt/rtos_sdk/src/rtsmart/libs/nncase/examples# python3 ./scripts/mbv2_tflite.py --target k230 --model models/mbv2.tflite --dataset calibration_dataset
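generate_data in the script takes the first files that os.listdir returns, so the selection order is not guaranteed and non-image files in the dataset directory would be opened as images. A possible pre-check is sketched below; the helper name `pick_calibration_images` and the extension list are assumptions made for illustration and are not part of the SDK scripts.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")  # assumed set, adjust as needed


def pick_calibration_images(calib_dir, count):
    """Return `count` image paths from calib_dir in a stable, sorted order."""
    names = sorted(
        n for n in os.listdir(calib_dir)
        if n.lower().endswith(IMAGE_EXTENSIONS)
    )
    if len(names) < count:
        raise ValueError(
            f"need {count} calibration images in {calib_dir!r}, found {len(names)}"
        )
    return [os.path.join(calib_dir, n) for n in names[:count]]


# Demo on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as calib_dir:
    for name in ["b.png", "a.jpg", "notes.txt"]:
        open(os.path.join(calib_dir, name), "wb").close()
    picked = pick_calibration_images(calib_dir, 2)
    print([os.path.basename(p) for p in picked])
```

Sorting keeps the calibration set reproducible between runs, and the explicit count check reports a shortage before any image is loaded.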
Compile ONNX model#
For the ONNX model, it is recommended to use ONNX Simplifier to simplify it first, and then use nncase to compile.
The yolov5s_onnx.py script is as follows:
import os
import argparse
import numpy as np
from PIL import Image
import onnxsim
import onnx
import nncase

def parse_model_input_output(model_file):
    onnx_model = onnx.load(model_file)
    input_all = [node.name for node in onnx_model.graph.input]
    input_initializer = [node.name for node in onnx_model.graph.initializer]
    input_names = list(set(input_all) - set(input_initializer))
    input_tensors = [
        node for node in onnx_model.graph.input if node.name in input_names]

    # input
    inputs = []
    for _, e in enumerate(input_tensors):
        onnx_type = e.type.tensor_type
        input_dict = {}
        input_dict['name'] = e.name
        input_dict['dtype'] = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_type.elem_type]
        input_dict['shape'] = [(i.dim_value if i.dim_value != 0 else d) for i, d in zip(
            onnx_type.shape.dim, [1, 3, 224, 224])]
        inputs.append(input_dict)

    return onnx_model, inputs


def onnx_simplify(model_file, dump_dir):
    onnx_model, inputs = parse_model_input_output(model_file)
    onnx_model = onnx.shape_inference.infer_shapes(onnx_model)
    input_shapes = {}
    for input in inputs:
        input_shapes[input['name']] = input['shape']

    onnx_model, check = onnxsim.simplify(onnx_model, input_shapes=input_shapes)
    assert check, "Simplified ONNX model could not be validated"

    model_file = os.path.join(dump_dir, 'simplified.onnx')
    onnx.save_model(onnx_model, model_file)
    return model_file


def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content


def generate_data_ramdom(shape, batch):
    data = []
    for i in range(batch):
        data.append([np.random.randint(0, 256, shape).astype(np.uint8)])
    return data


def generate_data(shape, batch, calib_dir):
    img_paths = [os.path.join(calib_dir, p) for p in os.listdir(calib_dir)]
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))
        data.append([img_data[np.newaxis, ...]])
    return data


def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", type=str, help='target to run')
    parser.add_argument("--model", type=str, help='model file')
    parser.add_argument("--dataset", type=str, help='calibration_dataset')
    args = parser.parse_args()

    input_shape = [1, 3, 320, 320]
    dump_dir = 'tmp/yolov5s_onnx'
    if not os.path.exists(dump_dir):
        os.makedirs(dump_dir)

    # onnx simplify
    model_file = onnx_simplify(args.model, dump_dir)

    # compile_options
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target
    compile_options.preprocess = True
    compile_options.swapRB = False
    compile_options.input_shape = input_shape
    compile_options.input_type = 'uint8'
    compile_options.input_range = [0, 255]
    compile_options.mean = [0, 0, 0]
    compile_options.std = [255, 255, 255]
    compile_options.input_layout = 'NCHW'
    compile_options.output_layout = 'NCHW'
    compile_options.dump_ir = True
    compile_options.dump_asm = True
    compile_options.dump_dir = dump_dir

    # compiler
    compiler = nncase.Compiler(compile_options)

    # import
    model_content = read_model_file(model_file)
    import_options = nncase.ImportOptions()
    compiler.import_onnx(model_content, import_options)

    # ptq_options
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 6
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
    compiler.use_ptq(ptq_options)

    # compile
    compiler.compile()

    # kmodel
    kmodel = compiler.gencode_tobytes()
    with open(os.path.join(dump_dir, 'test.kmodel'), 'wb') as f:
        f.write(kmodel)


if __name__ == '__main__':
    main()
Execute the following command to compile the ONNX model, and the target is k230.
root@c285a41a7243:/mnt/# cd rtos_sdk/src/rtsmart/libs/nncase/examples
root@c285a41a7243:/mnt/rtos_sdk/src/rtsmart/libs/nncase/examples# python3 ./scripts/yolov5s_onnx.py --target k230 --model models/yolov5s.onnx --dataset calibration_dataset
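In the script, parse_model_input_output replaces every dimension whose dim_value is 0 with a hard-coded default before the shapes are passed to ONNX Simplifier (a dimension that is symbolic or unset in the ONNX file reads back as 0). The fallback can be isolated as below; `fill_dynamic_dims` is a hypothetical name used only for this sketch, which needs neither onnx nor nncase.

```python
def fill_dynamic_dims(dim_values, defaults):
    """Replace dims reported as 0 (dynamic or unknown) with the matching default.

    Mirrors the list comprehension in parse_model_input_output. Like zip,
    it stops at the shorter of the two sequences.
    """
    return [d if d != 0 else default for d, default in zip(dim_values, defaults)]


# A fully static input keeps its own shape.
print(fill_dynamic_dims([1, 3, 320, 320], [1, 3, 224, 224]))

# Dynamic batch, height, and width fall back to the defaults.
print(fill_dynamic_dims([0, 3, 0, 0], [1, 3, 224, 224]))
```

Note that the defaults in the script are [1, 3, 224, 224] while main() uses input_shape = [1, 3, 320, 320]. For a model whose input shape is fully static this makes no difference, but for a model with dynamic dimensions it may be worth passing the intended input_shape as the defaults instead.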
