NNCASE Guide
Overview
What nncase Is
nncase is a neural-network compiler designed for AI accelerators. It currently supports targets such as CPU, K210, K510, and K230.
nncase provides:
support for multi-input and multi-output networks
support for multi-branch structures
static memory allocation without heap dependency
operator fusion and optimization
float and uint8/int8 quantized inference
post-training quantization using floating-point models and calibration datasets
flat models with zero-copy loading support
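Quantized inference depends on mapping a calibrated float range onto 8-bit integers. The sketch below uses asymmetric (affine) uint8 quantization, which is the common textbook scheme. It is a generic illustration and not nncase's internal implementation, and the function names are hypothetical.

```python
def affine_params(fmin, fmax, qmin=0, qmax=255):
    """Return (scale, zero_point) mapping the float range [fmin, fmax] onto [qmin, qmax]."""
    # Widen the range so that 0.0 is exactly representable.
    fmin, fmax = min(fmin, 0.0), max(fmax, 0.0)
    if fmax == fmin:  # degenerate range: avoid dividing by zero
        return 1.0, qmin
    scale = (fmax - fmin) / (qmax - qmin)
    zero_point = round(qmin - fmin / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # Round to the nearest integer step, then saturate to the integer range.
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    # Map an integer back to its float value.
    return (q - zero_point) * scale


if __name__ == "__main__":
    scale, zp = affine_params(-1.0, 3.0)
    q = quantize(0.5, scale, zp)
    print(scale, zp, q, dequantize(q, scale, zp))
```

The round trip loses at most half a step (scale / 2) for values inside the range. Values outside the range saturate to 0 or 255.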
Supported neural-network model formats:
TFLite
ONNX
nncase Architecture
The nncase software stack contains two parts:
compiler
runtime
The compiler runs on the PC side. It compiles a neural-network model and generates the final kmodel. Its main modules are:
Importer: imports models from other neural-network frameworks
IR: the intermediate representation, including neutral IR and target IR
Evaluator: interprets IR and is often used for constant folding and PTQ calibration
Transform: performs IR transformations and graph-traversal optimizations
Quantize: performs post-training quantization by inserting quantize/dequantize nodes, collecting tensor ranges, and removing unnecessary quantize/dequantize operations
Tiling: splits large computations to fit the limited NPU memory and to balance latency against bandwidth
Partition: partitions the graph by module type so that each subgraph can map to a different runtime module and device
Schedule: determines the execution order and allocates buffers according to data dependencies
Codegen: generates a runtime module for each partitioned subgraph
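The Schedule module orders operators so that every node runs after its inputs have been produced. As a rough illustration, the sketch below applies Kahn's topological sort to a small hypothetical graph. It also records, for each tensor, the last step that reads it, because after that step the tensor's buffer could be reused. This is the basic idea behind static buffer allocation. The sketch is not nncase's scheduler, and the graph and function names are made up.

```python
from collections import deque


def schedule(nodes):
    """nodes: dict op_name -> (input_tensors, output_tensor). Returns (order, last_use)."""
    producer = {out: name for name, (_, out) in nodes.items()}
    # An op depends on the ops that produce its inputs (graph inputs have no producer).
    deps = {name: {producer[t] for t in ins if t in producer}
            for name, (ins, _) in nodes.items()}
    users = {name: [] for name in nodes}
    for name, ds in deps.items():
        for d in ds:
            users[d].append(name)

    # Kahn's algorithm: repeatedly run an op whose dependencies are all done.
    indegree = {name: len(ds) for name, ds in deps.items()}
    ready = deque(sorted(n for n, k in indegree.items() if k == 0))
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for u in users[op]:
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Last step at which each tensor is read; after it, the buffer is free for reuse.
    last_use = {}
    for step, op in enumerate(order):
        for t in nodes[op][0]:
            last_use[t] = step
    return order, last_use


if __name__ == "__main__":
    graph = {
        "conv": (["image"], "t0"),
        "relu": (["t0"], "t1"),
        "pool": (["t1"], "t2"),
        "add": (["t1", "t2"], "out"),
    }
    print(schedule(graph))
```

In this graph, t0 is dead after step 1, so a static allocator could place t2 in t0's memory.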
The runtime is integrated into the user application and provides:
kmodel loading
input-data setup
KPU execution
output retrieval
Development Environment
Operating System
Supported host systems include:
Ubuntu 18.04
Ubuntu 20.04
Windows 10
Windows 11
Software Environment
| No. | Software | Version |
|---|---|---|
| 1 | Python | |
| 2 | pip | |
| 3 | numpy | |
| 4 | onnx | |
| 5 | onnx-simplifier | |
| 6 | onnxoptimizer | |
| 7 | onnxruntime | |
| 8 | dotnet-runtime | |
Operator Support
TFLite Operators
The TFLite operator-support table lists 78 operators, and each one is marked Yes (supported).
ONNX Operators
The ONNX operator-support table lists 117 operators, and each one is marked Yes (supported).
API Documentation
The nncase stack consists of the compiler, used for model conversion, and the runtime, used for inference on the KPU side. Python and C++ APIs are provided for both parts. Refer to:
Usage Steps
Environment Setup
Linux
First install dotnet-sdk-7.0 and set the DOTNET_ROOT environment variable. Do not install dotnet inside an Anaconda virtual environment.
```shell
sudo apt-get update
sudo apt-get install dotnet-sdk-7.0
export DOTNET_ROOT=/usr/share/dotnet
```
Then install nncase and nncase-kpu:
```shell
pip install nncase nncase-kpu
```
Windows
First install dotnet-sdk-7.0. For the installation steps, see the Microsoft documentation:
Then install nncase. Download the matching nncase_kpu-2.x.x-py2.py3-none-win_amd64.whl package from the Release page and install the wheel locally:
```shell
pip install nncase
pip install nncase_kpu-2.x.x-py2.py3-none-win_amd64.whl
```
Docker
If you do not have an Ubuntu environment, use the nncase Docker image (Ubuntu 20.04 + Python 3.8 + dotnet-7.0):
```shell
cd /path/to/nncase_sdk
docker pull ghcr.io/kendryte/k230_sdk
docker run -it --rm -v `pwd`:/mnt -w /mnt ghcr.io/kendryte/k230_sdk /bin/bash -c "/bin/bash"
```
Check Version
```text
root@469e6a4a9e71:/mnt# python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _nncase
>>> print(_nncase.__version__)
2.9.0
```
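The install steps pair nncase with a matching nncase-kpu package. If "matching" means that both packages report the same version number, a short check like the one below can catch a mismatch early. That interpretation is an assumption, and the helper names are hypothetical. importlib.metadata is the standard-library module (Python 3.8+) for reading installed package versions. The distribution names come from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def versions_match(core_version, kpu_version):
    # Compare major.minor.patch only, ignoring any suffix such as '.post1'.
    return core_version.split(".")[:3] == kpu_version.split(".")[:3]


def check_installed():
    """Print the installed nncase / nncase_kpu versions and whether they agree."""
    try:
        core = version("nncase")
        kpu = version("nncase_kpu")
    except PackageNotFoundError as exc:
        print(f"package not installed: {exc}")
        return False
    ok = versions_match(core, kpu)
    print(f"nncase {core}, nncase_kpu {kpu}: {'match' if ok else 'MISMATCH'}")
    return ok


if __name__ == "__main__":
    check_installed()
```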
Model Conversion
The nncase user guide is available here:
When converting TFLite/ONNX models into a kmodel, the key step is configuring the options to match your deployment needs. The main configuration objects are:
CompileOptions
PTQTensorOptions
ImportOptions
CompileOptions
CompileOptions is used to configure nncase compilation behavior.
| Field | Type | Required | Description |
|---|---|---|---|
| target | str | Yes | Compilation target, such as k230 or cpu |
|  |  | No | Whether to dump IR; default: … |
|  |  | No | Whether to dump assembly output; default: … |
|  |  | No | Dump directory, used together with … |
|  |  | No | Parameter-file path for ONNX models larger than … |
| preprocess | bool | No | Whether to enable built-in preprocessing; default: … |
| input_type | str | No | Input data type when preprocessing is enabled (uint8 or float32); default: … |
| input_shape | list[int] | No | Input shape when preprocessing is enabled; default: … |
| input_range | list[float] | No | Floating-point range after dequantization when preprocessing is enabled; required when input_type is uint8 |
| input_layout | str | No | Input layout, such as NCHW; default: … |
| swapRB | bool | No | Whether to swap data on the channel dimension (RGB <-> BGR); default: … |
| mean | list[float] | No | Preprocessing normalization mean; default: … |
| std | list[float] | No | Preprocessing normalization std; default: … |
|  |  | No | Fill value used by letterbox preprocessing; default: … |
|  |  | No | Output layout; default: … |
|  |  | Yes | Whether to enable ShapeBucket; default: … |
|  |  | Yes | Range of each variable shape dimension; the minimum value must be greater than or equal to … |
|  |  | Yes | Number of segments used to split the input variable range |
|  |  | No | Fixes a variable shape dimension to a specific value |
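The ShapeBucket rows above describe each variable dimension by a range and a number of segments. One plausible reading of "splitting the input variable range into segments" is shown below: an inclusive integer range is divided into contiguous, nearly equal sub-ranges. nncase may choose its bucket boundaries differently, and the function name is hypothetical.

```python
def split_range(lo, hi, segments):
    """Split the inclusive integer range [lo, hi] into `segments` contiguous sub-ranges."""
    if lo > hi or segments < 1:
        raise ValueError("need lo <= hi and segments >= 1")
    total = hi - lo + 1
    segments = min(segments, total)  # cannot have more segments than values
    base, extra = divmod(total, segments)
    buckets, start = [], lo
    for i in range(segments):
        # Spread the remainder over the first `extra` buckets.
        size = base + (1 if i < extra else 0)
        buckets.append((start, start + size - 1))
        start += size
    return buckets


if __name__ == "__main__":
    print(split_range(1, 512, 4))
```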
Preprocessing Notes
For preprocessing details, refer to:
nncase compile API preprocessing
Moving part of the preprocessing into the model can improve preprocessing efficiency during board-side inference. Supported preprocessing includes:
swapRB (RGB -> BGR or BGR -> RGB)
Transpose (NHWC -> NCHW or NCHW -> NHWC)
Normalization
Dequantize
For example, if an ONNX model expects RGB input while OpenCV reads images as BGR, the normal runtime preprocessing path would first convert BGR to RGB. During kmodel conversion, you can set swapRB=True so the converted kmodel already contains the RB-swap preprocessing step. The board-side preprocessing code can then skip that step.
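The plain-Python sketch below shows what the in-model preprocessing steps listed above compute for a single image. It is an illustration built on several assumptions, not nncase's code:

- uint8 values are mapped linearly from 0..255 onto input_range;
- swapRB is applied before the per-channel mean/std normalization;
- the image arrives as nested lists in HWC order.

With input_range [0, 1], mean 0, and std 1 (the YOLOv8 settings used in the conversion example in this guide), the steps reduce to dividing by 255.

```python
def preprocess(image_hwc, input_range=(0.0, 1.0), mean=(0.0, 0.0, 0.0),
               std=(1.0, 1.0, 1.0), swap_rb=False, to_nchw=True):
    """image_hwc: H x W x 3 nested lists of uint8 values. Returns floats in CHW (or HWC)."""
    lo, hi = input_range
    out = []
    for row in image_hwc:
        new_row = []
        for pixel in row:
            if swap_rb:  # RGB <-> BGR: reverse the three channels
                pixel = pixel[::-1]
            # Dequantize: map 0..255 linearly onto [lo, hi]
            values = [lo + (v / 255.0) * (hi - lo) for v in pixel]
            # Normalize each channel with its own mean/std
            new_row.append([(v - m) / s for v, m, s in zip(values, mean, std)])
        out.append(new_row)
    if to_nchw:  # transpose HWC -> CHW
        return [[[out[h][w][c] for w in range(len(out[0]))] for h in range(len(out))]
                for c in range(3)]
    return out


if __name__ == "__main__":
    print(preprocess([[[255, 0, 51]]]))
```

Because swapping and transposing only move values around, moving them into the kmodel saves board-side work without changing the numbers the network sees.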
PTQTensorOptions
PTQTensorOptions configures nncase PTQ behavior:
| Field | Type | Required | Description |
|---|---|---|---|
| samples_count | int | No | Number of calibration samples used for quantization |
| calibrate_method | str | No | Calibration method, such as NoClip or Kld |
|  |  | No | Weight fine-tuning method, such as … |
| quant_type | str | No | Data quantization type, such as uint8 or int16 |
| w_quant_type | str | No | Weight quantization type, such as uint8 or int16 |
|  |  | No | Path to the quantization-scheme file to import |
|  |  | No | Whether to quantize strictly according to … |
|  |  | No | Whether to export the quantization-scheme file |
|  |  | No | Whether to export weight quantization parameters per channel; setting this to … |
For mixed quantization, see:
For PTQ details, refer to:
nncase compile API PTQ options
If the accuracy of the converted kmodel is not good enough, you can adjust quant_type and w_quant_type, but the two parameters must not both be int16.
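The rule that quant_type and w_quant_type must not both be int16 is easy to check before compiling. Below is a minimal, hypothetical helper that is not part of nncase. The set of accepted type names is an assumption drawn from the values this guide mentions: uint8, int8, and int16.

```python
ALLOWED_TYPES = {"uint8", "int8", "int16"}  # assumption: values named in this guide


def check_quant_types(quant_type, w_quant_type):
    """Raise ValueError for unknown types or for the unsupported int16/int16 pair."""
    for name, value in (("quant_type", quant_type), ("w_quant_type", w_quant_type)):
        if value not in ALLOWED_TYPES:
            raise ValueError(f"{name}={value!r} is not one of {sorted(ALLOWED_TYPES)}")
    if quant_type == "int16" and w_quant_type == "int16":
        raise ValueError("quant_type and w_quant_type must not both be int16")
```

You could call check_quant_types(ptq_options.quant_type, ptq_options.w_quant_type) right before compiler.use_ptq(ptq_options). A bad combination would then fail with a clear message instead of failing later during compilation.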
Calibration Dataset Setup
| Name | Type | Description |
|---|---|---|
|  | List[List[np.ndarray]] | Loaded calibration data |
Calibration data is configured through set_tensor_data, and the parameter type is List[List[np.ndarray]].
For example:
if the model has one input and uses 10 calibration samples, the data shape may be [10, 1, 3, 224, 224];
if the model has two inputs and uses 10 calibration samples, the data shape may be [[10, 1, 3, 224, 224], [10, 1, 3, 320, 320]].
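During PTQ, the calibration samples are run through the model and the value range of each tensor is recorded. The sketch below assumes that the NoClip method keeps the full observed minimum and maximum, whereas Kld searches for a clipping threshold. It merges per-sample ranges and reports the float width of one uint8 step. Flat Python lists stand in for tensors, and the function names are hypothetical.

```python
def noclip_range(samples):
    """samples: iterable of flat lists of floats (one per calibration sample)."""
    lo, hi = float("inf"), float("-inf")
    for values in samples:
        lo = min(lo, min(values))
        hi = max(hi, max(values))
    if lo > hi:
        raise ValueError("no calibration data")
    return lo, hi


def uint8_step(lo, hi):
    """Float width of one uint8 step when [lo, hi] is spread over 256 levels."""
    return (hi - lo) / 255 if hi > lo else 0.0


if __name__ == "__main__":
    typical = noclip_range([[0.0, 1.0], [0.5, 2.0]])
    with_outlier = noclip_range([[0.0, 1.0], [0.5, 2.0], [100.0]])
    print(uint8_step(*typical), uint8_step(*with_outlier))
```

A single outlier in the calibration set widens an unclipped range and coarsens every step, so the choice and number of calibration samples matter.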
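ImportOptions
ImportOptions configures how a model is imported into the compiler. Both TFLite and ONNX models are supported.
Example:
```python
import nncase

def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        return f.read()

# Assumes compiler = nncase.Compiler(compile_options) and model = path to the model file
import_options = nncase.ImportOptions()
model_content = read_model_file(model)
compiler.import_tflite(model_content, import_options)   # TFLite model
# compiler.import_onnx(model_content, import_options)   # ONNX model: use this call instead
```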
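A script that accepts either format has to pick the matching importer. The hypothetical helper below chooses between the import_tflite and import_onnx methods shown above based on the file extension. It returns the method name, so the selection logic can be tested without nncase installed.

```python
import os

# Map file extensions to the Compiler import methods used in this guide.
IMPORTERS = {".tflite": "import_tflite", ".onnx": "import_onnx"}


def importer_name(model_path):
    """Return the nncase.Compiler method name to use for model_path."""
    ext = os.path.splitext(model_path)[1].lower()
    try:
        return IMPORTERS[ext]
    except KeyError:
        raise ValueError(f"unsupported model format: {ext or model_path!r}") from None
```

With a compiler and import_options set up as in the example, the call becomes getattr(compiler, importer_name(path))(read_model_file(path), import_options).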
YOLOv8 ONNX to kmodel Example
```python
import os
import argparse
import math

import numpy as np
import onnx
import onnxsim
import nncase
from PIL import Image


def parse_model_input_output(model_file, input_shape):
    onnx_model = onnx.load(model_file)
    # Graph inputs that are not initializers (weights) are the real model inputs
    input_all = [node.name for node in onnx_model.graph.input]
    input_initializer = [node.name for node in onnx_model.graph.initializer]
    input_names = list(set(input_all) - set(input_initializer))
    input_tensors = [
        node for node in onnx_model.graph.input if node.name in input_names]

    # Record name, dtype and shape of each input
    inputs = []
    for e in input_tensors:
        onnx_type = e.type.tensor_type
        input_dict = {}
        input_dict['name'] = e.name
        input_dict['dtype'] = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_type.elem_type]
        # Unknown (dynamic) dimensions have dim_value == 0; take them from input_shape
        input_dict['shape'] = [(i.dim_value if i.dim_value != 0 else d) for i, d in zip(
            onnx_type.shape.dim, input_shape)]
        inputs.append(input_dict)

    return onnx_model, inputs


def onnx_simplify(model_file, dump_dir, input_shape):
    onnx_model, inputs = parse_model_input_output(model_file, input_shape)
    onnx_model = onnx.shape_inference.infer_shapes(onnx_model)
    input_shapes = {inp['name']: inp['shape'] for inp in inputs}

    onnx_model, check = onnxsim.simplify(onnx_model, input_shapes=input_shapes)
    assert check, "Simplified ONNX model could not be validated"

    model_file = os.path.join(dump_dir, 'simplified.onnx')
    onnx.save_model(onnx_model, model_file)
    return model_file


def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content


def generate_data(shape, batch, calib_dir):
    img_paths = [os.path.join(calib_dir, p) for p in os.listdir(calib_dir)]
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))  # HWC -> CHW
        # Each sample is a list holding one 1x3xHxW uint8 array (this model has one input)
        data.append([img_data[np.newaxis, ...]])
    return np.array(data)


def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", default="k230", type=str, help='target to run, k230/cpu')
    parser.add_argument("--model", type=str, required=True, help='model file')
    parser.add_argument("--dataset_path", type=str, required=True, help='calibration dataset directory')
    parser.add_argument("--input_width", type=int, default=320, help='model input width')
    parser.add_argument("--input_height", type=int, default=320, help='model input height')
    parser.add_argument("--ptq_option", type=int, default=0, choices=range(6), help='ptq_option: 0-5')
    args = parser.parse_args()

    # Align width and height up to multiples of 32
    input_width = int(math.ceil(args.input_width / 32.0)) * 32
    input_height = int(math.ceil(args.input_height / 32.0)) * 32

    # Model input shape; the dimension order must match input_layout
    input_shape = [1, 3, input_height, input_width]

    dump_dir = 'tmp'
    os.makedirs(dump_dir, exist_ok=True)

    # Simplify the ONNX model
    model_file = onnx_simplify(args.model, dump_dir, input_shape)

    # Configure CompileOptions
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target
    # Move preprocessing into the model
    compile_options.preprocess = True
    # The ONNX model expects RGB and K230 camera data is also RGB,
    # so no RB swap is needed
    compile_options.swapRB = False
    # Input image shape
    compile_options.input_shape = input_shape
    # Input type: uint8 or float32
    compile_options.input_type = 'uint8'
    # Range the uint8 input is dequantized to (needed because input_type is uint8)
    compile_options.input_range = [0, 1]
    # Preprocessing mean/std; these values come from the YOLOv8 source
    compile_options.mean = [0, 0, 0]
    compile_options.std = [1, 1, 1]
    # Input layout; ONNX models normally use NCHW
    compile_options.input_layout = "NCHW"

    # Create the Compiler instance
    compiler = nncase.Compiler(compile_options)

    # Import the ONNX model
    model_content = read_model_file(model_file)
    import_options = nncase.ImportOptions()
    compiler.import_onnx(model_content, import_options)

    # Configure the quantization method (argparse restricts ptq_option to 0-5)
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 10
    if args.ptq_option == 0:
        ptq_options.calibrate_method = 'NoClip'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'uint8'
    elif args.ptq_option == 1:
        ptq_options.calibrate_method = 'NoClip'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'int16'
    elif args.ptq_option == 2:
        ptq_options.calibrate_method = 'NoClip'
        ptq_options.quant_type = 'int16'
        ptq_options.w_quant_type = 'uint8'
    elif args.ptq_option == 3:
        ptq_options.calibrate_method = 'Kld'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'uint8'
    elif args.ptq_option == 4:
        ptq_options.calibrate_method = 'Kld'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'int16'
    elif args.ptq_option == 5:
        ptq_options.calibrate_method = 'Kld'
        ptq_options.quant_type = 'int16'
        ptq_options.w_quant_type = 'uint8'

    # Set calibration data
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset_path))
    compiler.use_ptq(ptq_options)

    # Compile
    compiler.compile()

    # Write the kmodel next to the input model
    kmodel = compiler.gencode_tobytes()
    base, _ = os.path.splitext(args.model)
    kmodel_name = base + ".kmodel"
    with open(kmodel_name, 'wb') as f:
        f.write(kmodel)


if __name__ == '__main__':
    main()
```
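main() rounds the requested width and height up to the next multiple of 32 by applying math.ceil to a float division. The hypothetical helper below does the same rounding with integer arithmetic only, which avoids float rounding, and rebuilds the NCHW input shape used there. The factor 32 likely reflects YOLOv8's largest downsampling stride, but the example does not state the reason.

```python
def align_up(value, multiple=32):
    """Round value up to the nearest multiple using integer-only ceil division."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return -(-value // multiple) * multiple


def input_shape_for(width, height):
    # NCHW shape as built in main(): batch 1, 3 channels, aligned height and width.
    return [1, 3, align_up(height), align_up(width)]


if __name__ == "__main__":
    print(input_shape_for(640, 300))
```

For example, a requested 640x300 input becomes a 1x3x320x640 tensor, so calibration images are resized to 640x320.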
After model conversion succeeds, deploying it on the board requires C++ code that calls the nncase_runtime API.
