NNCASE Guide#

Overview#

What nncase Is#

nncase is a neural-network compiler designed for AI accelerators. It currently supports targets such as CPU, K210, K510, and K230.

nncase provides:

support for multi-input and multi-output networks
support for multi-branch structures
static memory allocation without heap dependency
operator fusion and optimization
float and uint8/int8 quantized inference
post-training quantization using floating-point models and calibration datasets
flat models with zero-copy loading support

Supported neural-network model formats:

TFLite
ONNX

nncase Architecture#

Figure: nncase architecture

The nncase software stack contains two parts:

compiler
runtime

The compiler runs on the host PC, compiles a neural-network model, and generates the final kmodel. Its main modules are:

Importer: imports models from other neural-network frameworks
IR: the intermediate representation, including neutral IR and target IR
Evaluator: provides interpreted execution for IR and is often used for constant folding and PTQ calibration
Transform: performs IR transformation and graph-traversal optimization
Quantize: performs post-training quantization, inserts quantize/dequantize nodes, collects tensor ranges, and removes unnecessary quantize/dequantize operations
Tiling: splits large computation to fit limited NPU memory and optimize latency/bandwidth tradeoffs
Partition: partitions the graph by module type so each subgraph can map to a different runtime module and device
Schedule: generates execution order and allocates buffers according to data dependencies
Codegen: generates runtime modules for each partitioned subgraph
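
The Schedule step above derives an execution order from data dependencies. A minimal sketch of that idea, using a generic topological sort (Kahn's algorithm) over a hypothetical four-node graph; this illustrates the concept only and is not nncase's scheduler, and the node names are made up for the example:

```python
from collections import deque


def schedule(deps):
    """Return one valid execution order for a graph given as {node: [inputs]}."""
    # Count unresolved inputs per node and record which nodes consume each node.
    pending = {node: len(inputs) for node, inputs in deps.items()}
    consumers = {node: [] for node in deps}
    for node, inputs in deps.items():
        for src in inputs:
            consumers[src].append(node)

    # Nodes with no unresolved inputs can run immediately.
    ready = deque(node for node, count in pending.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for user in consumers[node]:
            pending[user] -= 1
            if pending[user] == 0:
                ready.append(user)

    if len(order) != len(deps):
        raise ValueError("graph contains a cycle")
    return order


# Hypothetical two-branch graph: one input feeds two convolutions whose
# results are added together.
graph = {
    "input": [],
    "conv_a": ["input"],
    "conv_b": ["input"],
    "add": ["conv_a", "conv_b"],
}
order = schedule(graph)
print(order)
```

A real scheduler would also use this order to decide when each buffer can be reused, which is what makes static memory allocation possible.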

The runtime is integrated into the user application and provides:

kmodel loading
input-data setup
KPU execution
output retrieval
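
The four runtime steps above can be tried on the PC before moving to the board. The sketch below assumes the `nncase.Simulator` and `nncase.RuntimeTensor` interfaces of the nncase 2.x Python package; the helper name `run_on_simulator` is hypothetical, and the exact calls should be checked against the nncase API documentation for your version:

```python
def run_on_simulator(kmodel_path, inputs):
    """Run a kmodel on the host-side simulator and return its outputs.

    `inputs` is a list of NumPy arrays, one per model input.
    """
    # Imported lazily so this file can be loaded without nncase installed.
    import nncase

    with open(kmodel_path, "rb") as f:
        kmodel = f.read()

    sim = nncase.Simulator()
    # kmodel loading
    sim.load_model(kmodel)
    # input-data setup
    for index, array in enumerate(inputs):
        sim.set_input_tensor(index, nncase.RuntimeTensor.from_numpy(array))
    # execution
    sim.run()
    # output retrieval
    return [sim.get_output_tensor(i).to_numpy() for i in range(sim.outputs_size)]
```

Comparing these outputs with the original framework's results on the same input is a common way to judge quantization loss before deployment.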

Development Environment#

Operating System#

Supported host systems include:

Ubuntu 18.04
Ubuntu 20.04
Windows 10
Windows 11

Software Environment#

No.	Software	Version
1	Python	`3.6/3.7/3.8/3.9/3.10`
2	pip	`>=20.3`
3	numpy	`1.19.5`
4	onnx	`1.9.0`
5	onnx-simplifier	`0.3.6`
6	onnxoptimizer	`0.2.6`
7	onnxruntime	`1.8.0`
8	dotnet-runtime	`7.0`
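
A quick way to confirm that the running interpreter falls inside the Python range in the table above. This is a small sketch; the helper name `is_supported_python` is hypothetical and the bounds are simply copied from the table:

```python
import sys

# Inclusive (major, minor) bounds taken from the table above.
MIN_PYTHON = (3, 6)
MAX_PYTHON = (3, 10)


def is_supported_python(version_info=None):
    """Return True if the given (or running) interpreter is in the listed range."""
    if version_info is None:
        version_info = sys.version_info
    return MIN_PYTHON <= tuple(version_info[:2]) <= MAX_PYTHON


if not is_supported_python():
    print("warning: Python %d.%d is outside the range listed for nncase"
          % tuple(sys.version_info[:2]))
```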

Operator Support#

TFLite Operators#

Operator	Is Supported
`ABS`	Yes
`ADD`	Yes
`ARG_MAX`	Yes
`ARG_MIN`	Yes
`AVERAGE_POOL_2D`	Yes
`BATCH_MATMUL`	Yes
`CAST`	Yes
`CEIL`	Yes
`CONCATENATION`	Yes
`CONV_2D`	Yes
`COS`	Yes
`CUSTOM`	Yes
`DEPTHWISE_CONV_2D`	Yes
`DIV`	Yes
`EQUAL`	Yes
`EXP`	Yes
`EXPAND_DIMS`	Yes
`FLOOR`	Yes
`FLOOR_DIV`	Yes
`FLOOR_MOD`	Yes
`FULLY_CONNECTED`	Yes
`GREATER`	Yes
`GREATER_EQUAL`	Yes
`L2_NORMALIZATION`	Yes
`LEAKY_RELU`	Yes
`LESS`	Yes
`LESS_EQUAL`	Yes
`LOG`	Yes
`LOGISTIC`	Yes
`MAX_POOL_2D`	Yes
`MAXIMUM`	Yes
`MEAN`	Yes
`MINIMUM`	Yes
`MUL`	Yes
`NEG`	Yes
`NOT_EQUAL`	Yes
`PAD`	Yes
`PADV2`	Yes
`MIRROR_PAD`	Yes
`PACK`	Yes
`POW`	Yes
`REDUCE_MAX`	Yes
`REDUCE_MIN`	Yes
`REDUCE_PROD`	Yes
`RELU`	Yes
`PRELU`	Yes
`RELU6`	Yes
`RESHAPE`	Yes
`RESIZE_BILINEAR`	Yes
`RESIZE_NEAREST_NEIGHBOR`	Yes
`ROUND`	Yes
`RSQRT`	Yes
`SHAPE`	Yes
`SIN`	Yes
`SLICE`	Yes
`SOFTMAX`	Yes
`SPACE_TO_BATCH_ND`	Yes
`SQUEEZE`	Yes
`BATCH_TO_SPACE_ND`	Yes
`STRIDED_SLICE`	Yes
`SQRT`	Yes
`SQUARE`	Yes
`SUB`	Yes
`SUM`	Yes
`TANH`	Yes
`TILE`	Yes
`TRANSPOSE`	Yes
`TRANSPOSE_CONV`	Yes
`QUANTIZE`	Yes
`FAKE_QUANT`	Yes
`DEQUANTIZE`	Yes
`GATHER`	Yes
`GATHER_ND`	Yes
`ONE_HOT`	Yes
`SQUARED_DIFFERENCE`	Yes
`LOG_SOFTMAX`	Yes
`SPLIT`	Yes
`HARD_SWISH`	Yes

ONNX Operators#

Operator	Is Supported
`Abs`	Yes
`Acos`	Yes
`Acosh`	Yes
`And`	Yes
`ArgMax`	Yes
`ArgMin`	Yes
`Asin`	Yes
`Asinh`	Yes
`Add`	Yes
`AveragePool`	Yes
`BatchNormalization`	Yes
`Cast`	Yes
`Ceil`	Yes
`Celu`	Yes
`Clip`	Yes
`Compress`	Yes
`Concat`	Yes
`Constant`	Yes
`ConstantOfShape`	Yes
`Conv`	Yes
`ConvTranspose`	Yes
`Cos`	Yes
`Cosh`	Yes
`CumSum`	Yes
`DepthToSpace`	Yes
`DequantizeLinear`	Yes
`Div`	Yes
`Dropout`	Yes
`Elu`	Yes
`Exp`	Yes
`Expand`	Yes
`Equal`	Yes
`Erf`	Yes
`Flatten`	Yes
`Floor`	Yes
`Gather`	Yes
`GatherElements`	Yes
`GatherND`	Yes
`Gemm`	Yes
`GlobalAveragePool`	Yes
`GlobalMaxPool`	Yes
`Greater`	Yes
`GreaterOrEqual`	Yes
`GRU`	Yes
`Hardmax`	Yes
`HardSigmoid`	Yes
`HardSwish`	Yes
`Identity`	Yes
`InstanceNormalization`	Yes
`LayerNormalization`	Yes
`LpNormalization`	Yes
`LeakyRelu`	Yes
`Less`	Yes
`LessOrEqual`	Yes
`Log`	Yes
`LogSoftmax`	Yes
`LRN`	Yes
`LSTM`	Yes
`MatMul`	Yes
`MaxPool`	Yes
`Max`	Yes
`Min`	Yes
`Mul`	Yes
`Neg`	Yes
`Not`	Yes
`OneHot`	Yes
`Pad`	Yes
`Pow`	Yes
`PRelu`	Yes
`QuantizeLinear`	Yes
`RandomNormal`	Yes
`RandomNormalLike`	Yes
`RandomUniform`	Yes
`RandomUniformLike`	Yes
`ReduceL1`	Yes
`ReduceL2`	Yes
`ReduceLogSum`	Yes
`ReduceLogSumExp`	Yes
`ReduceMax`	Yes
`ReduceMean`	Yes
`ReduceMin`	Yes
`ReduceProd`	Yes
`ReduceSum`	Yes
`ReduceSumSquare`	Yes
`Relu`	Yes
`Reshape`	Yes
`Resize`	Yes
`ReverseSequence`	Yes
`RoiAlign`	Yes
`Round`	Yes
`Rsqrt`	Yes
`Selu`	Yes
`Shape`	Yes
`Sign`	Yes
`Sin`	Yes
`Sinh`	Yes
`Sigmoid`	Yes
`Size`	Yes
`Slice`	Yes
`Softmax`	Yes
`Softplus`	Yes
`Softsign`	Yes
`SpaceToDepth`	Yes
`Split`	Yes
`Sqrt`	Yes
`Squeeze`	Yes
`Sub`	Yes
`Sum`	Yes
`Tanh`	Yes
`Tile`	Yes
`TopK`	Yes
`Transpose`	Yes
`Trilu`	Yes
`ThresholdedRelu`	Yes
`Upsample`	Yes
`Unsqueeze`	Yes
`Where`	Yes

API Documentation#

The nncase stack consists of a compiler, used for model conversion, and a runtime, used for KPU-side inference. Python and C++ APIs are provided for both parts. Refer to:

nncase API documentation

Usage Steps#

Environment Setup#

Linux

First install dotnet-sdk-7.0 and set the DOTNET_ROOT environment variable. Do not install dotnet inside an Anaconda virtual environment:

sudo apt-get update
sudo apt-get install dotnet-sdk-7.0
export DOTNET_ROOT=/usr/share/dotnet

Then install nncase and nncase-kpu:

pip install nncase nncase-kpu
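
If the compiler later fails to start, the dotnet prerequisite is a likely first thing to check. The sketch below is a hypothetical helper (`check_dotnet` is not part of nncase) that only inspects `DOTNET_ROOT` and looks for a `dotnet` executable on `PATH`:

```python
import os
import shutil


def check_dotnet(environ=None, which=shutil.which):
    """Return a list of problems with the dotnet setup (empty if none found)."""
    environ = os.environ if environ is None else environ
    problems = []
    root = environ.get("DOTNET_ROOT")
    if not root:
        problems.append("DOTNET_ROOT is not set")
    elif not os.path.isdir(root):
        problems.append("DOTNET_ROOT does not point to a directory: " + root)
    if which("dotnet") is None:
        problems.append("dotnet executable not found on PATH")
    return problems


for problem in check_dotnet():
    print("warning:", problem)
```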

Windows

First install dotnet-sdk-7.0. For the installation steps, see the Microsoft documentation:

Install .NET on Windows

Then install nncase, download the matching nncase_kpu-2.x.x-py2.py3-none-win_amd64.whl package from the Release page, and install the wheel locally:

pip install nncase
pip install nncase_kpu-2.x.x-py2.py3-none-win_amd64.whl

Docker

If you do not have an Ubuntu environment, use the nncase Docker image (Ubuntu 20.04 + Python 3.8 + dotnet-7.0):

cd /path/to/nncase_sdk
docker pull ghcr.io/kendryte/k230_sdk
docker run -it --rm -v `pwd`:/mnt -w /mnt ghcr.io/kendryte/k230_sdk /bin/bash -c "/bin/bash"

Check Version

root@469e6a4a9e71:/mnt# python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _nncase
>>> print(_nncase.__version__)
2.9.0
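
The installed package versions can also be read without importing nncase itself. The sketch below uses the standard `importlib.metadata` module (Python 3.8 or later) and assumes that a "matching" nncase-kpu package means one with the same version string as nncase; both helper names are hypothetical:

```python
from importlib import metadata  # available from Python 3.8


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def versions_match(nncase_version, kpu_version):
    """True when both packages are installed and report the same version."""
    return nncase_version is not None and nncase_version == kpu_version


nncase_version = installed_version("nncase")
kpu_version = installed_version("nncase-kpu")
print("nncase:", nncase_version, "nncase-kpu:", kpu_version)
if not versions_match(nncase_version, kpu_version):
    print("warning: nncase and nncase-kpu versions differ or a package is missing")
```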

Model Conversion#

The nncase user guide is available here:

github: user_guide

Converting a TFLite or ONNX model into a kmodel mainly comes down to configuring the options for your deployment needs. The main configuration objects are:

CompileOptions
PTQTensorOptions
ImportOptions

CompileOptions#

CompileOptions is used to configure nncase compilation behavior.

Field	Type	Required	Description
`target`	`string`	Yes	compilation target, such as `cpu` or `k230`
`dump_ir`	`bool`	No	whether to dump IR, default `False`
`dump_asm`	`bool`	No	whether to dump assembly output, default `False`
`dump_dir`	`string`	No	dump directory used together with `dump_ir` or related options, default `""`
`input_file`	`string`	No	parameter-file path for ONNX models larger than `2GB`, default `""`
`preprocess`	`bool`	No	whether to enable built-in preprocessing, default `False`; the following preprocessing-related fields only take effect when `preprocess=True`
`input_type`	`string`	No	input data type when preprocessing is enabled, default `"float"`; when `preprocess=True`, it must be `"uint8"` or `"float32"`
`input_shape`	`list[int]`	No	input shape when preprocessing is enabled, default `[]`; required when `preprocess=True`
`input_range`	`list[float]`	No	floating-point range after dequantization when preprocessing is enabled; required when `preprocess=True` and `input_type="uint8"`
`input_layout`	`string`	No	input layout, default `""`
`swapRB`	`bool`	No	whether to swap data on the channel dimension, default `False`
`mean`	`list[float]`	No	preprocessing normalization mean, default `[0,0,0]`
`std`	`list[float]`	No	preprocessing normalization std, default `[1,1,1]`
`letterbox_value`	`float`	No	fill value used by letterbox preprocessing, default `0`
`output_layout`	`string`	No	output layout, default `""`
`shape_bucket_enable`	`bool`	Yes	whether to enable ShapeBucket, default `False`; effective together with the related variable-shape flow
`shape_bucket_range_info`	`Dict[str, [int, int]]`	Yes	range of each variable shape dimension; the minimum value must be greater than or equal to `1`
`shape_bucket_segments_count`	`int`	Yes	number of segments used to split the input variable range
`shape_bucket_fix_var_map`	`Dict[str, int]`	No	fixes a variable shape dimension to a specific value
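
The preprocessing rules in the table interact: several fields are required only when `preprocess=True`. The sketch below encodes those stated rules over a plain dict so they can be checked before a `CompileOptions` object is built. The function `check_preprocess_options` is hypothetical and not part of nncase:

```python
def check_preprocess_options(opts):
    """Return a list of rule violations for a dict of CompileOptions-style fields."""
    errors = []
    if not opts.get("preprocess", False):
        # Preprocessing fields only take effect when preprocess is enabled.
        return errors
    input_type = opts.get("input_type")
    if input_type not in ("uint8", "float32"):
        errors.append("input_type must be 'uint8' or 'float32'")
    if not opts.get("input_shape"):
        errors.append("input_shape is required")
    if input_type == "uint8" and not opts.get("input_range"):
        errors.append("input_range is required when input_type is 'uint8'")
    return errors


options = {
    "preprocess": True,
    "input_type": "uint8",
    "input_shape": [1, 3, 320, 320],
    "input_range": [0, 1],
}
print(check_preprocess_options(options))
```

An empty list means the combination satisfies the rules listed in the table; any other constraints nncase enforces internally are not covered here.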

Preprocessing Notes#

For preprocessing details, refer to:

nncase compile API preprocessing

Moving part of the preprocessing into the model can improve preprocessing efficiency during board-side inference. Supported preprocessing includes:

swapRB (RGB -> BGR or BGR -> RGB)
Transpose (NHWC -> NCHW or NCHW -> NHWC)
Normalization
Dequantize

For example, if an ONNX model expects RGB input while OpenCV reads images as BGR, the normal runtime preprocessing path would first convert BGR to RGB. During kmodel conversion, you can set swapRB=True so the converted kmodel already contains the RB-swap preprocessing step. The board-side preprocessing code can then skip that step.
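
To make the effect of these options concrete, the sketch below applies swapRB, dequantize, and normalization to a single uint8 pixel in plain Python. The order of the steps and the linear dequantization formula are assumptions made for illustration; the preprocessing reference linked above remains the authority on what nncase actually inserts into the kmodel:

```python
def preprocess_pixel(pixel, input_range, mean, std, swap_rb=False):
    """Apply swapRB, dequantize, and normalization to one 3-channel uint8 pixel."""
    channels = list(pixel)
    if swap_rb:
        # RGB <-> BGR for a 3-channel pixel.
        channels.reverse()
    low, high = input_range
    # Dequantize: map 0..255 linearly onto [low, high].
    dequantized = [c / 255.0 * (high - low) + low for c in channels]
    # Normalize each channel with its own mean and std.
    return [(x - m) / s for x, m, s in zip(dequantized, mean, std)]


red = (255, 0, 0)
print(preprocess_pixel(red, [0, 1], [0, 0, 0], [1, 1, 1]))                # [1.0, 0.0, 0.0]
print(preprocess_pixel(red, [0, 1], [0, 0, 0], [1, 1, 1], swap_rb=True))  # [0.0, 0.0, 1.0]
```

With `input_range = [0, 1]`, `mean = [0, 0, 0]`, and `std = [1, 1, 1]`, the result is the familiar "divide by 255" scaling.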

PTQTensorOptions#

PTQTensorOptions configures nncase PTQ behavior:

Field	Type	Required	Description
`samples_count`	`int`	No	number of calibration samples used for quantization
`calibrate_method`	`string`	No	quantization method, such as `NoClip` or `Kld`, default `Kld`
`finetune_weights_method`	`string`	No	weight fine-tuning method, such as `NoFineTuneWeights` or `UseSquant`, default `NoFineTuneWeights`
`quant_type`	`string`	No	data quantization type; can be `uint8`, `int8`, or `int16`; `quant_type` and `w_quant_type` must not both be `int16`
`w_quant_type`	`string`	No	weight quantization type; can be `uint8`, `int8`, or `int16`; `quant_type` and `w_quant_type` must not both be `int16`
`quant_scheme`	`string`	No	path to the imported quantization-scheme file
`quant_scheme_strict_mode`	`bool`	No	whether to quantize strictly according to `quant_scheme`
`export_quant_scheme`	`bool`	No	whether to export the quantization-scheme file
`export_weight_range_by_channel`	`bool`	No	whether to export weight quantization parameters in per-channel form; setting this to `True` is recommended
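
Calibration determines a floating-point range for each tensor, and that range is then mapped onto integers. The sketch below shows the generic textbook affine (scale and zero-point) scheme for unsigned 8-bit values. It is an assumption-laden illustration of the idea, not nncase's implementation, and the function names are hypothetical:

```python
def quant_params(range_min, range_max, bits=8):
    """Derive scale and zero point for unsigned asymmetric quantization."""
    # Extend the range so that 0.0 is exactly representable.
    range_min = min(range_min, 0.0)
    range_max = max(range_max, 0.0)
    if range_max == range_min:
        raise ValueError("range must not be empty")
    levels = 2 ** bits - 1
    scale = (range_max - range_min) / levels
    zero_point = round(-range_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, bits=8):
    """Map a float to an integer code, clamped to the representable range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float."""
    return (q - zero_point) * scale


scale, zero_point = quant_params(0.0, 1.0)
q = quantize(0.3, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

Values outside the calibrated range are clamped, which is why the choice of range (and therefore of `calibrate_method`) affects accuracy.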

For mixed quantization, see:

MixQuant Guide

For PTQ details, refer to:

nncase compile API PTQ options

If the accuracy of the converted kmodel is insufficient, try changing quant_type and w_quant_type. The two parameters must not both be int16.
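
The constraint on `quant_type` and `w_quant_type` can be checked up front, before compilation starts. A small sketch with a hypothetical helper, `validate_quant_types`, that encodes only the rules stated in the table above:

```python
ALLOWED_TYPES = ("uint8", "int8", "int16")


def validate_quant_types(quant_type, w_quant_type):
    """Raise ValueError if the pair violates the rules listed in the table."""
    for name, value in (("quant_type", quant_type), ("w_quant_type", w_quant_type)):
        if value not in ALLOWED_TYPES:
            raise ValueError("%s must be one of %s, got %r" % (name, ALLOWED_TYPES, value))
    if quant_type == "int16" and w_quant_type == "int16":
        raise ValueError("quant_type and w_quant_type must not both be int16")


# Enumerate every combination that passes the check.
valid_pairs = []
for data_type in ALLOWED_TYPES:
    for weight_type in ALLOWED_TYPES:
        try:
            validate_quant_types(data_type, weight_type)
        except ValueError:
            continue
        valid_pairs.append((data_type, weight_type))
print(len(valid_pairs), "valid combinations")
```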

Calibration Dataset Setup#

Name	Type	Description
`data`	`List[List[np.ndarray]]`	loaded calibration data

Calibration data is passed to PTQTensorOptions through set_tensor_data, whose parameter type is List[List[np.ndarray]].

For example:

If the model has one input and uses 10 calibration samples, the data shape may be [10, 1, 3, 224, 224].
If the model has two inputs and uses 10 calibration samples, the data shape may be [[10, 1, 3, 224, 224], [10, 1, 3, 320, 320]].

ImportOptions#

ImportOptions is used to configure model import into the compiler. It supports both tflite and onnx.

Example:

# Read and import a TFLite model
model_content = read_model_file(model)
compiler.import_tflite(model_content, import_options)

# Read and import an ONNX model
model_content = read_model_file(model)
compiler.import_onnx(model_content, import_options)
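
Since the two import calls differ only in the method name, a conversion script can pick the importer from the file extension. The sketch below assumes a compiler object exposing the `import_tflite` and `import_onnx` methods used above; the wrapper `import_model` itself is hypothetical:

```python
import os


def import_model(compiler, model_path, import_options):
    """Read a model file and call the importer that matches its extension."""
    importers = {".tflite": "import_tflite", ".onnx": "import_onnx"}
    extension = os.path.splitext(model_path)[1].lower()
    if extension not in importers:
        raise ValueError("unsupported model format: " + (extension or "<none>"))
    with open(model_path, "rb") as f:
        model_content = f.read()
    # Dispatch to compiler.import_tflite or compiler.import_onnx.
    getattr(compiler, importers[extension])(model_content, import_options)
```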

YOLOv8 ONNX to `kmodel` Example#

import os
import argparse
import numpy as np
from PIL import Image
import onnxsim
import onnx
import nncase
import math

def parse_model_input_output(model_file,input_shape):
    onnx_model = onnx.load(model_file)
    input_all = [node.name for node in onnx_model.graph.input]
    input_initializer = [node.name for node in onnx_model.graph.initializer]
    input_names = list(set(input_all) - set(input_initializer))
    input_tensors = [
        node for node in onnx_model.graph.input if node.name in input_names]

    # input
    inputs = []
    for e in input_tensors:
        onnx_type = e.type.tensor_type
        input_dict = {}
        input_dict['name'] = e.name
        input_dict['dtype'] = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_type.elem_type]
        input_dict['shape'] = [(i.dim_value if i.dim_value != 0 else d) for i, d in zip(
            onnx_type.shape.dim, input_shape)]
        inputs.append(input_dict)

    return onnx_model, inputs


def onnx_simplify(model_file, dump_dir,input_shape):
    onnx_model, inputs = parse_model_input_output(model_file,input_shape)
    onnx_model = onnx.shape_inference.infer_shapes(onnx_model)
    input_shapes = {}
    # Avoid shadowing the built-in input()
    for input_info in inputs:
        input_shapes[input_info['name']] = input_info['shape']

    onnx_model, check = onnxsim.simplify(onnx_model, input_shapes=input_shapes)
    assert check, "Simplified ONNX model could not be validated"

    model_file = os.path.join(dump_dir, 'simplified.onnx')
    onnx.save_model(onnx_model, model_file)
    return model_file


def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content


def generate_data(shape, batch, calib_dir):
    # Sort the listing so the selected calibration images are reproducible
    img_paths = sorted(os.path.join(calib_dir, p) for p in os.listdir(calib_dir))
    assert batch <= len(img_paths), "calibration images not enough."
    data = []
    for i in range(batch):
        img_data = Image.open(img_paths[i]).convert('RGB')
        # PIL expects (width, height); shape is NCHW
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        # HWC -> CHW
        img_data = np.transpose(img_data, (2, 0, 1))
        # Add the batch dimension; the inner list holds one entry per model input
        data.append([img_data[np.newaxis, ...]])
    return np.array(data)


def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", default="k230", type=str, help='target to run, k230/cpu')
    parser.add_argument("--model", type=str, required=True, help='model file')
    parser.add_argument("--dataset_path", type=str, required=True, help='calibration_dataset')
    parser.add_argument("--input_width", type=int, default=320, help='model input_width')
    parser.add_argument("--input_height", type=int, default=320, help='model input_height')
    parser.add_argument("--ptq_option", type=int, default=0, choices=range(6), help='ptq_option:0,1,2,3,4,5')

    args = parser.parse_args()

    # Align width and height to multiples of 32
    input_width = int(math.ceil(args.input_width / 32.0)) * 32
    input_height = int(math.ceil(args.input_height / 32.0)) * 32

    # Model input shape; dimensions must match input_layout
    input_shape = [1, 3, input_height, input_width]

    dump_dir = 'tmp'
    os.makedirs(dump_dir, exist_ok=True)

    # simplify ONNX
    model_file = onnx_simplify(args.model, dump_dir,input_shape)

    # Configure CompileOptions
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target

    # Whether to use model-side preprocessing
    compile_options.preprocess = True
    # The ONNX model expects RGB, and K230 camera data is also RGB,
    # so RB swap is not required
    compile_options.swapRB = False
    # Input image shape
    compile_options.input_shape = input_shape
    # Input type: uint8 or float32
    compile_options.input_type = 'uint8'

    # Dequantized input range when input_type is uint8
    compile_options.input_range = [0, 1]
    # mean/std values for preprocessing; these come from the YOLOv8 source
    compile_options.mean = [0, 0, 0]
    compile_options.std = [1, 1, 1]

    # Set input layout; ONNX normally uses NCHW
    compile_options.input_layout = "NCHW"

    # Create Compiler instance
    compiler = nncase.Compiler(compile_options)

    # Import ONNX model
    model_content = read_model_file(model_file)
    import_options = nncase.ImportOptions()
    compiler.import_onnx(model_content, import_options)

    # Configure quantization method
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 10

    if args.ptq_option == 0:
        ptq_options.calibrate_method = 'NoClip'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'uint8'
    elif args.ptq_option == 1:
        ptq_options.calibrate_method = 'NoClip'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'int16'
    elif args.ptq_option == 2:
        ptq_options.calibrate_method = 'NoClip'
        ptq_options.quant_type = 'int16'
        ptq_options.w_quant_type = 'uint8'
    elif args.ptq_option == 3:
        ptq_options.calibrate_method = 'Kld'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'uint8'
    elif args.ptq_option == 4:
        ptq_options.calibrate_method = 'Kld'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'int16'
    elif args.ptq_option == 5:
        ptq_options.calibrate_method = 'Kld'
        ptq_options.quant_type = 'int16'
        ptq_options.w_quant_type = 'uint8'
    else:
        pass

    # Set calibration data
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset_path))
    compiler.use_ptq(ptq_options)

    # Start compilation
    compiler.compile()

    # Write kmodel file
    kmodel = compiler.gencode_tobytes()
    base, _ = os.path.splitext(args.model)
    kmodel_name = base + ".kmodel"
    with open(kmodel_name, 'wb') as f:
        f.write(kmodel)


if __name__ == '__main__':
    main()
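
The six `--ptq_option` presets in the script above can also be written as a lookup table, which keeps the combinations visible in one place and rejects unknown values explicitly. This is an alternative sketch, not the form used in the original script: `PTQ_PRESETS` and `apply_ptq_preset` are hypothetical names, and `SimpleNamespace` stands in for `nncase.PTQTensorOptions` so the example runs without nncase:

```python
from types import SimpleNamespace

# (calibrate_method, quant_type, w_quant_type) for each --ptq_option value,
# copied from the if/elif chain in the script above.
PTQ_PRESETS = {
    0: ("NoClip", "uint8", "uint8"),
    1: ("NoClip", "uint8", "int16"),
    2: ("NoClip", "int16", "uint8"),
    3: ("Kld", "uint8", "uint8"),
    4: ("Kld", "uint8", "int16"),
    5: ("Kld", "int16", "uint8"),
}


def apply_ptq_preset(ptq_options, option):
    """Copy one preset onto a PTQTensorOptions-like object and return it."""
    if option not in PTQ_PRESETS:
        raise ValueError("ptq_option must be one of %s" % sorted(PTQ_PRESETS))
    method, quant_type, w_quant_type = PTQ_PRESETS[option]
    ptq_options.calibrate_method = method
    ptq_options.quant_type = quant_type
    ptq_options.w_quant_type = w_quant_type
    return ptq_options


# Stand-in object; in the real script this would be nncase.PTQTensorOptions().
demo = apply_ptq_preset(SimpleNamespace(), 4)
print(demo.calibrate_method, demo.quant_type, demo.w_quant_type)
```

None of the presets sets both types to int16, which is consistent with the constraint described in the PTQTensorOptions section.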

After model conversion succeeds, deploying it on the board requires C++ code that calls the nncase_runtime API.