Note

This is the documentation for the latest development branch and may refer to features that are not available in released versions.

NNCASE Guide#

Overview#

What nncase Is#

nncase is a neural-network compiler designed for AI accelerators. It currently supports targets such as CPU, K210, K510, and K230.

nncase provides:

  • support for multi-input and multi-output networks

  • support for multi-branch structures

  • static memory allocation without heap dependency

  • operator fusion and optimization

  • float and uint8/int8 quantized inference

  • post-training quantization using floating-point models and calibration datasets

  • flat models with zero-copy loading support

Supported neural-network model formats:

  • TFLite

  • ONNX

nncase Architecture#

[Figure: nncase architecture]

The nncase software stack contains two parts:

  • compiler

  • runtime

The compiler runs on the PC and compiles a neural-network model into the final kmodel. Its main modules include:

  • Importer: imports models from other neural-network frameworks

  • IR: the intermediate representation, including neutral IR and target IR

  • Evaluator: provides interpreted execution for IR and is often used for constant folding and PTQ calibration

  • Transform: performs IR transformation and graph-traversal optimization

  • Quantize: performs post-training quantization, inserts quantize/dequantize nodes, collects tensor ranges, and removes unnecessary quantize/dequantize operations

  • Tiling: splits large computation to fit limited NPU memory and optimize latency/bandwidth tradeoffs

  • Partition: partitions the graph by module type so each subgraph can map to a different runtime module and device

  • Schedule: generates execution order and allocates buffers according to data dependencies

  • Codegen: generates runtime modules for each partitioned subgraph
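
To make the Quantize step more concrete, the following is a minimal sketch of generic affine uint8 quantization driven by a min/max range collected from calibration samples. It illustrates the general technique only and is not taken from the nncase implementation; the helper names (`collect_range`, `quant_params`, `quantize`, `dequantize`) are hypothetical.

```python
from typing import Iterable, List, Tuple


def collect_range(samples: Iterable[Iterable[float]]) -> Tuple[float, float]:
    """Track the global min/max over all calibration samples."""
    lo, hi = float("inf"), float("-inf")
    for sample in samples:
        for v in sample:
            lo, hi = min(lo, v), max(hi, v)
    # Widen the range so that 0.0 is representable (a common convention).
    return min(lo, 0.0), max(hi, 0.0)


def quant_params(lo: float, hi: float, bits: int = 8) -> Tuple[float, int]:
    """Derive scale and zero point for an unsigned integer type."""
    qmax = (1 << bits) - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


def quantize(values: Iterable[float], scale: float, zero_point: int,
             bits: int = 8) -> List[int]:
    """Map floats to integers and clamp to the representable range."""
    qmax = (1 << bits) - 1
    return [min(qmax, max(0, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q: Iterable[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    lo, hi = collect_range([[-1.0, 0.5], [0.25, 3.0]])
    scale, zp = quant_params(lo, hi)
    q = quantize([-1.0, 0.0, 3.0], scale, zp)
    print(q, dequantize(q, scale, zp))
```

The round trip loses at most about one quantization step per value, which is why the calibration range matters: a wider range means a larger step.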

The runtime is integrated into the user application and provides:

  • kmodel loading

  • input-data setup

  • KPU execution

  • output retrieval

Development Environment#

Operating System#

Supported host systems include:

  • Ubuntu 18.04

  • Ubuntu 20.04

  • Windows 10

  • Windows 11

Software Environment#

| No. | Software | Version |
| --- | --- | --- |
| 1 | Python | 3.6/3.7/3.8/3.9/3.10 |
| 2 | pip | >=20.3 |
| 3 | numpy | 1.19.5 |
| 4 | onnx | 1.9.0 |
| 5 | onnx-simplifier | 0.3.6 |
| 6 | onnxoptimizer | 0.2.6 |
| 7 | onnxruntime | 1.8.0 |
| 8 | dotnet-runtime | 7.0 |
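
As a convenience, the Python package versions listed above can be compared with what is installed in the current environment. The following is a minimal sketch using only the standard library; it assumes Python 3.8 or later (for `importlib.metadata`), and the helper names `installed_versions` and `report` are hypothetical. It only reports differences and does not decide whether a different version is acceptable.

```python
import sys
from importlib import metadata  # available from Python 3.8

# Versions copied from the Software Environment table.
EXPECTED = {
    "numpy": "1.19.5",
    "onnx": "1.9.0",
    "onnx-simplifier": "0.3.6",
    "onnxoptimizer": "0.2.6",
    "onnxruntime": "1.8.0",
}


def installed_versions(packages):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(expected=EXPECTED):
    """Build one line for the interpreter and one line per package."""
    lines = ["python " + ".".join(map(str, sys.version_info[:3]))]
    for name, have in installed_versions(expected).items():
        status = "ok" if have == expected[name] else "differs"
        lines.append(
            f"{name}: installed={have} documented={expected[name]} ({status})")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```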

Operator Support#

TFLite Operators#

| Operator | Is Supported |
| --- | --- |
| ABS | Yes |
| ADD | Yes |
| ARG_MAX | Yes |
| ARG_MIN | Yes |
| AVERAGE_POOL_2D | Yes |
| BATCH_MATMUL | Yes |
| CAST | Yes |
| CEIL | Yes |
| CONCATENATION | Yes |
| CONV_2D | Yes |
| COS | Yes |
| CUSTOM | Yes |
| DEPTHWISE_CONV_2D | Yes |
| DIV | Yes |
| EQUAL | Yes |
| EXP | Yes |
| EXPAND_DIMS | Yes |
| FLOOR | Yes |
| FLOOR_DIV | Yes |
| FLOOR_MOD | Yes |
| FULLY_CONNECTED | Yes |
| GREATER | Yes |
| GREATER_EQUAL | Yes |
| L2_NORMALIZATION | Yes |
| LEAKY_RELU | Yes |
| LESS | Yes |
| LESS_EQUAL | Yes |
| LOG | Yes |
| LOGISTIC | Yes |
| MAX_POOL_2D | Yes |
| MAXIMUM | Yes |
| MEAN | Yes |
| MINIMUM | Yes |
| MUL | Yes |
| NEG | Yes |
| NOT_EQUAL | Yes |
| PAD | Yes |
| PADV2 | Yes |
| MIRROR_PAD | Yes |
| PACK | Yes |
| POW | Yes |
| REDUCE_MAX | Yes |
| REDUCE_MIN | Yes |
| REDUCE_PROD | Yes |
| RELU | Yes |
| PRELU | Yes |
| RELU6 | Yes |
| RESHAPE | Yes |
| RESIZE_BILINEAR | Yes |
| RESIZE_NEAREST_NEIGHBOR | Yes |
| ROUND | Yes |
| RSQRT | Yes |
| SHAPE | Yes |
| SIN | Yes |
| SLICE | Yes |
| SOFTMAX | Yes |
| SPACE_TO_BATCH_ND | Yes |
| SQUEEZE | Yes |
| BATCH_TO_SPACE_ND | Yes |
| STRIDED_SLICE | Yes |
| SQRT | Yes |
| SQUARE | Yes |
| SUB | Yes |
| SUM | Yes |
| TANH | Yes |
| TILE | Yes |
| TRANSPOSE | Yes |
| TRANSPOSE_CONV | Yes |
| QUANTIZE | Yes |
| FAKE_QUANT | Yes |
| DEQUANTIZE | Yes |
| GATHER | Yes |
| GATHER_ND | Yes |
| ONE_HOT | Yes |
| SQUARED_DIFFERENCE | Yes |
| LOG_SOFTMAX | Yes |
| SPLIT | Yes |
| HARD_SWISH | Yes |

ONNX Operators#

| Operator | Is Supported |
| --- | --- |
| Abs | Yes |
| Acos | Yes |
| Acosh | Yes |
| And | Yes |
| ArgMax | Yes |
| ArgMin | Yes |
| Asin | Yes |
| Asinh | Yes |
| Add | Yes |
| AveragePool | Yes |
| BatchNormalization | Yes |
| Cast | Yes |
| Ceil | Yes |
| Celu | Yes |
| Clip | Yes |
| Compress | Yes |
| Concat | Yes |
| Constant | Yes |
| ConstantOfShape | Yes |
| Conv | Yes |
| ConvTranspose | Yes |
| Cos | Yes |
| Cosh | Yes |
| CumSum | Yes |
| DepthToSpace | Yes |
| DequantizeLinear | Yes |
| Div | Yes |
| Dropout | Yes |
| Elu | Yes |
| Exp | Yes |
| Expand | Yes |
| Equal | Yes |
| Erf | Yes |
| Flatten | Yes |
| Floor | Yes |
| Gather | Yes |
| GatherElements | Yes |
| GatherND | Yes |
| Gemm | Yes |
| GlobalAveragePool | Yes |
| GlobalMaxPool | Yes |
| Greater | Yes |
| GreaterOrEqual | Yes |
| GRU | Yes |
| Hardmax | Yes |
| HardSigmoid | Yes |
| HardSwish | Yes |
| Identity | Yes |
| InstanceNormalization | Yes |
| LayerNormalization | Yes |
| LpNormalization | Yes |
| LeakyRelu | Yes |
| Less | Yes |
| LessOrEqual | Yes |
| Log | Yes |
| LogSoftmax | Yes |
| LRN | Yes |
| LSTM | Yes |
| MatMul | Yes |
| MaxPool | Yes |
| Max | Yes |
| Min | Yes |
| Mul | Yes |
| Neg | Yes |
| Not | Yes |
| OneHot | Yes |
| Pad | Yes |
| Pow | Yes |
| PRelu | Yes |
| QuantizeLinear | Yes |
| RandomNormal | Yes |
| RandomNormalLike | Yes |
| RandomUniform | Yes |
| RandomUniformLike | Yes |
| ReduceL1 | Yes |
| ReduceL2 | Yes |
| ReduceLogSum | Yes |
| ReduceLogSumExp | Yes |
| ReduceMax | Yes |
| ReduceMean | Yes |
| ReduceMin | Yes |
| ReduceProd | Yes |
| ReduceSum | Yes |
| ReduceSumSquare | Yes |
| Relu | Yes |
| Reshape | Yes |
| Resize | Yes |
| ReverseSequence | Yes |
| RoiAlign | Yes |
| Round | Yes |
| Rsqrt | Yes |
| Selu | Yes |
| Shape | Yes |
| Sign | Yes |
| Sin | Yes |
| Sinh | Yes |
| Sigmoid | Yes |
| Size | Yes |
| Slice | Yes |
| Softmax | Yes |
| Softplus | Yes |
| Softsign | Yes |
| SpaceToDepth | Yes |
| Split | Yes |
| Sqrt | Yes |
| Squeeze | Yes |
| Sub | Yes |
| Sum | Yes |
| Tanh | Yes |
| Tile | Yes |
| TopK | Yes |
| Transpose | Yes |
| Trilu | Yes |
| ThresholdedRelu | Yes |
| Upsample | Yes |
| Unsqueeze | Yes |
| Where | Yes |

API Documentation#

The nncase stack contains both compiler and runtime, used for model conversion and KPU-side inference respectively. Python and C++ APIs are provided for both parts. Refer to:

nncase API documentation

Usage Steps#

Environment Setup#

  • Linux

First install dotnet-sdk-7.0 and set the DOTNET_ROOT environment variable. Do not install dotnet inside an Anaconda virtual environment:

sudo apt-get update
sudo apt-get install dotnet-sdk-7.0
export DOTNET_ROOT=/usr/share/dotnet

Then install nncase and nncase-kpu:

pip install nncase nncase-kpu

  • Windows

First install dotnet-sdk-7.0. For the installation steps, see the Microsoft documentation:

Install .NET on Windows

Then install nncase, download the matching nncase_kpu-2.x.x-py2.py3-none-win_amd64.whl package from Release, and install it locally:

pip install nncase
pip install nncase_kpu-2.x.x-py2.py3-none-win_amd64.whl

  • Docker

If you do not have an Ubuntu environment, use the nncase Docker image (Ubuntu 20.04 + Python 3.8 + dotnet-7.0):

cd /path/to/nncase_sdk
docker pull ghcr.io/kendryte/k230_sdk
docker run -it --rm -v `pwd`:/mnt -w /mnt ghcr.io/kendryte/k230_sdk /bin/bash -c "/bin/bash"

  • Check Version

root@469e6a4a9e71:/mnt# python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _nncase
>>> print(_nncase.__version__)
2.9.0

Model Conversion#

The nncase user guide is available here:

github: user_guide

Converting a TFLite or ONNX model into a kmodel mainly comes down to configuring the compiler options for your deployment. The main configuration objects are:

  • CompileOptions

  • PTQTensorOptions

  • ImportOptions

CompileOptions#

CompileOptions is used to configure nncase compilation behavior.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| target | string | Yes | compilation target, such as cpu or k230 |
| dump_ir | bool | No | whether to dump IR, default False |
| dump_asm | bool | No | whether to dump assembly output, default False |
| dump_dir | string | No | dump directory used together with dump_ir or related options, default "" |
| input_file | string | No | parameter-file path for ONNX models larger than 2GB, default "" |
| preprocess | bool | No | whether to enable built-in preprocessing, default False; the following preprocessing-related fields only take effect when preprocess=True |
| input_type | string | No | input data type when preprocessing is enabled, default "float"; when preprocess=True, it must be "uint8" or "float32" |
| input_shape | list[int] | No | input shape when preprocessing is enabled, default []; required when preprocess=True |
| input_range | list[float] | No | floating-point range after dequantization when preprocessing is enabled; required when preprocess=True and input_type="uint8" |
| input_layout | string | No | input layout, default "" |
| swapRB | bool | No | whether to swap data on the channel dimension (R and B channels), default False |
| mean | list[float] | No | preprocessing normalization mean, default [0,0,0] |
| std | list[float] | No | preprocessing normalization std, default [1,1,1] |
| letterbox_value | float | No | fill value used by letterbox preprocessing, default 0 |
| output_layout | string | No | output layout, default "" |
| shape_bucket_enable | bool | Yes | whether to enable ShapeBucket, default False; effective together with the related variable-shape flow |
| shape_bucket_range_info | Dict[str, [int, int]] | Yes | range of each variable shape dimension; the minimum value must be greater than or equal to 1 |
| shape_bucket_segments_count | int | Yes | number of segments used to split the input variable range |
| shape_bucket_fix_var_map | Dict[str, int] | No | fixes a variable shape dimension to a specific value |
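
The preprocessing-related constraints in the table (allowed input_type values, when input_shape and input_range become required) are easy to get wrong. The following is a minimal sketch of a pre-flight check over a plain dictionary of option values. The function `check_preprocess_options` is a hypothetical helper, not part of nncase, and it encodes only the rules stated in the table.

```python
def check_preprocess_options(opts: dict) -> list:
    """Return a list of problems; an empty list means the rules hold."""
    problems = []
    if not opts.get("preprocess", False):
        # Preprocessing fields only take effect when preprocess=True.
        return problems

    input_type = opts.get("input_type", "float")
    if input_type not in ("uint8", "float32"):
        problems.append('input_type must be "uint8" or "float32"')

    if not opts.get("input_shape"):
        problems.append("input_shape is required when preprocess=True")

    if input_type == "uint8" and not opts.get("input_range"):
        problems.append('input_range is required when input_type="uint8"')

    return problems


if __name__ == "__main__":
    options = {
        "preprocess": True,
        "input_type": "uint8",
        "input_shape": [1, 3, 320, 320],
        "input_range": [0, 1],
    }
    print(check_preprocess_options(options) or "options look consistent")
```

Running such a check before constructing the compiler can surface a missing field earlier than a failed compilation would.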

Preprocessing Notes#

For preprocessing details, refer to:

nncase compile API preprocessing

Moving part of the preprocessing into the model can improve preprocessing efficiency during board-side inference. Supported preprocessing includes:

  • swapRB (RGB -> BGR or BGR -> RGB)

  • Transpose (NHWC -> NCHW or NCHW -> NHWC)

  • Normalization

  • Dequantize

For example, if an ONNX model expects RGB input while OpenCV reads images as BGR, the normal runtime preprocessing path would first convert BGR to RGB. During kmodel conversion, you can set swapRB=True so the converted kmodel already contains the RB-swap preprocessing step. The board-side preprocessing code can then skip that step.
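
To show what the four supported steps compute, here is a minimal NumPy reference sketch of the equivalent host-side preprocessing. It is an illustration under stated assumptions: the order of the steps, the linear mapping of uint8 values onto the dequantization range, and the function name `reference_preprocess` are all assumptions for this example rather than a description of nncase internals.

```python
import numpy as np


def reference_preprocess(img_hwc, input_range=(0.0, 1.0),
                         mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0),
                         swap_rb=True):
    """Turn an HWC uint8 image into a normalized NCHW float32 tensor."""
    x = img_hwc.astype(np.float32)

    # swapRB: reverse the channel axis (BGR <-> RGB).
    if swap_rb:
        x = x[..., ::-1]

    # Dequantize: map [0, 255] linearly onto [lo, hi].
    lo, hi = input_range
    x = x / 255.0 * (hi - lo) + lo

    # Normalization: per-channel (x - mean) / std.
    x = (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)

    # Transpose: HWC -> CHW, then add the batch dimension (NCHW).
    return np.transpose(x, (2, 0, 1))[np.newaxis, ...]


if __name__ == "__main__":
    bgr = np.zeros((2, 2, 3), dtype=np.uint8)
    bgr[..., 0] = 255  # pure blue in BGR order
    out = reference_preprocess(bgr)
    print(out.shape, out[0, :, 0, 0])
```

When these steps are compiled into the kmodel, the board-side code can pass the raw uint8 image and skip the equivalent host-side work.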

PTQTensorOptions#

PTQTensorOptions configures nncase PTQ behavior:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| samples_count | int | No | number of calibration samples used for quantization |
| calibrate_method | string | No | quantization method, such as NoClip or Kld, default Kld |
| finetune_weights_method | string | No | weight fine-tuning method, such as NoFineTuneWeights or UseSquant, default NoFineTuneWeights |
| quant_type | string | No | data quantization type; can be uint8, int8, or int16; quant_type and w_quant_type must not both be int16 |
| w_quant_type | string | No | weight quantization type; can be uint8, int8, or int16; quant_type and w_quant_type must not both be int16 |
| quant_scheme | string | No | path to the imported quantization-scheme file |
| quant_scheme_strict_mode | bool | No | whether to quantize strictly according to quant_scheme |
| export_quant_scheme | bool | No | whether to export the quantization-scheme file |
| export_weight_range_by_channel | bool | No | whether to export weight quantization parameters in per-channel form; setting this to True is recommended |

For mixed quantization, see:

MixQuant Guide

For PTQ details, refer to:

nncase compile API PTQ options

If the accuracy of the converted kmodel is not good enough, try adjusting quant_type and w_quant_type. These two parameters must not both be int16.
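
One way to experiment with these settings is to keep the combinations in a table and validate each one before use. The sketch below lists the six combinations selected by the --ptq_option flag in the YOLOv8 example later in this guide and enforces the "not both int16" rule. `check_quant_types` and `apply_preset` are hypothetical helpers, and `SimpleNamespace` stands in for a real options object so the example runs without nncase installed.

```python
from types import SimpleNamespace

VALID_TYPES = ("uint8", "int8", "int16")

# (calibrate_method, quant_type, w_quant_type) per --ptq_option value.
PTQ_PRESETS = {
    0: ("NoClip", "uint8", "uint8"),
    1: ("NoClip", "uint8", "int16"),
    2: ("NoClip", "int16", "uint8"),
    3: ("Kld", "uint8", "uint8"),
    4: ("Kld", "uint8", "int16"),
    5: ("Kld", "int16", "uint8"),
}


def check_quant_types(quant_type, w_quant_type):
    """Raise ValueError if the pair violates the documented constraints."""
    for t in (quant_type, w_quant_type):
        if t not in VALID_TYPES:
            raise ValueError(f"unsupported quantization type: {t}")
    if quant_type == "int16" and w_quant_type == "int16":
        raise ValueError("quant_type and w_quant_type must not both be int16")


def apply_preset(ptq_options, index):
    """Copy one preset onto an options object that has these attributes."""
    method, quant_type, w_quant_type = PTQ_PRESETS[index]
    check_quant_types(quant_type, w_quant_type)
    ptq_options.calibrate_method = method
    ptq_options.quant_type = quant_type
    ptq_options.w_quant_type = w_quant_type
    return ptq_options


if __name__ == "__main__":
    print(apply_preset(SimpleNamespace(), 4))
```

With a real installation, the same `apply_preset` call could be given the PTQTensorOptions object instead of the stand-in, since it only assigns the three fields documented in the table.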

Calibration Dataset Setup#

| Name | Type | Description |
| --- | --- | --- |
| data | List[List[np.ndarray]] | loaded calibration data |

Calibration data is configured through set_tensor_data, and the parameter type is List[List[np.ndarray]].

For example:

  • if the model has one input and uses 10 calibration samples, the data shape may be [10, 1, 3, 224, 224]

  • if the model has two inputs and uses 10 calibration samples, the data shape may be [[10, 1, 3, 224, 224], [10, 1, 3, 320, 320]]
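
The nested-list layout can be sketched with NumPy alone. The example below follows the two-input case above, with the outer list indexing model inputs and the inner list holding one array per calibration sample. That reading of the layout is an assumption and should be checked against the nncase version in use; `make_calibration_data` is a hypothetical helper, and the random values only demonstrate the structure (real calibration should use representative input data).

```python
import numpy as np


def make_calibration_data(input_shapes, samples_count, seed=0):
    """Build a List[List[np.ndarray]]: one inner list per model input."""
    rng = np.random.default_rng(seed)
    data = []
    for shape in input_shapes:
        # One array of the given shape for every calibration sample.
        per_input = [rng.random(shape, dtype=np.float32)
                     for _ in range(samples_count)]
        data.append(per_input)
    return data


if __name__ == "__main__":
    calib = make_calibration_data(
        [(1, 3, 224, 224), (1, 3, 320, 320)], samples_count=10)
    for i, samples in enumerate(calib):
        print(f"input {i}: {len(samples)} samples of shape {samples[0].shape}")
```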

ImportOptions#

ImportOptions is used to configure model import into the compiler. It supports both tflite and onnx.

Example:

# Read and import a TFLite model
model_content = read_model_file(model)
compiler.import_tflite(model_content, import_options)

# Read and import an ONNX model
model_content = read_model_file(model)
compiler.import_onnx(model_content, import_options)
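
Because the two import calls differ only in the method name, a small wrapper can choose between them from the file extension. The following is a minimal sketch: `import_model` is a hypothetical helper, and `RecordingCompiler` is a stand-in that only mimics the `import_tflite` and `import_onnx` methods shown above so the example runs without nncase installed.

```python
import os
import tempfile


class RecordingCompiler:
    """Hypothetical stand-in that records which import method was called."""

    def __init__(self):
        self.calls = []

    def import_tflite(self, model_content, import_options):
        self.calls.append(("tflite", len(model_content)))

    def import_onnx(self, model_content, import_options):
        self.calls.append(("onnx", len(model_content)))


def import_model(compiler, model_path, import_options):
    """Read a model file and dispatch on its extension."""
    ext = os.path.splitext(model_path)[1].lower()
    if ext not in (".tflite", ".onnx"):
        raise ValueError(f"unsupported model format: {ext or model_path}")

    with open(model_path, "rb") as f:
        model_content = f.read()

    if ext == ".tflite":
        compiler.import_tflite(model_content, import_options)
    else:
        compiler.import_onnx(model_content, import_options)
    return ext


def demo():
    """Write a dummy 16-byte file and import it through the stand-in."""
    compiler = RecordingCompiler()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.onnx")
        with open(path, "wb") as f:
            f.write(b"\x00" * 16)
        import_model(compiler, path, import_options=None)
    return compiler.calls


if __name__ == "__main__":
    print(demo())
```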

YOLOv8 ONNX to kmodel Example#

import os
import argparse
import math

import numpy as np
from PIL import Image
import onnxsim
import onnx
import nncase

def parse_model_input_output(model_file,input_shape):
    onnx_model = onnx.load(model_file)
    input_all = [node.name for node in onnx_model.graph.input]
    input_initializer = [node.name for node in onnx_model.graph.initializer]
    input_names = list(set(input_all) - set(input_initializer))
    input_tensors = [
        node for node in onnx_model.graph.input if node.name in input_names]

    # input
    inputs = []
    for _, e in enumerate(input_tensors):
        onnx_type = e.type.tensor_type
        input_dict = {}
        input_dict['name'] = e.name
        input_dict['dtype'] = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_type.elem_type]
        input_dict['shape'] = [(i.dim_value if i.dim_value != 0 else d) for i, d in zip(
            onnx_type.shape.dim, input_shape)]
        inputs.append(input_dict)

    return onnx_model, inputs


def onnx_simplify(model_file, dump_dir,input_shape):
    onnx_model, inputs = parse_model_input_output(model_file,input_shape)
    onnx_model = onnx.shape_inference.infer_shapes(onnx_model)
    input_shapes = {}
    # Use a name that does not shadow the built-in input()
    for input_info in inputs:
        input_shapes[input_info['name']] = input_info['shape']

    onnx_model, check = onnxsim.simplify(onnx_model, input_shapes=input_shapes)
    assert check, "Simplified ONNX model could not be validated"

    model_file = os.path.join(dump_dir, 'simplified.onnx')
    onnx.save_model(onnx_model, model_file)
    return model_file


def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content


def generate_data(shape, batch, calib_dir):
    # Sort so the same calibration images are picked on every run
    img_paths = sorted(os.path.join(calib_dir, p) for p in os.listdir(calib_dir))
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))
        data.append([img_data[np.newaxis, ...]])
    return np.array(data)


def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", default="k230", type=str, help='target to run, k230/cpu')
    parser.add_argument("--model", type=str, required=True, help='model file')
    parser.add_argument("--dataset_path", type=str, required=True, help='calibration_dataset')
    parser.add_argument("--input_width", type=int, default=320, help='model input_width')
    parser.add_argument("--input_height", type=int, default=320, help='model input_height')
    parser.add_argument("--ptq_option", type=int, default=0, choices=range(6), help='ptq_option:0,1,2,3,4,5')

    args = parser.parse_args()

    # Align width and height to multiples of 32
    input_width = int(math.ceil(args.input_width / 32.0)) * 32
    input_height = int(math.ceil(args.input_height / 32.0)) * 32

    # Model input shape; dimensions must match input_layout
    input_shape=[1,3,input_height,input_width]

    dump_dir = 'tmp'
    if not os.path.exists(dump_dir):
        os.makedirs(dump_dir)

    # simplify ONNX
    model_file = onnx_simplify(args.model, dump_dir,input_shape)

    # Configure CompileOptions
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target

    # Whether to use model-side preprocessing
    compile_options.preprocess = True
    # The ONNX model expects RGB, and K230 camera data is also RGB,
    # so RB swap is not required
    compile_options.swapRB = False
    # Input image shape
    compile_options.input_shape = input_shape
    # Input type: uint8 or float32
    compile_options.input_type = 'uint8'

    # Dequantized input range when input_type is uint8
    compile_options.input_range = [0, 1]
    # mean/std values for preprocessing; these come from the YOLOv8 source
    compile_options.mean = [0, 0, 0]
    compile_options.std = [1, 1, 1]

    # Set input layout; ONNX normally uses NCHW
    compile_options.input_layout = "NCHW"

    # Create Compiler instance
    compiler = nncase.Compiler(compile_options)

    # Import ONNX model
    model_content = read_model_file(model_file)
    import_options = nncase.ImportOptions()
    compiler.import_onnx(model_content, import_options)

    # Configure quantization method
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 10

    if args.ptq_option == 0:
        ptq_options.calibrate_method = 'NoClip'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'uint8'
    elif args.ptq_option == 1:
        ptq_options.calibrate_method = 'NoClip'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'int16'
    elif args.ptq_option == 2:
        ptq_options.calibrate_method = 'NoClip'
        ptq_options.quant_type = 'int16'
        ptq_options.w_quant_type = 'uint8'
    elif args.ptq_option == 3:
        ptq_options.calibrate_method = 'Kld'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'uint8'
    elif args.ptq_option == 4:
        ptq_options.calibrate_method = 'Kld'
        ptq_options.quant_type = 'uint8'
        ptq_options.w_quant_type = 'int16'
    elif args.ptq_option == 5:
        ptq_options.calibrate_method = 'Kld'
        ptq_options.quant_type = 'int16'
        ptq_options.w_quant_type = 'uint8'
    else:
        pass

    # Set calibration data
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset_path))
    compiler.use_ptq(ptq_options)

    # Start compilation
    compiler.compile()

    # Write kmodel file
    kmodel = compiler.gencode_tobytes()
    base,ext=os.path.splitext(args.model)
    kmodel_name=base+".kmodel"
    with open(kmodel_name, 'wb') as f:
        f.write(kmodel)


if __name__ == '__main__':
    main()
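
Two small pieces of arithmetic in the script above are worth isolating: rounding the requested width and height up to a multiple of 32, and deriving the output file name from the input model path. The following is a minimal sketch of both; `align_up` and `kmodel_path` are hypothetical helper names that mirror the expressions used in `main()`.

```python
import math
import os


def align_up(value, multiple=32):
    """Round value up to the nearest multiple (same formula as main())."""
    return int(math.ceil(value / float(multiple))) * multiple


def kmodel_path(model_path):
    """Replace the model file extension with .kmodel."""
    base, _ext = os.path.splitext(model_path)
    return base + ".kmodel"


if __name__ == "__main__":
    for requested in (300, 320, 321):
        print(requested, "->", align_up(requested))
    print(kmodel_path("models/yolov8n.onnx"))
```

Note that the kmodel is written next to the original model passed with --model, not into the tmp directory that holds the simplified ONNX file.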

After model conversion succeeds, deploying it on the board requires C++ code that calls the nncase_runtime API.
