# `nncase` Model Compilation API Manual

## Overview

`nncase` is a neural network compiler designed for AI accelerators. This document describes its Python API, which converts a trained `TFLite` or `ONNX` model into `kmodel`, the model format that the `kpu` can accelerate. The compiler currently supports deep learning models in the `TFLite` and `ONNX` formats. The API compiles `kmodel` files on a local PC; it is not code that runs on the `k230`. To learn more about `nncase`, see the [nncase github repo](https://github.com/kendryte/nncase).

## API introduction

### CompileOptions

Description

CompileOptions class, used to configure nncase compilation options. Each attribute is described as follows:

| Property name | type | Required | Description |
| :-------------------------- | :-------------------: | :------: | ---------------------------------------------------------------------------------------------------------------------- |
| target | string | yes | Specify the compilation target, such as 'cpu', 'k230' |
| dump_ir | bool | no | Specify whether to dump IR, the default is False |
| dump_asm | bool | no | Specifies whether to dump asm assembly files, the default is False |
| dump_dir | string | no | Directory that receives the dump output when `dump_ir` or the other dump switches are enabled. The default is "" |
| input_file | string | no | Path of the external parameter file, used when the ONNX model exceeds 2GB. The default is "" |
| preprocess | bool | no | Whether to enable pre-processing, the default is False. The following parameters only take effect when `preprocess=True` |
| input_type | string | no | Specify the input data type when turning on pre-processing, the default is "float". When `preprocess` is `True`, it must be specified as "uint8" or "float32" |
| input_shape | list[int] | no | Specify the shape of the input data when turning on pre-processing, the default is []. When `preprocess` is `True`, it must be specified |
| input_range | list[float] | no | Specify the floating point number range after dequantization of the input data when preprocessing is enabled. The default is [ ]. Must be specified when `preprocess` is `True` and `input_type` is `uint8` |
| input_layout | string | no | Specify the layout of the input data, the default is "" |
| swapRB | bool | no | Whether to reverse the data in the `channel` dimension, the default is False |
| mean | list[float] | no | Mean used for pre-processing normalization, the default is [0,0,0] |
| std | list[float] | no | Standard deviation used for pre-processing normalization, the default is [1,1,1] |
| letterbox_value | float | no | Specify the filling value of the pre-processing letterbox, the default is 0 |
| output_layout | string | no | Specify the layout of the output data, the default is "" |
| shape_bucket_enable | bool | yes | Whether to enable the ShapeBucket function, the default is False. Effective at `dump_ir=True` |
| shape_bucket_range_info | Dict[str, [int, int]] | yes | Range of each variable that appears in the input shape dimensions; the minimum value must be greater than or equal to 1 |
| shape_bucket_segments_count | int | yes | Number of segments into which the range of each input variable is divided |
| shape_bucket_fix_var_map | Dict[str, int] | no | Fixed variables in shape dimension information to specific values |

#### Pre-processing process description

A custom pre-processing order is not currently supported. Select and configure the pre-processing parameters you need according to the following flow diagram.

<div class="mermaid">
graph TD;
    NewInput("NewInput<br>(shape = input_shape<br>dtype = input_type)") -->a(input_layout != ' ')-.Y.->Transpose1["transpose"] -.->b("SwapRB == True")-.Y.->SwapRB["SwapRB"]-.->c("input_type != float32")-.Y.->Dequantize["Dequantize"]-.->d("input_HW != model_HW")-.Y.->LetterBox["LetterBox"] -.->e("std not empty<br>mean not empty")-.Y.->Normalization["Normalization"]-.->OldInput-->Model_body-->OldOutput-->f("output_layout != ' '")-.Y.->Transpose2["Transpose"]-.-> NewOutput;
    a--N-->b--N-->c--N-->d--N-->e--N-->OldInput; f--N-->NewOutput;
    subgraph origin_model
        OldInput; Model_body ; OldOutput;
    end
</div>

Parameter description:

1. `input_range` is the range of the floating-point values after dequantization when the input data type is fixed point.

   a. If the input data type is uint8 with range [0,255] and `input_range` is [0,255], dequantization is only a type conversion from uint8 to float32. The `mean` and `std` parameters are still specified against data in [0,255].

   b. If the input data type is uint8 with range [0,255] and `input_range` is [0,1], dequantization converts the fixed-point values into floating-point values in [0,1]. The `mean` and `std` parameters must be specified against data in [0,1].

   <div class="mermaid">
    graph TD;
        NewInput_uint8("NewInput_uint8 <br>[input_type:uint8]") --input_range:0,255 -->dequantize_0["Dequantize"]--float range:0,255--> OldInput_float32
        NewInput_uint81("NewInput_uint8 <br>[input_type:uint8]") --input_range:0,1 -->dequantize_1["Dequantize"]--float range:0,1--> OldInput_float32
   </div>

2. `input_shape` is the shape of the input data, laid out according to `input_layout`. `input_layout` can be given either as a string (`"NHWC"`, `"NCHW"`) or as an index list, and non-4D data is supported.
   When `input_layout` is a string, it describes the layout of the input data; when it is an index list, the input data is transposed accordingly, that is, `input_layout` is the `perm` parameter of `Transpose`.

<div class="mermaid">
graph TD;
    subgraph B
        NewInput1("NewInput: 1,4,10") --"input_layout:"0,2,1""-->Transpose2("Transpose perm: 0,2,1") --> OldInput2("OldInput: 1,10,4");
    end
    subgraph A
        NewInput --"input_layout:"NHWC""--> Transpose0("Transpose: NHWC2NCHW") --> OldInput;
        NewInput("NewInput: 1,224,224,3 (NHWC)") --"input_layout:"0,3,1,2""--> Transpose1("Transpose perm: 0,3,1,2") --> OldInput("OldInput: 1,3,224,224 (NCHW)");
    end
</div>
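As a rough illustration of the index form, the sketch below applies a `perm` to a shape the way a transpose reorders dimensions. `apply_perm` is a hypothetical helper written for this example, not part of nncase; it only computes the resulting shape and does not move any data.

```python
def apply_perm(shape, perm):
    """Return the shape obtained by transposing `shape` with `perm`.

    Output dimension i takes its size from input dimension perm[i].
    """
    if sorted(perm) != list(range(len(shape))):
        raise ValueError(f"perm {perm} is not a permutation of {len(shape)} axes")
    return [shape[axis] for axis in perm]


# Shapes and perms taken from the diagrams above.
print(apply_perm([1, 4, 10], [0, 2, 1]))           # [1, 10, 4]
print(apply_perm([1, 224, 224, 3], [0, 3, 1, 2]))  # [1, 3, 224, 224]
```

Checking the shape this way before compiling is a cheap way to confirm that an index-form layout produces the shape the original model expects.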

The same goes for `output_layout`, as shown in the figure below.

<div class="mermaid">
graph TD;
subgraph B
    OldOutput1("OldOutput: 1,10,4,5,2") --"output_layout: "0,2,3,1,4""--> Transpose5("Transpose perm: 0,2,3,1,4") --> NewOutput1("NewOutput: 1,4,5,10,2");
    end
subgraph A
    OldOutput --"output_layout: "NHWC""--> Transpose3("Transpose: NCHW2NHWC") --> NewOutput("NewOutput<br>NHWC");
    OldOutput("OldOutput: (NCHW)") --"output_layout: "0,2,3,1""--> Transpose4("Transpose perm: 0,2,3,1") --> NewOutput("NewOutput<br>NHWC");
    end
</div>
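To make the interaction between `input_range`, `mean`, and `std` concrete, here is a minimal numeric sketch of the Dequantize and Normalization stages. It assumes that dequantization is a linear mapping of the uint8 range [0,255] onto `input_range` and that normalization computes `(x - mean) / std`; both formulas are assumptions chosen to match the two cases described above, and the function names are hypothetical rather than nncase APIs.

```python
def dequantize(value, input_range, qmax=255):
    """Map a uint8 value in [0, qmax] linearly onto [lo, hi] (assumed mapping)."""
    lo, hi = input_range
    return lo + value * (hi - lo) / qmax


def normalize(value, mean, std):
    """Assumed normalization for one channel: (x - mean) / std."""
    return (value - mean) / std


# Case a: input_range [0, 255], so mean/std are given on the 0..255 scale.
print(normalize(dequantize(255, [0, 255]), mean=127.5, std=127.5))  # 1.0
# Case b: input_range [0, 1], so mean/std are given on the 0..1 scale.
print(normalize(dequantize(255, [0, 1]), mean=0.5, std=0.5))        # 1.0
```

Both configurations send the brightest pixel to the same normalized value, which is why `mean` and `std` have to be expressed on the same scale as `input_range`.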

#### Dynamic shape parameter description

ShapeBucket is a solution for dynamic shapes that optimizes the model based on the range of the input lengths and the specified number of segments. It is disabled by default and takes effect only when the corresponding option is enabled. Apart from setting the fields described here, the process is the same as compiling a static model.

- ONNX

Some dimensions in the model's input shapes are variable names rather than fixed values. Take the inputs of the following ONNX model as an example.

> tokens: int64[batch_size, tgt_seq_len]
> step: float32[seq_len, batch_size]

The shape dimension information contains three variables: `seq_len`, `tgt_seq_len`, and `batch_size`.
`batch_size` is a variable, but it is fixed to 3 in the actual application. Adding `batch_size = 3` to **fix_var_map** therefore fixes this dimension to 3 at run time.
`seq_len` and `tgt_seq_len` do change at run time, so their actual ranges must be configured in **range_info**. **segments_count** is the number of segments: each range is divided into that many equal parts, and the compilation time increases by a corresponding multiple.

The following are examples of corresponding compilation parameters:

```python
compile_options = nncase.CompileOptions()
compile_options.shape_bucket_enable = True
compile_options.shape_bucket_range_info = {"seq_len": [1, 100], "tgt_seq_len": [1, 100]}
compile_options.shape_bucket_segments_count = 2
compile_options.shape_bucket_fix_var_map = {"batch_size": 3}
```
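The idea of dividing a range into segments can be sketched as follows. This is only a conceptual illustration: the boundaries nncase actually chooses are not specified in this document, and `bucket_upper_bounds` and `pick_bucket` are hypothetical helpers that assume an equal split with rounding up.

```python
def bucket_upper_bounds(lo, hi, segments_count):
    """Split [lo, hi] into `segments_count` roughly equal parts.

    Returns the upper bound of each part (assumed rounding: ceiling).
    """
    span = hi - lo
    return [lo + (span * (i + 1) + segments_count - 1) // segments_count
            for i in range(segments_count)]


def pick_bucket(value, bounds):
    """Return the smallest upper bound that can hold `value`."""
    for bound in bounds:
        if value <= bound:
            return bound
    raise ValueError(f"{value} exceeds the configured range")


bounds = bucket_upper_bounds(1, 100, 2)  # range_info [1, 100], segments_count 2
print(bounds)                   # [51, 100]
print(pick_bucket(30, bounds))  # 51
print(pick_bucket(80, bounds))  # 100
```

Under this assumption, each segment is compiled for its upper bound, which is consistent with the compilation time growing with the number of segments.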

- TFLite

TFLite models differ from ONNX models in that dimension names are not recorded in the shape. Currently only one dynamic dimension in the input is supported, and its name is always configured as `-1`. The configuration is as follows:

```python
compile_options = nncase.CompileOptions()
compile_options.shape_bucket_enable = True
compile_options.shape_bucket_range_info = {"-1":[1, 100]}
compile_options.shape_bucket_segments_count = 2
compile_options.shape_bucket_fix_var_map = {"batch_size" : 3}
```

After these options are configured, the rest of the compilation process is the same as for a static-shape model.

#### Parameter configuration example

Instantiate CompileOptions and configure the values of each attribute.

```python
compile_options = nncase.CompileOptions()

compile_options.target = "cpu" #"k230"
compile_options.dump_ir = True  # if False, will not dump the compile-time result.
compile_options.dump_asm = True
compile_options.dump_dir = "dump_path"
compile_options.input_file = ""

# preprocess args
compile_options.preprocess = False
if compile_options.preprocess:
    compile_options.input_type = "uint8"  # "uint8" "float32"
    compile_options.input_shape = [1,224,320,3]
    compile_options.input_range = [0,1]
    compile_options.input_layout = "NHWC" # "NHWC" "NCHW"
    compile_options.swapRB = False
    compile_options.mean = [0,0,0]
    compile_options.std = [1,1,1]
    compile_options.letterbox_value = 0
    compile_options.output_layout = "NHWC" # "NHWC" "NCHW"

# Dynamic shape args
compile_options.shape_bucket_enable = False
if compile_options.shape_bucket_enable:
    compile_options.shape_bucket_range_info = {"seq_len": [1, 100], "tgt_seq_len": [1, 100]}
    compile_options.shape_bucket_segments_count = 2
    compile_options.shape_bucket_fix_var_map = {"batch_size": 3}
```
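The constraints listed in the CompileOptions table for `preprocess=True` can be checked before compilation with a small helper. The sketch below is hypothetical (`check_preprocess_options` is not an nncase function) and uses a `SimpleNamespace` stand-in so that it runs without nncase installed; it assumes the real options object exposes the same attribute names.

```python
from types import SimpleNamespace


def check_preprocess_options(opts):
    """Return a list of problems found in the pre-processing settings."""
    problems = []
    if not opts.preprocess:
        return problems  # the remaining fields only take effect when preprocess=True
    if opts.input_type not in ("uint8", "float32"):
        problems.append("input_type must be 'uint8' or 'float32'")
    if not opts.input_shape:
        problems.append("input_shape must be specified")
    if opts.input_type == "uint8" and not opts.input_range:
        problems.append("input_range must be specified for uint8 input")
    return problems


# Stand-in object; with nncase installed this would be nncase.CompileOptions().
opts = SimpleNamespace(preprocess=True, input_type="uint8",
                       input_shape=[1, 224, 320, 3], input_range=[])
print(check_preprocess_options(opts))
# ['input_range must be specified for uint8 input']
```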

### ImportOptions

Description

ImportOptions class, used to configure nncase import options.

Definition

```python
class ImportOptions:
    def __init__(self) -> None:
        pass
```

Example

Instantiate ImportOptions. As the definition shows, it currently has no attributes to configure.

```python
#import_options
import_options = nncase.ImportOptions()
```

### PTQTensorOptions

Description

PTQTensorOptions class, used to configure nncase PTQ options.

| name | type | Required | Description |
| ------------------------------ | ------ | -------- | ---- |
| samples_count | int | no | Specifies the number of calibration samples used for quantization |
| calibrate_method | string | no | Specify the quantization method, optional 'NoClip', 'Kld', the default is 'Kld' |
| finetune_weights_method | string | no | Specify whether to fine-tune the weights, optional 'NoFineTuneWeights', 'UseSquant', the default is 'NoFineTuneWeights' |
| quant_type | string | no | Specify the data quantization type, optional 'uint8', 'int8', 'int16', `quant_type` and `w_quant_type` cannot be 'int16' at the same time |
| w_quant_type | string | no | Specify the weight quantization type, optional 'uint8', 'int8', 'int16', `quant_type` and `w_quant_type` cannot be 'int16' at the same time |
| quant_scheme | string | no | Path to import quantization parameter configuration file |
| quant_scheme_strict_mode | bool | no | Whether to perform quantization strictly according to quant_scheme |
| export_quant_scheme | bool | no | Whether to export the quantization parameter configuration file |
| export_weight_range_by_channel | bool | no | Whether to export the weights quantization parameter in the form of `bychannel`. It is recommended to set this parameter to `True`. |

For the detailed workflow of mixed quantization, see [MixQuantInstructions](https://github.com/kendryte/nncase/blob/release/2.0/docs/MixQuant.md).

Example

```python
# ptq_options
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 6
ptq_options.finetune_weights_method = "NoFineTuneWeights"
ptq_options.quant_type = "uint8"
ptq_options.w_quant_type = "uint8"
ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))

ptq_options.quant_scheme = ""
ptq_options.quant_scheme_strict_mode = False
ptq_options.export_quant_scheme = True
ptq_options.export_weight_range_by_channel = True

compiler.use_ptq(ptq_options)
```
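The table states that `quant_type` and `w_quant_type` cannot both be 'int16'. A hypothetical pre-check such as the one below (not an nncase API) can catch an invalid pair before a long compilation starts.

```python
ALLOWED_QUANT_TYPES = ("uint8", "int8", "int16")


def check_quant_types(quant_type, w_quant_type):
    """Validate a (data, weight) quantization type pair against the table's rules."""
    for name, value in (("quant_type", quant_type), ("w_quant_type", w_quant_type)):
        if value not in ALLOWED_QUANT_TYPES:
            raise ValueError(f"{name} must be one of {ALLOWED_QUANT_TYPES}, got {value!r}")
    if quant_type == "int16" and w_quant_type == "int16":
        raise ValueError("quant_type and w_quant_type cannot both be 'int16'")


check_quant_types("uint8", "uint8")  # accepted
check_quant_types("int16", "int8")   # accepted: only one side is int16
```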

### set_tensor_data

Description

Sets the tensor data used as calibration data during model conversion.

Definition

```python
    def set_tensor_data(self, data: List[List[np.ndarray]]) -> None:
        reshape_data = list(map(list, zip(*data)))
        self.cali_data = [RuntimeTensor.from_numpy(
            d) for d in itertools.chain.from_iterable(reshape_data)]
```

Parameters

| name | type | Description |
| ---- | --------------------- | -------------- |
| data | List[List[np.ndarray]] | Calibration data that has been read in |

Return Value

None.

Example

```python
# ptq_options
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 6
ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
compiler.use_ptq(ptq_options)
```
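The `zip(*data)` step in the definition regroups the nested list before flattening it. The sketch below replays that step with plain strings in place of arrays so that the reordering is easy to see; it assumes, as in the `generate_data` helpers used in this document, that the outer list is indexed by sample and the inner list by model input.

```python
import itertools

# Three calibration samples for a model with two inputs: data[sample][input].
data = [["s0_in0", "s0_in1"],
        ["s1_in0", "s1_in1"],
        ["s2_in0", "s2_in1"]]

# Same transposition as in set_tensor_data: regrouped as [input][sample].
reshape_data = list(map(list, zip(*data)))
# Same flattening as in set_tensor_data: all samples of input 0, then input 1.
flat = list(itertools.chain.from_iterable(reshape_data))

print(reshape_data)
print(flat)
```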

### Compiler

Description

Compiler class, used to compile neural network models.

Definition

```python
class Compiler:
    _target: _nncase.Target
    _session: _nncase.CompileSession
    _compiler: _nncase.Compiler
    _compile_options: _nncase.CompileOptions
    _quantize_options: _nncase.QuantizeOptions
    _module: IRModule
```

### import_tflite

Description

Import TFLite model.

Definition

```python
def import_tflite(self, model_content: bytes, options: ImportOptions) -> None:
    self._compile_options.input_format = "tflite"
    self._import_module(model_content)
```

Parameters

| name | type | Description |
| -------------- | ------------- | -------------- |
| model_content | bytes | Model content read from the file |
| options | ImportOptions | Import options |

Return Value

None.

Example

```python
model_content = read_model_file(model)
compiler.import_tflite(model_content, import_options)
```

### import_onnx

Description

Import the ONNX model.

Definition

```python
def import_onnx(self, model_content: bytes, options: ImportOptions) -> None:
    self._compile_options.input_format = "onnx"
    self._import_module(model_content)
```

Parameters

| name | type | Description |
| -------------- | ------------- | -------------- |
| model_content | bytes | Model content read from the file |
| options | ImportOptions | Import options |

Return Value

None.

Example

```python
model_content = read_model_file(model)
compiler.import_onnx(model_content, import_options)
```

### use_ptq

Description

Set PTQ configuration options.

- When the target is K230, quantization must be used.

Definition

`use_ptq(ptq_options)`

Parameters

| name | type | Description |
| ----------- | ---------------- | ----------- |
| ptq_options | PTQTensorOptions | PTQ configuration options |

Return Value

None.

Example

`compiler.use_ptq(ptq_options)`

### compile

Description

Compile the neural network model.

Definition

`compile()`

Parameters

None.

Return Value

None.

Example

`compiler.compile()`

### gencode_tobytes

Description

Generate kmodel byte stream.

Definition

`gencode_tobytes()`

Parameters

None.

Return Value

`bytes`

Example

```python
kmodel = compiler.gencode_tobytes()
with open(os.path.join(infer_dir, 'test.kmodel'), 'wb') as f:
    f.write(kmodel)
```
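Writing the byte stream fails if the output directory does not exist yet. The helper below is a hypothetical sketch (`save_kmodel` is not part of nncase) that creates the directory first; the placeholder bytes stand in for the value returned by `gencode_tobytes()`.

```python
import tempfile
from pathlib import Path


def save_kmodel(kmodel: bytes, out_dir: str, name: str = "test.kmodel") -> Path:
    """Write the kmodel bytes to out_dir/name, creating out_dir if needed."""
    out_path = Path(out_dir) / name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(kmodel)
    return out_path


# Demonstration with placeholder bytes in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = save_kmodel(b"placeholder", str(Path(tmp) / "out"))
    print(path.name, path.stat().st_size)  # test.kmodel 11
```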

## Example

The models and Python compilation scripts used in the following examples are located as follows:

- The original model file is located in the src/rtsmart/libs/nncase/examples/models directory
- The python compilation script is located in the src/rtsmart/libs/nncase/examples/scripts directory

### Compile TFLite model

The mbv2_tflite.py script is as follows:

```python
import os
import argparse
import numpy as np
from PIL import Image
import nncase

def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content

def generate_data(shape, batch, calib_dir):
    img_paths = [os.path.join(calib_dir, p) for p in os.listdir(calib_dir)]
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))
        data.append([img_data[np.newaxis, ...]])
    return data

def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", type=str, help='target to run')
    parser.add_argument("--model", type=str, help='model file')
    parser.add_argument("--dataset", type=str, help='calibration_dataset')
    args = parser.parse_args()

    input_shape = [1, 3, 224, 224]
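    dump_dir = 'tmp/mbv2_tflite'
    os.makedirs(dump_dir, exist_ok=True)  # make sure the output directory exists before the kmodel is written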

    # compile_options
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target
    compile_options.preprocess = True
    compile_options.swapRB = False
    compile_options.input_shape = input_shape
    compile_options.input_type = 'uint8'
    compile_options.input_range = [0, 255]
    compile_options.mean = [127.5, 127.5, 127.5]
    compile_options.std = [127.5, 127.5, 127.5]
    compile_options.input_layout = 'NCHW'
    compile_options.dump_ir = True
    compile_options.dump_asm = True
    compile_options.dump_dir = dump_dir

    # compiler
    compiler = nncase.Compiler(compile_options)

    # import
    model_content = read_model_file(args.model)
    import_options = nncase.ImportOptions()
    compiler.import_tflite(model_content, import_options)

    # ptq_options
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 6
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
    compiler.use_ptq(ptq_options)

    # compile
    compiler.compile()

    # kmodel
    kmodel = compiler.gencode_tobytes()
    with open(os.path.join(dump_dir, 'test.kmodel'), 'wb') as f:
        f.write(kmodel)

if __name__ == '__main__':
    main()
```

Run the following commands to compile the mobilenetv2 TFLite model for the k230 target.

```sh
root@c285a41a7243:/mnt/# cd rtos_sdk/src/rtsmart/libs/nncase/examples
root@c285a41a7243:/mnt/rtos_sdk/src/rtsmart/libs/nncase/examples# python3 ./scripts/mbv2_tflite.py --target k230 --model models/mbv2.tflite --dataset calibration_dataset
```
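`generate_data` in the script passes every entry returned by `os.listdir` to `Image.open`, so a stray non-image file in the dataset directory would cause a failure, and the order of the entries is not guaranteed. One possible way to make the selection more predictable is sketched below; `list_calibration_images` and the extension list are assumptions for illustration, not part of the shipped scripts.

```python
import os

# Assumed set of image extensions; adjust to match the calibration dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def list_calibration_images(calib_dir, count):
    """Return `count` image paths from `calib_dir`, in sorted order."""
    paths = sorted(
        os.path.join(calib_dir, name)
        for name in os.listdir(calib_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    if len(paths) < count:
        raise ValueError(
            f"need {count} calibration images, found {len(paths)} in {calib_dir}")
    return paths[:count]
```

The returned list could replace `img_paths` in `generate_data`, with `count` set to `ptq_options.samples_count`.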

### Compile ONNX model

For the ONNX model, it is recommended to use [ONNX Simplifier](https://github.com/daquexian/onnx-simplifier) to simplify it first, and then use nncase to compile.

The yolov5s_onnx.py script is as follows:

```python
import os
import argparse
import numpy as np
from PIL import Image
import onnxsim
import onnx
import nncase

def parse_model_input_output(model_file):
    onnx_model = onnx.load(model_file)
    input_all = [node.name for node in onnx_model.graph.input]
    input_initializer = [node.name for node in onnx_model.graph.initializer]
    input_names = list(set(input_all) - set(input_initializer))
    input_tensors = [
        node for node in onnx_model.graph.input if node.name in input_names]

    # input
    inputs = []
    for _, e in enumerate(input_tensors):
        onnx_type = e.type.tensor_type
        input_dict = {}
        input_dict['name'] = e.name
        input_dict['dtype'] = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_type.elem_type]
        input_dict['shape'] = [(i.dim_value if i.dim_value != 0 else d) for i, d in zip(
            onnx_type.shape.dim, [1, 3, 224, 224])]
        inputs.append(input_dict)

    return onnx_model, inputs


def onnx_simplify(model_file, dump_dir):
    onnx_model, inputs = parse_model_input_output(model_file)
    onnx_model = onnx.shape_inference.infer_shapes(onnx_model)
    input_shapes = {}
    for input in inputs:
        input_shapes[input['name']] = input['shape']

    onnx_model, check = onnxsim.simplify(onnx_model, input_shapes=input_shapes)
    assert check, "Simplified ONNX model could not be validated"

    model_file = os.path.join(dump_dir, 'simplified.onnx')
    onnx.save_model(onnx_model, model_file)
    return model_file


def read_model_file(model_file):
    with open(model_file, 'rb') as f:
        model_content = f.read()
    return model_content

def generate_data_random(shape, batch):
    data = []
    for i in range(batch):
        data.append([np.random.randint(0, 256, shape).astype(np.uint8)])
    return data


def generate_data(shape, batch, calib_dir):
    img_paths = [os.path.join(calib_dir, p) for p in os.listdir(calib_dir)]
    data = []
    for i in range(batch):
        assert i < len(img_paths), "calibration images not enough."
        img_data = Image.open(img_paths[i]).convert('RGB')
        img_data = img_data.resize((shape[3], shape[2]), Image.BILINEAR)
        img_data = np.asarray(img_data, dtype=np.uint8)
        img_data = np.transpose(img_data, (2, 0, 1))
        data.append([img_data[np.newaxis, ...]])
    return data

def main():
    parser = argparse.ArgumentParser(prog="nncase")
    parser.add_argument("--target", type=str, help='target to run')
    parser.add_argument("--model", type=str, help='model file')
    parser.add_argument("--dataset", type=str, help='calibration_dataset')

    args = parser.parse_args()

    input_shape = [1, 3, 320, 320]

    dump_dir = 'tmp/yolov5s_onnx'
    if not os.path.exists(dump_dir):
        os.makedirs(dump_dir)

    # onnx simplify
    model_file = onnx_simplify(args.model, dump_dir)

    # compile_options
    compile_options = nncase.CompileOptions()
    compile_options.target = args.target
    compile_options.preprocess = True
    compile_options.swapRB = False
    compile_options.input_shape = input_shape
    compile_options.input_type = 'uint8'
    compile_options.input_range = [0, 255]
    compile_options.mean = [0, 0, 0]
    compile_options.std = [255, 255, 255]
    compile_options.input_layout = 'NCHW'
    compile_options.output_layout = 'NCHW'
    compile_options.dump_ir = True
    compile_options.dump_asm = True
    compile_options.dump_dir = dump_dir

    # compiler
    compiler = nncase.Compiler(compile_options)

    # import
    model_content = read_model_file(model_file)
    import_options = nncase.ImportOptions()
    compiler.import_onnx(model_content, import_options)

    # ptq_options
    ptq_options = nncase.PTQTensorOptions()
    ptq_options.samples_count = 6
    ptq_options.set_tensor_data(generate_data(input_shape, ptq_options.samples_count, args.dataset))
    compiler.use_ptq(ptq_options)

    # compile
    compiler.compile()

    # kmodel
    kmodel = compiler.gencode_tobytes()
    with open(os.path.join(dump_dir, 'test.kmodel'), 'wb') as f:
        f.write(kmodel)

if __name__ == '__main__':
    main()
```

Run the following commands to compile the ONNX model for the k230 target.

```sh
root@c285a41a7243:/mnt/# cd rtos_sdk/src/rtsmart/libs/nncase/examples
root@c285a41a7243:/mnt/rtos_sdk/src/rtsmart/libs/nncase/examples# python3 ./scripts/yolov5s_onnx.py --target k230 --model models/yolov5s.onnx --dataset calibration_dataset
```
