Note

This is the documentation for the latest development branch and may refer to features that are not available in released versions. If you are looking for the documentation for a specific release, use the drop-down menu on the left and select the desired version.

RVV Applications#

Overview#

RVV (RISC-V Vector) is the vector extension of the RISC-V ISA. K230 supports RVV and can use vector instructions for parallel computation, significantly improving data-processing performance.

Functional Description#

RVV Features#

RVV provides strong vector-compute capability:

  • SIMD-style computation

  • variable vector length

  • rich vector instructions for arithmetic, logic, load, and store

  • flexible data types, including integer and floating-point types

K230 RVV Support#

K230 provides:

  • vector length (VLEN): 128 bits on the K230 big core, which implements RVV 1.0; obtain the active length from vsetvl at run time rather than hard-coding it

  • vector registers: 32 vector registers (v0 to v31)

  • element width (SEW): 8, 16, 32, and 64 bits

  • element types: integer and floating-point

Main Advantages#

Using RVV provides:

  • higher performance through parallel computation

  • more concise code for data-parallel workloads

  • better energy efficiency compared with pure scalar execution

Application Scenarios#

RVV is suitable for:

  • image processing

  • audio processing

  • matrix operations

  • data copy

  • cryptography

  • DSP applications

Build Notes#

Enable RVV Support#

Add RVV-related compiler flags:

# Makefile
CFLAGS += -march=rv64gcv -mabi=lp64d
# CMake
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=rv64gcv -mabi=lp64d")
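
One way to confirm that the flags took effect is to test the `__riscv_vector` macro, which RISC-V compilers predefine when the vector extension is enabled through `-march`. The following is a minimal sketch; the helper names `rvv_enabled` and `print_rvv_status` are hypothetical and not part of the K230 SDK.

```c
#include <stdio.h>

/* Returns 1 when this file was compiled with the vector extension enabled
 * (for example with -march=rv64gcv), 0 otherwise.
 * rvv_enabled is a hypothetical helper name used for illustration. */
int rvv_enabled(void) {
#if defined(__riscv_vector)
    return 1;
#else
    return 0;
#endif
}

/* Prints the build-time RVV status, for example at program start-up. */
void print_rvv_status(void) {
    printf("RVV %s at compile time\n",
           rvv_enabled() ? "enabled" : "not enabled");
}
```

If the message reports that RVV is not enabled, the `-march` flag most likely did not reach the compiler invocation for that source file.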

Intrinsic Functions#

Include the RVV intrinsics header in your source code:

#include <riscv_vector.h>

Usage#

Basic RVV Usage#

Vector configuration and load#

#include <riscv_vector.h>

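void vector_add_example(const float* a, const float* b, float* c, size_t n) {
    // Strip-mining loop: each pass handles vl elements until none remain
    for (size_t vl; n > 0; n -= vl, a += vl, b += vl, c += vl) {
        vl = vsetvl_e32m4(n);                         // elements granted for this pass
        vfloat32m4_t va = vle32_v_f32m4(a, vl);       // load vl floats from a
        vfloat32m4_t vb = vle32_v_f32m4(b, vl);       // load vl floats from b
        vfloat32m4_t vc = vfadd_vv_f32m4(va, vb, vl); // element-wise a + b
        vse32_v_f32m4(c, vc, vl);                     // store vl results to c
    }
}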
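
`vsetvl_e32m4(n)` may grant fewer than `n` elements, so arrays longer than one register group are processed in several passes (strip-mining). The sketch below models that control flow in portable C so it can be stepped through on any host. `model_vsetvl` and `model_vector_add` are hypothetical names, and the `min(avl, vlmax)` rule is a simplifying assumption: the RVV specification allows other choices when the remaining count lies between `vlmax` and `2 * vlmax`.

```c
#include <stddef.h>

/* Scalar model of vsetvl: grants min(avl, vlmax) elements.
 * Assumption: this is the common behaviour, not the only one the
 * RVV specification permits. */
size_t model_vsetvl(size_t avl, size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

/* Element-wise add that walks the arrays the way a strip-mined RVV loop
 * does. Returns the number of passes taken (0 if vlmax is 0). */
size_t model_vector_add(const float *a, const float *b, float *c,
                        size_t n, size_t vlmax) {
    size_t passes = 0;
    if (vlmax == 0) {
        return 0;                          /* nothing can be processed */
    }
    while (n > 0) {
        size_t vl = model_vsetvl(n, vlmax);
        for (size_t i = 0; i < vl; ++i) {  /* stands in for one vector add */
            c[i] = a[i] + b[i];
        }
        a += vl;
        b += vl;
        c += vl;
        n -= vl;
        ++passes;
    }
    return passes;
}
```

With 37 elements and a `vlmax` of 16, the model takes three passes of 16, 16, and 5 elements; the last pass needs no separate scalar tail loop.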

Vector-width setting#

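LMUL sets how many vector registers are grouped into one operand. The type suffix (m1, m2, m4, m8) gives the group size:

vfloat32m1_t v1 = ...;  // LMUL=1: one vector register
vfloat32m2_t v2 = ...;  // LMUL=2: group of two registers
vfloat32m4_t v4 = ...;  // LMUL=4: group of four registers
vfloat32m8_t v8 = ...;  // LMUL=8: group of eight registers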
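
The number of elements one such type can hold follows from the RVV specification: VLMAX = (VLEN / SEW) × LMUL for integer LMUL. The sketch below works through that arithmetic; `vlmax_elems` and `register_groups` are hypothetical helper names, and the vector length is passed as a parameter because it depends on the hardware.

```c
#include <stddef.h>

/* VLMAX = (VLEN / SEW) * LMUL for integer LMUL values (1, 2, 4, 8).
 * vlen_bits is hardware-specific and therefore a parameter. */
size_t vlmax_elems(size_t vlen_bits, size_t sew_bits, size_t lmul) {
    return (vlen_bits / sew_bits) * lmul;
}

/* Number of register groups that fit in the 32 vector registers. */
size_t register_groups(size_t lmul) {
    return 32 / lmul;
}
```

For example, `vlmax_elems(128, 32, 4)` gives 16 floats per `vfloat32m4_t` and `vlmax_elems(256, 32, 4)` gives 32, while `register_groups(8)` shows that only 4 register groups remain at LMUL=8.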

Conditional handling#

Use vector masks for conditional processing:

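void vector_conditional_example(const float* a, const float* b, float* c, size_t n) {
    // c[i] = (a[i] > b[i]) ? a[i] : b[i], processed vl elements per pass
    for (size_t vl; n > 0; n -= vl, a += vl, b += vl, c += vl) {
        vl = vsetvl_e32m4(n);
        vfloat32m4_t va = vle32_v_f32m4(a, vl);
        vfloat32m4_t vb = vle32_v_f32m4(b, vl);
        // Mask type is vboolN_t with N = SEW/LMUL = 32/4 = 8
        vbool8_t mask = vmfgt_vv_f32m4_b8(va, vb, vl);
        // Selects va where the mask is set, otherwise vb. The unprefixed
        // intrinsics API used here takes the mask as the first argument.
        vfloat32m4_t vc = vmerge_vvm_f32m4(mask, vb, va, vl);
        vse32_v_f32m4(c, vc, vl);
    }
}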
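
A scalar reference is handy for checking a masked routine element by element. The sketch below assumes the intent of the example is to keep the larger of the two inputs; `select_greater_ref` is a hypothetical name used only for illustration.

```c
#include <stddef.h>

/* Scalar reference: c[i] = a[i] if a[i] > b[i], otherwise b[i].
 * Each iteration mirrors one mask bit and one merged element. */
void select_greater_ref(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int mask_bit = a[i] > b[i];     /* compare: one mask bit per element */
        c[i] = mask_bit ? a[i] : b[i];  /* merge: take a where the bit is set */
    }
}
```

Running both versions on the same input and comparing the outputs exactly is a simple regression test, since a compare-and-select involves no rounding.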

Reduction operations#

Use reduction operations for accumulation:

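float vector_sum_example(const float* a, size_t n) {
    // Element 0 of acc carries the running sum between passes
    vfloat32m1_t acc = vfmv_v_f_f32m1(0.0f, 1);
    for (size_t vl; n > 0; n -= vl, a += vl) {
        vl = vsetvl_e32m4(n);
        vfloat32m4_t va = vle32_v_f32m4(a, vl);
        // Reductions return an LMUL=1 vector; in the unprefixed intrinsics
        // API the first argument is the destination operand
        acc = vfredosum_vs_f32m4_f32m1(acc, va, acc, vl);
    }
    return vfmv_f_s_f32m1_f32(acc);  // extract element 0 as a scalar
}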
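
`vfredosum` is the ordered sum reduction: it adds elements in index order, so its result should match a plain scalar loop. Floating-point addition is not associative, so a reduction that associates the additions differently can return a different value. The portable sketch below illustrates the effect; `sum_in_order` and `sum_pairwise` are hypothetical names, and the pairwise tree is only one association an unordered reduction might use.

```c
#include <stddef.h>

/* Adds elements strictly in index order, like a plain scalar loop. */
float sum_in_order(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        s += a[i];
    }
    return s;
}

/* Adds elements as a balanced tree: the left half and the right half
 * are summed separately and then combined. */
float sum_pairwise(const float *a, size_t n) {
    if (n == 0) {
        return 0.0f;
    }
    if (n == 1) {
        return a[0];
    }
    size_t half = n / 2;
    float left = sum_pairwise(a, half);
    float right = sum_pairwise(a + half, n - half);
    return left + right;
}
```

With IEEE 754 single precision, the input `{1e8f, 1.0f, -1e8f, 1.0f}` sums to 1 in index order but to 0 pairwise, because `1e8f + 1.0f` rounds back to `1e8f`. Keep this in mind when comparing vector results against a scalar reference.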

Performance Optimization Suggestions#

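  1. choose the largest LMUL that the workload's register pressure allows; larger groups process more elements per instruction but leave fewer register groups available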

  2. keep memory aligned to improve load/store efficiency

  3. combine RVV with loop unrolling where appropriate

  4. reduce scalar fallback code whenever possible
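
For the alignment suggestion, buffers can be allocated on a fixed boundary with C11 `aligned_alloc`. The sketch below assumes a C11 C library and uses 64 bytes as an illustrative alignment; that value is an assumption, not a documented K230 figure, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Alignment in bytes. 64 is an assumed value for illustration only;
 * check the K230 documentation for the cache-line size that applies. */
#define BUF_ALIGN 64u

/* Allocates room for n floats on a BUF_ALIGN boundary. The size is rounded
 * up because aligned_alloc expects a multiple of the alignment.
 * Release the buffer with free(). Returns NULL on failure. */
float *alloc_aligned_floats(size_t n) {
    size_t bytes = n * sizeof(float);
    size_t rounded = (bytes + BUF_ALIGN - 1) / BUF_ALIGN * BUF_ALIGN;
    if (rounded == 0) {
        rounded = BUF_ALIGN;  /* avoid a zero-size request */
    }
    return (float *)aligned_alloc(BUF_ALIGN, rounded);
}

/* Returns 1 when p sits on a BUF_ALIGN boundary, 0 otherwise. */
int is_aligned(const void *p) {
    return ((uintptr_t)p % BUF_ALIGN) == 0;
}
```

Buffers obtained this way can be passed directly to the load and store intrinsics; `is_aligned` is useful in debug builds to catch pointers that were offset before the call.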

Tip

RVV programming has a learning curve. Start with simple examples and build familiarity with the available RVV instructions and usage patterns step by step. For details, refer to the RISC-V vector extension specification.

Tip

On K230, RVV can significantly improve performance in scenarios such as image processing and audio processing. For best results, optimize it together with K230 hardware features such as DMA and cache behavior.
