RVV Applications#

Overview#

RVV (RISC-V Vector) is the vector extension of the RISC-V ISA. K230 supports RVV and can use vector instructions for parallel computation, significantly improving data-processing performance.

Functional Description#

RVV Features#

RVV provides strong vector-compute capability:

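SIMD-style computation: one instruction operates on many elements
variable vector length: the same code adapts to the hardware vector length at run time
rich vector instructions for arithmetic, logic, load, and store
flexible data types, including integer and floating-point types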
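
The variable vector length listed above is usually exploited with a strip-mining loop: each pass asks how many elements it may process and advances by that amount. The sketch below models the idea in portable C so it can run on any host. The names model_vsetvl and record_strips are hypothetical, and the model assumes the common case where the granted length is min(remaining, VLMAX).

```c
#include <stddef.h>

/* Hypothetical stand-in for a vsetvl request.
 * Assumption: the hardware grants min(avl, vlmax) elements. */
size_t model_vsetvl(size_t avl, size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

/* Walk n elements in strips and store each strip length in out[].
 * Returns the number of strips written. Returns 0 when n is 0,
 * when vlmax is 0, or when out[] (capacity cap) is too small. */
size_t record_strips(size_t n, size_t vlmax, size_t *out, size_t cap) {
    size_t strips = 0;
    size_t done = 0;
    if (vlmax == 0) {
        return 0;
    }
    while (done < n) {
        size_t vl = model_vsetvl(n - done, vlmax);
        if (strips == cap) {
            return 0;
        }
        out[strips++] = vl;
        done += vl;
    }
    return strips;
}
```

With n = 10 and a VLMAX of 4, the strips are 4, 4, and 2: the last pass is simply shorter, so no separate tail loop is needed.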

K230 RVV Support#

K230 provides:

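vector length (VLEN): fixed by the hardware implementation; the K230 datasheet lists a 128-bit RVV 1.0 vector unit, and portable code should query the length at run time instead of hard-coding it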
vector registers: 32 vector registers (v0 to v31)
element width (SEW): 8, 16, 32, and 64 bit
element types: integer and floating-point

Main Advantages#

Using RVV provides:

higher performance through parallel computation
more concise code for data-parallel workloads
better energy efficiency compared with pure scalar execution

Application Scenarios#

RVV is suitable for:

image processing
audio processing
matrix operations
data copy
cryptography
DSP applications

Build Notes#

Enable RVV Support#

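Add RVV-related compiler flags. The v in -march=rv64gcv enables the vector extension.

Makefile:

CFLAGS += -march=rv64gcv -mabi=lp64d

CMake (set CMAKE_C_FLAGS the same way for C sources):

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=rv64gcv -mabi=lp64d")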

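Intrinsic Functions#

Include the RVV intrinsics header in your source code. The examples on this page use the unprefixed intrinsic names (for example vsetvl_e32m4) accepted by older toolchains. Toolchains that implement the ratified v1.0 intrinsics API spell the same intrinsics with a __riscv_ prefix (for example __riscv_vsetvl_e32m4), and some signatures differ.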

#include <riscv_vector.h>
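
A source file that includes the header unconditionally fails to compile when the RVV flags are missing. One way to keep a single source tree building in both configurations is to guard the include with the predefined macro __riscv_vector, which GCC and Clang define when the V extension is enabled. The helper names below (rvv_build_enabled, rvv_print_build_info) and the RVV_BUILD_ENABLED macro are hypothetical.

```c
#include <stdio.h>

/* __riscv_vector is predefined by the compiler when the V extension
 * is enabled, for example through -march=rv64gcv. */
#if defined(__riscv_vector)
#include <riscv_vector.h>
#define RVV_BUILD_ENABLED 1
#else
#define RVV_BUILD_ENABLED 0
#endif

/* Hypothetical helper: 1 if this file was compiled with RVV, else 0. */
int rvv_build_enabled(void) {
    return RVV_BUILD_ENABLED;
}

/* Hypothetical helper: prints the build configuration as a sanity check. */
void rvv_print_build_info(void) {
    printf("RVV intrinsics: %s\n", RVV_BUILD_ENABLED ? "enabled" : "disabled");
}
```

Calling rvv_print_build_info() once at start-up is a quick way to confirm that the flags from the build section actually reached the compiler.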

Usage#

Basic RVV Usage#

Vector configuration and load#

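#include <riscv_vector.h>

void vector_add_example(const float* a, const float* b, float* c, size_t n) {
    size_t vl;
    // Strip-mine the arrays: each pass handles the vl elements granted by vsetvl.
    for (size_t i = 0; i < n; i += vl) {
        vl = vsetvl_e32m4(n - i);                      // elements in this pass
        vfloat32m4_t va = vle32_v_f32m4(a + i, vl);    // load vl floats from a
        vfloat32m4_t vb = vle32_v_f32m4(b + i, vl);    // load vl floats from b
        vfloat32m4_t vc = vfadd_vv_f32m4(va, vb, vl);  // element-wise add
        vse32_v_f32m4(c + i, vc, vl);                  // store vl results
    }
}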
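
The vector add above only builds when RVV is enabled. A minimal sketch of a wrapper that also builds on a host without RVV is shown below, which makes it possible to unit-test callers on a PC. The name add_f32 and the RVV() macro are hypothetical. The version threshold used to choose between prefixed and unprefixed intrinsic names is an assumption that should be checked against the toolchain in use.

```c
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
/* Assumption: toolchains reporting intrinsics version 0.11 or later use the
 * __riscv_ prefix; older ones accept the bare names used on this page. */
#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 11000
#define RVV(name) __riscv_##name
#else
#define RVV(name) name
#endif
#endif

/* c[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *c, size_t n) {
#if defined(__riscv_vector)
    size_t vl;
    for (size_t i = 0; i < n; i += vl) {
        vl = RVV(vsetvl_e32m4)(n - i);  /* elements handled in this pass */
        vfloat32m4_t va = RVV(vle32_v_f32m4)(a + i, vl);
        vfloat32m4_t vb = RVV(vle32_v_f32m4)(b + i, vl);
        RVV(vse32_v_f32m4)(c + i, RVV(vfadd_vv_f32m4)(va, vb, vl), vl);
    }
#else
    /* Scalar fallback for builds without the V extension. */
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
#endif
}
```

Both paths compute the same result, so a test written against the scalar build also serves as a regression test for the RVV build on the target.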

Vector-width setting#

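LMUL (register grouping) sets how many vector registers one value occupies. It is encoded in the type name:

vfloat32m1_t v1 = ...;  // LMUL=1: one register
vfloat32m2_t v2 = ...;  // LMUL=2: group of two registers
vfloat32m4_t v4 = ...;  // LMUL=4: group of four registers
vfloat32m8_t v8 = ...;  // LMUL=8: group of eight registers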
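
The effect of LMUL on capacity follows the relation VLMAX = LMUL * VLEN / SEW from the RVV specification. The sketch below evaluates it for integer LMUL values only. vlmax_elements and register_groups are hypothetical helper names, and the VLEN passed in is whatever the target hardware provides.

```c
#include <stddef.h>

/* Maximum elements per operation: VLMAX = LMUL * VLEN / SEW.
 * Handles integer LMUL only (1, 2, 4, 8); returns 0 for unsupported inputs. */
size_t vlmax_elements(size_t vlen_bits, size_t sew_bits, size_t lmul) {
    int sew_ok = sew_bits == 8 || sew_bits == 16 ||
                 sew_bits == 32 || sew_bits == 64;
    int lmul_ok = lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8;
    if (!sew_ok || !lmul_ok || vlen_bits < sew_bits) {
        return 0;
    }
    return lmul * vlen_bits / sew_bits;
}

/* Number of register groups available out of the 32 vector registers. */
size_t register_groups(size_t lmul) {
    int lmul_ok = lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8;
    return lmul_ok ? 32 / lmul : 0;
}
```

For example, with a 256-bit VLEN a vfloat32m4_t value holds up to 32 elements, while only 8 such register groups exist. That trade-off is why the largest LMUL is not always the fastest choice.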

Conditional handling#

Use vector masks for conditional processing:

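void vector_conditional_example(const float* a, const float* b, float* c, size_t n) {
    size_t vl;
    for (size_t i = 0; i < n; i += vl) {
        vl = vsetvl_e32m4(n - i);
        vfloat32m4_t va = vle32_v_f32m4(a + i, vl);
        vfloat32m4_t vb = vle32_v_f32m4(b + i, vl);
        // SEW=32 with LMUL=4 uses the mask type vbool8_t (32 / 4 = 8).
        vbool8_t mask = vmfgt_vv_f32m4_b8(va, vb, vl);  // mask[i] = a[i] > b[i]
        // Mask-first argument order of the unprefixed intrinsics:
        // the result takes va where the mask is set and vb elsewhere.
        vfloat32m4_t vc = vmerge_vvm_f32m4(mask, vb, va, vl);
        vse32_v_f32m4(c + i, vc, vl);
    }
}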
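
Mask logic is easy to get backwards, so it helps to keep a scalar reference next to the vector version and compare their outputs. The sketch below states the intended semantics, c[i] = a[i] > b[i] ? a[i] : b[i], in plain C. The names select_greater_ref and buffers_equal are hypothetical.

```c
#include <stddef.h>

/* Scalar reference for the mask-driven select:
 * c[i] = a[i] > b[i] ? a[i] : b[i]. */
void select_greater_ref(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int mask_bit = a[i] > b[i];  /* one mask bit per element */
        c[i] = mask_bit ? a[i] : b[i];
    }
}

/* Returns 1 when two float buffers match element for element, else 0. */
int buffers_equal(const float *x, const float *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (x[i] != y[i]) {
            return 0;
        }
    }
    return 1;
}
```

On the target, running both versions over the same input and passing the two outputs to buffers_equal gives a quick check that the mask and merge operands are in the intended order.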

Reduction operations#

Use reduction operations for accumulation:

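float vector_sum_example(const float* a, size_t n) {
    // Element 0 of acc carries the running sum between passes.
    vfloat32m1_t acc = vfmv_v_f_f32m1(0.0f, 1);
    size_t vl;
    for (size_t i = 0; i < n; i += vl) {
        vl = vsetvl_e32m4(n - i);
        vfloat32m4_t va = vle32_v_f32m4(a + i, vl);
        // Ordered sum: acc[0] = acc[0] + va[0] + ... + va[vl-1].
        // The leading destination argument belongs to the unprefixed intrinsics.
        acc = vfredosum_vs_f32m4_f32m1(acc, va, acc, vl);
    }
    return vfmv_f_s_f32m1_f32(acc);  // extract element 0 as a scalar
}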
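
The example uses the ordered reduction (vfredosum), which adds elements strictly in index order. RVV also defines an unordered floating-point sum that may reassociate the additions, so its rounding can differ. The sketch below contrasts the two orders in scalar C. sum_ordered and sum_pairwise are hypothetical names, and the pairwise tree is only one possible reassociation, not a description of what any particular hardware does.

```c
#include <stddef.h>

/* Strict left-to-right sum: the order an ordered reduction preserves. */
float sum_ordered(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Pairwise (tree) sum: one reassociation an unordered reduction
 * is allowed to use. The real hardware order is not specified. */
float sum_pairwise(const float *a, size_t n) {
    if (n == 0) {
        return 0.0f;
    }
    if (n == 1) {
        return a[0];
    }
    size_t half = n / 2;
    return sum_pairwise(a, half) + sum_pairwise(a + half, n - half);
}
```

For inputs that are exactly representable and do not round, both orders agree. For data with a wide dynamic range they can differ in the last bits or more, which matters when comparing RVV output against a scalar golden reference.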

Performance Optimization Suggestions#

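choose the largest LMUL the kernel can use without spilling vector registers: a larger LMUL handles more elements per instruction but leaves fewer register groups
keep memory aligned to improve load/store efficiency
combine RVV with loop unrolling where appropriate
let vsetvl size the final partial pass instead of adding a scalar tail loop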
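
The alignment suggestion above can be sketched with C11 aligned_alloc. The helper names alloc_f32_aligned and is_aligned are hypothetical, and the 64-byte alignment used in the usage note is an assumed cache-line size that should be checked against the K230 documentation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate n floats aligned to `alignment` bytes (a power of two).
 * Returns NULL on invalid arguments or allocation failure.
 * Release the buffer with free(). */
float *alloc_f32_aligned(size_t n, size_t alignment) {
    if (n == 0 || alignment == 0 || (alignment & (alignment - 1)) != 0) {
        return NULL;
    }
    size_t bytes = n * sizeof(float);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    size_t padded = (bytes + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, padded);
}

/* Returns 1 when p sits on an `alignment`-byte boundary, else 0. */
int is_aligned(const void *p, size_t alignment) {
    return alignment != 0 && ((uintptr_t)p % alignment) == 0;
}
```

A buffer obtained with alloc_f32_aligned(n, 64) starts on a 64-byte boundary, so every strip of a unit-stride load or store begins at a predictable offset relative to that boundary.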

Tip

RVV programming has a learning curve. Start with simple examples and work up to the wider set of RVV instructions and usage patterns. For details, refer to the RISC-V vector extension specification.

Tip

On K230, RVV can significantly improve performance in scenarios such as image processing and audio processing. For best results, optimize it together with K230 hardware features such as DMA and cache behavior.