Bug 111926 - RISC-V: Use vsetvl insn replace csrr vlenb insn
Summary: RISC-V: Use vsetvl insn replace csrr vlenb insn
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 14.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2023-10-23 03:36 UTC by Lehua Ding
Modified: 2023-11-13 00:11 UTC
CC List: 4 users

See Also:
Host:
Target: RISC-V
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Lehua Ding 2023-10-23 03:36:03 UTC
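We can use:
        vsetvli a5,zero,e8,mf8,ta,ma
to replace:
        csrr    a4,vlenb
        srli    a4,a4,3

Both sequences produce VLEN/64: vlenb holds VLEN/8 and the shift divides it by 8, while VLMAX for SEW=8 and LMUL=1/8 is (1/8) * VLEN / 8. The motivation is that implementations tend to optimise vsetvli better than a csrr read, so the vsetvli form is expected to perform better, and it is one instruction instead of two.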

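#include <riscv_vector.h>

/* Clobber all 32 vector registers so that any vector value live across
   this statement has to be spilled to the stack and reloaded.  */
#define exhaust_vector_regs()                                                  \
  asm volatile("#" ::                                                          \
		 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", \
		   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",     \
		   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",     \
		   "v26", "v27", "v28", "v29", "v30", "v31");

void
spill_1 (int8_t *in, int8_t *out)
{
  /* Load one fractional-LMUL (e8, mf8) vector.  */
  vint8mf8_t v1 = *(vint8mf8_t*)in;
  /* v1 is live across the clobber, forcing a spill and a reload.  */
  exhaust_vector_regs ();
  *(vint8mf8_t*)out = v1;
}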

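spill_1(signed char*, signed char*):
        csrr    a4,vlenb
        srli    a4,a4,3           # a4 = vlenb / 8 (size of an mf8 register)
        csrr    t0,vlenb
        slli    a3,a4,3
        sub     sp,sp,t0          # allocate a vlenb-byte frame
        sub     a3,a3,a4
        add     a3,a3,sp          # a3 = sp + 7 * (vlenb / 8): spill slot
        vsetvli a5,zero,e8,mf8,ta,ma  # a5 = VLMAX for e8,mf8 = vlenb / 8
        vle8.v  v1,0(a0)
        vse8.v  v1,0(a3)          # spill v1
        csrr    a4,vlenb
        srli    a4,a4,3           # recompute vlenb / 8 for the reload
        slli    a3,a4,3
        sub     a3,a3,a4
        add     a3,a3,sp
        vle8.v  v1,0(a3)          # reload v1
        csrr    t0,vlenb
        vse8.v  v1,0(a1)
        add     sp,sp,t0          # release the frame
        jr      ra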


https://godbolt.org/z/TcKxbjnoh
Comment 1 Kito Cheng 2023-10-23 04:24:03 UTC
Please leave an option so users have a choice: for performance it is hard to say which sequence is best on every uarch. My thought is to add an option and let -mtune and a command-line option control it.
Comment 2 Kito Cheng 2023-10-23 04:25:05 UTC
Forgot to mention: personally I love the idea of simplifying the codegen, and I can imagine it being a definite optimization for specific uarchs :)
Comment 3 Lehua Ding 2023-10-24 03:00:29 UTC
(In reply to Kito Cheng from comment #1)
> Please leave an option so users have a choice: for performance it is hard to
> say which sequence is best on every uarch. My thought is to add an option
> and let -mtune and a command-line option control it.

Understood. I'll record the possible optimisation points I come across here; as you said, how to implement them is still under consideration.