LCM will hoist the vsetvl of bb 2 into bb 1.
We don't do AVL propagation in this situation because it is complicated: we
would have to analyze the code sequence between the vsetvli in bb 1 and the RVV
insn in bb 2, and these are not necessarily consecutive blocks.
This patch does the optimization after LCM instead: we check each vsetvli that
LCM inserted on an edge and eliminate it if it is redundant. This approach is
much simpler and safer.
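
To illustrate the idea only, here is a minimal sketch of a backward scan over
one basic block, written against a toy instruction model. It is not the code in
riscv-vsetvl.cc: struct insn, its fields and eliminate_redundant_vsetvl are
hypothetical names, registers are plain integers (0 stands for x0/zero), and
the compatibility check is assumed to be "same SEW and LMUL, AVL register is
the result of the earlier vsetvli and is not redefined in between, and no
vector insn sits in between".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds; the real pass works on RTL, not on this struct.  */
enum kind { VSETVL, VECTOR_OP, SCALAR_OP };

struct insn
{
  enum kind kind;
  int dest;      /* Scalar register written; 0 means x0/zero (no result).  */
  int avl;       /* VSETVL only: register holding the AVL.  */
  int sew, lmul; /* VSETVL only.  */
  bool ma;       /* VSETVL only: true for "ma", false for "mu".  */
  bool deleted;  /* Set when the insn is considered eliminated.  */
};

/* Scan BB backward.  LATER is the closest "vsetvli zero,rs,..." below the
   current position that may still turn out to be redundant.  Returns the
   number of eliminated insns.  */
static int
eliminate_redundant_vsetvl (struct insn *bb, size_t n)
{
  assert (bb != NULL || n == 0);

  int removed = 0;
  struct insn *later = NULL;

  for (size_t i = n; i-- > 0;)
    {
      struct insn *cur = &bb[i];

      if (cur->kind == VECTOR_OP)
        /* Conservative: a vector insn in between consumes VL/VTYPE, so the
           two vsetvli insns are not merged across it.  */
        later = NULL;
      else if (cur->kind == SCALAR_OP)
        {
          /* The AVL register is redefined between the two vsetvli insns.  */
          if (later && cur->dest != 0 && cur->dest == later->avl)
            later = NULL;
        }
      else if (later && cur->dest != 0 && cur->dest == later->avl
               && cur->sew == later->sew && cur->lmul == later->lmul)
        {
          /* CUR already sets VL to the value LATER asks for.  Keep CUR,
             let it adopt LATER's mask policy, and drop LATER.  */
          cur->ma = later->ma;
          later->deleted = true;
          removed++;
          later = NULL;
        }
      else
        /* Only the "vsetvli zero,..." form is a candidate in this model.  */
        later = cur->dest == 0 ? cur : NULL;
    }

  return removed;
}
```

Under these assumptions, a block shaped like "vsetvli a5,a4,e32,m1,ta,mu;
addi a4,a4,-1; vsetvli zero,a5,e32,m1,ta,ma" keeps only the first vsetvli,
which takes over the "ma" policy of the eliminated one.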

Example code:
void
foo2 (int32_t *a, int32_t *b, int n)
{
  if (n <= 0)
    return;
  int i = n;
  size_t vl = __riscv_vsetvl_e32m1 (i);
  for (; i >= 0; i--)
    {
      vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl);
      __riscv_vse32_v_i32m1 (b, v, vl);
      if (i >= vl)
        continue;
      if (i == 0)
        return;
      vl = __riscv_vsetvl_e32m1 (i);
    }
}

Before this patch:
foo2:
.LFB2:
	.cfi_startproc
	ble a2,zero,.L1
	mv a4,a2
	li a3,-1
	vsetvli a5,a2,e32,m1,ta,mu
	vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
.L5:
	vle32.v v1,0(a0)
	vse32.v v1,0(a1)
	bgeu a4,a5,.L3
.L10:
	beq a2,zero,.L1
	vsetvli a5,a4,e32,m1,ta,mu
	addi a4,a4,-1
	vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
	vle32.v v1,0(a0)
	vse32.v v1,0(a1)
	addiw a2,a2,-1
	bltu a4,a5,.L10
.L3:
	addiw a2,a2,-1
	addi a4,a4,-1
	bne a2,a3,.L5
.L1:
	ret

After this patch:
f:
	ble a2,zero,.L1
	mv a4,a2
	li a3,-1
	vsetvli a5,a2,e32,m1,ta,ma
.L5:
	vle32.v v1,0(a0)
	vse32.v v1,0(a1)
	bgeu a4,a5,.L3
.L10:
	beq a2,zero,.L1
	vsetvli a5,a4,e32,m1,ta,ma
	addi a4,a4,-1
	vle32.v v1,0(a0)
	vse32.v v1,0(a1)
	addiw a2,a2,-1
	bltu a4,a5,.L10
.L3:
	addiw a2,a2,-1
	addi a4,a4,-1
	bne a2,a3,.L5
.L1:
	ret

	PR target/109743

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vsetvl_at_end): New.
	(local_avl_compatible_p): New.
	(pass_vsetvl::local_eliminate_vsetvl_insn): Enhance local optimizations
	for LCM, rewrite as a backward algorithm.
	(pass_vsetvl::cleanup_insns): Use new local_eliminate_vsetvl_insn
	interface, handle a BB at once.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test.
	* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test.
	* gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test.
	* gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test.