[RFA][PR target/113666] Simplify VEC_EXTRACT from a uniform vector
Andrew Pinski
andrew.pinski@oss.qualcomm.com
Mon Jan 12 21:58:27 GMT 2026
On Mon, Jan 12, 2026 at 11:58 AM Jeffrey Law
<jeffrey.law@oss.qualcomm.com> wrote:
>
> This fixes a P3 regression relative to gcc-13 on the RISC-V platform for this code:
>
> unsigned char a;
>
> int main() {
>   short b = a = 0;
>   for (; a != 19; a++)
>     if (a)
>       b = 32872 >> a;
>
>   if (b == 0)
>     return 0;
>   else
>     return 1;
> }
>
> -march=rv64gcv_zvl256b -mabi=lp64d -O3 -ftree-vectorize
>
>
> This doesn't need vector code at all. Good code generation here looks like:
>
>
> lui a5,%hi(a)
> li a4,19
> sb a4,%lo(a)(a5)
> li a0,0
> ret
>
>
> gcc-14 and gcc-15 produce horrific code here, roughly 20 instructions, over half of which are vector. It's not even worth posting, it's atrocious.
>
> The trunk improves things, but not quite to the quality of gcc-13:
>
> vsetivli zero,8,e16,mf2,ta,ma
> vmv.v.i v1,0
> lui a5,%hi(a)
> li a4,19
> vslidedown.vi v1,v1,1
> sb a4,%lo(a)(a5)
> vmv.x.s a0,v1
> snez a0,a0
> ret
>
>
> If we look at the .optimized dump we have this nugget:
>
> _26 = .VEC_EXTRACT ({ 0, 0, 0, 0, 0, 0, 0, 0 }, 1);
>
>
> If we're extracting an element out of a uniform vector, then any element will do, and uniform_vector_p conveniently returns it. So a simple match.pd pattern simplifies this to _26 = 0. That in turn allows eliminating all the vector code and simplifying the return value to a constant, resulting in the desired code shown earlier.
>
> One could easily argue that this need not be restricted to a uniform vector and I would totally agree. But given we're in stage4, the minimal fix for the regression seems more appropriate. But I could certainly be convinced to handle the more general case here.
>
> Bootstrapped and regression tested on x86 & riscv64. Tested across the cross configurations as well with no regressions.
>
>
> OK for the trunk?
I think this should be closer to what I did for VEC_SHL_INSERT in
r16-4742-gfcde4c81644aec (which came out of a review where I had tried
something similar to your patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658275.html ).
That is, add a VEC_EXTRACT case to fold_const_call (in
fold-const-call.cc) and handle it the same way as
fold_const_vec_shl_insert.
Something like:
```
static tree
fold_const_vec_extract (tree, tree arg0, tree arg1)
{
  if (TREE_CODE (arg0) != VECTOR_CST)
    return NULL_TREE;

  /* vec_extract (dup (CST), N) -> CST.  */
  if (tree elem = uniform_vector_p (arg0))
    return elem;

  return NULL_TREE;
}
```
And also in match.pd add:
```
(simplify
 (IFN_VEC_EXTRACT (vec_duplicate @0) @1)
 @0)
```
I don't think we need a bounds check on @1 either, because an
out-of-range index would be undefined anyway.
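To make the idea concrete outside of GCC, here is a rough standalone C model of the fold. It is only a sketch: `struct const_vec`, `model_uniform_vector_p` and `model_fold_vec_extract` are hypothetical names made up for illustration and are not GCC APIs. It only mimics the shape of the logic: a uniform constant vector folds to its single value, and the index is never looked at.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a constant vector; not a GCC data structure.  */
struct const_vec
{
  const int *elts;
  size_t nelts;
};

/* Loose analogue of uniform_vector_p for this model: if every element
   is equal, store that value in *ELEM and return true.  */
static bool
model_uniform_vector_p (struct const_vec v, int *elem)
{
  if (v.nelts == 0)
    return false;

  for (size_t i = 1; i < v.nelts; i++)
    if (v.elts[i] != v.elts[0])
      return false;

  *elem = v.elts[0];
  return true;
}

/* Model of the proposed fold.  IDX is deliberately ignored: for a
   uniform vector every in-range index yields the same value, and an
   out-of-range index is assumed to be undefined anyway.  Returns false
   (leaving *RESULT untouched) when the vector is not uniform.  */
static bool
model_fold_vec_extract (struct const_vec v, size_t idx, int *result)
{
  (void) idx;
  return model_uniform_vector_p (v, result);
}
```

With the eight-element all-zero vector from the .optimized dump and index 1, the model folds to 0; a non-uniform vector is left alone, which corresponds to returning NULL_TREE in the sketch above.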
Thanks,
Andrew
>
> Jeff
>
>
>
>