[PATCH V2] gcc: Add vec_select -> subreg RTL simplification

Thu Jul 15 13:06:58 GMT 2021

Ah, yes - those test results should have only been changed for little endian.

I've submitted a patch to the list restoring the original expected results for big
endian.

Thanks,
Jonathan
________________________________
From: Christophe Lyon <christophe.lyon.oss@gmail.com>
Sent: 15 July 2021 10:09
To: Richard Sandiford <Richard.Sandiford@arm.com>; Jonathan Wright <Jonathan.Wright@arm.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification

On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>> wrote:
Jonathan Wright <Jonathan.Wright@arm.com<mailto:Jonathan.Wright@arm.com>> writes:
> Hi,
>
> Version 2 of this patch adds more code generation tests to show the
> benefit of this RTL simplification as well as adding a new helper function
> 'rtx_vec_series_p' to reduce code duplication.
>
> Patch tested as version 1 - ok for master?

Sorry for the slow reply.

> Regression tested and bootstrapped on aarch64-none-linux-gnu,
> x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> aarch64_be-none-linux-gnu - no issues.

I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
>                             - 1,
>                             0));
>        break;
> +    case VEC_SELECT:
> +      {
> +     rtx trueop0 = XEXP (x, 0);
> +     mode = GET_MODE (trueop0);
> +     rtx trueop1 = XEXP (x, 1);
> +     int nunits;
> +     /* If we select a low-part subreg, return that.  */
> +     if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> +         && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
> +       {
> +         int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
> +
> +         if (rtx_vec_series_p (trueop1, offset))
> +           {
> +             rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> +             if (new_rtx != NULL_RTX)
> +               return new_rtx;
> +           }
> +       }
> +      }

Since this occurs three times, I think it would be worth having
a new predicate:

/* Return true if, for all OP of mode OP_MODE:

     (vec_select:RESULT_MODE OP SEL)

   is equivalent to the lowpart RESULT_MODE of OP.  */

bool
vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)

containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
and rtx_vec_series_p tests.

I think the function belongs in rtlanal.[hc], even though subreg_lowpart_p
is in emit-rtl.c.

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1884,15 +1884,16 @@
>  )
>
>  (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
> -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> -        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
> +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> +        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m,w")))]
>    ""
>    "@
>     and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
>     ldr<SHORT:size>\t%w0, %1
> -   ldr\t%<SHORT:size>0, %1"
> -  [(set_attr "type" "logic_imm,load_4,f_loads")
> -   (set_attr "arch" "*,*,fp")]
> +   ldr\t%<SHORT:size>0, %1
> +   umov\t%w0, %1.<SHORT:size>[0]"
> +  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> +   (set_attr "arch" "*,*,fp,fp")]

FTR (just to show I thought about it): I don't know whether the umov
can really be considered an fp operation rather than a simd operation,
but since we don't support fp without simd, this is already a distinction
without a difference.  So the pattern is IMO OK as-is.

> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -224,7 +224,7 @@
>  ;; problems because small constants get converted into adds.
>  (define_insn "*arm_movsi_vfp"
>    [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m ,*t,r,*t,*t, *Uv")
> -      (match_operand:SI 1 "general_operand"     "rk, I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> +      (match_operand:SI 1 "general_operand"     "rk, I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
>    "TARGET_ARM && TARGET_HARD_FLOAT
>     && (   s_register_operand (operands[0], SImode)
>         || s_register_operand (operands[1], SImode))"

I'll assume that an Arm maintainer would have spoken up by now if
they didn't want this for some reason.

> diff --git a/gcc/rtl.c b/gcc/rtl.c
> index aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa 100644
> --- a/gcc/rtl.c
> +++ b/gcc/rtl.c
> @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec)
>      }
>  }
>
> +/* Return true if element-selection indices in VEC are in series.  */
> +
> +bool
> +rtx_vec_series_p (const_rtx vec, int start)

I think rtvec_series_p would be better, for consistency with
rtvec_all_equal_p.  Also, let's generalise it to:

/* Return true if VEC contains a linear series of integers
   { START, START+1, START+2, ... }.  */

bool
rtvec_series_p (rtvec vec, int start)
{
}

> +{
> +  for (int i = 0; i < XVECLEN (vec, 0); i++)
> +    {
> +      if (i + start != INTVAL (XVECEXP (vec, 0, i)))
> +     return false;
> +    }
> +  return true;

With the general definition I think this should be:

  for (int i = 0; i < GET_NUM_ELEM (vec); i++)
    {
      rtx x = RTVEC_ELT (vec, i);
      if (!CONST_INT_P (x) || INTVAL (x) != i + start)
        return false;
    }

Then pass XVEC (sel, 0) to the function, instead of just sel.

OK with those changes, thanks.

Hi,

Some of the updated tests fail on aarch64_be:
     gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
    gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
    gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
    gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
    gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 5
    gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 5
    gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 6
    gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 6

Can you check?

Thanks,

Christophe

Richard