[PATCH V2] gcc: Add vec_select -> subreg RTL simplification
Jonathan Wright
Jonathan.Wright@arm.com
Thu Jul 15 13:06:58 GMT 2021
Ah, yes - those test results should have only been changed for little endian.
I've submitted a patch to the list restoring the original expected results for big
endian.
Thanks,
Jonathan
________________________________
From: Christophe Lyon <christophe.lyon.oss@gmail.com>
Sent: 15 July 2021 10:09
To: Richard Sandiford <Richard.Sandiford@arm.com>; Jonathan Wright <Jonathan.Wright@arm.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org<mailto:gcc-patches@gcc.gnu.org>> wrote:
Jonathan Wright <Jonathan.Wright@arm.com<mailto:Jonathan.Wright@arm.com>> writes:
> Hi,
>
> Version 2 of this patch adds more code generation tests to show the
> benefit of this RTL simplification as well as adding a new helper function
> 'rtx_vec_series_p' to reduce code duplication.
>
> Patch tested as version 1 - ok for master?
Sorry for the slow reply.
> Regression tested and bootstrapped on aarch64-none-linux-gnu,
> x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> aarch64_be-none-linux-gnu - no issues.
I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
> - 1,
> 0));
> break;
> + case VEC_SELECT:
> + {
> + rtx trueop0 = XEXP (x, 0);
> + mode = GET_MODE (trueop0);
> + rtx trueop1 = XEXP (x, 1);
> + int nunits;
> + /* If we select a low-part subreg, return that. */
> + if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> + && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
> + {
> + int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
> +
> + if (rtx_vec_series_p (trueop1, offset))
> + {
> + rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> + if (new_rtx != NULL_RTX)
> + return new_rtx;
> + }
> + }
> + }
Since this occurs three times, I think it would be worth having
a new predicate:
/* Return true if, for all OP of mode OP_MODE:
(vec_select:RESULT_MODE OP SEL)
is equivalent to the lowpart RESULT_MODE of OP. */
bool
vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)
containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
and rtx_vec_series_p tests.
I think the function belongs in rtlanal.[hc], even though subreg_lowpart_p
is in emit-rtl.c.
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1884,15 +1884,16 @@
> )
>
> (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
> - [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> - (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
> + [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> + (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m,w")))]
> ""
> "@
> and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
> ldr<SHORT:size>\t%w0, %1
> - ldr\t%<SHORT:size>0, %1"
> - [(set_attr "type" "logic_imm,load_4,f_loads")
> - (set_attr "arch" "*,*,fp")]
> + ldr\t%<SHORT:size>0, %1
> + umov\t%w0, %1.<SHORT:size>[0]"
> + [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> + (set_attr "arch" "*,*,fp,fp")]
FTR (just to show I thought about it): I don't know whether the umov
can really be considered an fp operation rather than a simd operation,
but since we don't support fp without simd, this is already a distinction
without a difference. So the pattern is IMO OK as-is.
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -224,7 +224,7 @@
> ;; problems because small constants get converted into adds.
> (define_insn "*arm_movsi_vfp"
> [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m ,*t,r,*t,*t, *Uv")
> - (match_operand:SI 1 "general_operand" "rk, I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> + (match_operand:SI 1 "general_operand" "rk, I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
> "TARGET_ARM && TARGET_HARD_FLOAT
> && ( s_register_operand (operands[0], SImode)
> || s_register_operand (operands[1], SImode))"
I'll assume that an Arm maintainer would have spoken up by now if
they didn't want this for some reason.
> diff --git a/gcc/rtl.c b/gcc/rtl.c
> index aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa 100644
> --- a/gcc/rtl.c
> +++ b/gcc/rtl.c
> @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec)
> }
> }
>
> +/* Return true if element-selection indices in VEC are in series. */
> +
> +bool
> +rtx_vec_series_p (const_rtx vec, int start)
I think rtvec_series_p would be better, for consistency with
rtvec_all_equal_p. Also, let's generalise it to:
/* Return true if VEC contains a linear series of integers
{ START, START+1, START+2, ... }. */
bool
rtvec_series_p (rtvec vec, int start)
{
}
> +{
> + for (int i = 0; i < XVECLEN (vec, 0); i++)
> + {
> + if (i + start != INTVAL (XVECEXP (vec, 0, i)))
> + return false;
> + }
> + return true;
With the general definition I think this should be:
for (int i = 0; i < GET_NUM_ELEM (vec); i++)
{
rtx x = RTVEC_ELT (vec, i);
if (!CONST_INT_P (x) || INTVAL (x) != i + start)
return false;
}
Then pass XVEC (sel, 0) to the function, instead of just sel.
OK with those changes, thanks.
Hi,
Some of the updated tests fail on aarch64_be:
gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 5
gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 5
gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 6
gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 6
Can you check?
Thanks,
Christophe
Richard
More information about the Gcc-patches
mailing list