[PATCH] [PR rtl-optimization/97249]Simplify vec_select of paradoxical subreg.

Mon Oct 19 15:31:15 GMT 2020

Hongtao Liu <crazylht@gmail.com> writes:
> On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > +       /* Simplify vec_select of a subreg of X to just a vec_select of X
>> > +          when X has same component mode as vec_select.  */
>> > +       int l2;
>> > +       if (GET_CODE (trueop0) == SUBREG
>> > +           && GET_MODE_INNER (mode)
>> > +              == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))
>>
>> Better to use SUBREG_REG here and below.
>>
>
> Yes and changed.
>
>> > +           && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
>> > +           && (GET_MODE_NUNITS (mode)).is_constant (&l1)
>> > +           && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0))))
>> > +               .is_constant (&l2)
>> > +           && known_le (l1, l2))
>> > +         {
>> > +           unsigned HOST_WIDE_INT subreg_offset = 0;
>> > +           gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
>> > +           gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT),
>> > +                                        GET_MODE_SIZE (GET_MODE_INNER (mode)),
>> > +                                        &subreg_offset));
>>
>> can_div_trunc_p discards the remainder, whereas it looks like here
>> you want an exact multiple.
>>
>> I don't think it's absolutely guaranteed that the “if” condition makes
>> the division by GET_MODE_SIZE exact.  E.g. in principle you could have
>> a subreg of a vector of TIs in which the subreg offset is misaligned by
>> a DI offset.
>>
>> I'm not sure the subreg_lsb conversion is correct though.  On big-endian
>> targets, lane numbering follows memory layout, just like subreg byte
>> offsets do.  So ISTM that using SUBREG_BYTE (as per the earlier patch)
>> was correct.
>>
>> In summary, I think the "if” condition should include something like:
>>
>>   constant_mulitple_p (SUBREG_BYTE (trueop0),
>>                        GET_MODE_UNIT_BITSIZE (mode),
>>                        &subreg_offset)
>>
>
> Changed.
>
>> Thanks,
>> Richard
>
>
> Update patch.
>
> -- 
> BR,
> Hongtao
>
> From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Tue, 13 Oct 2020 15:35:29 +0800
> Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of
>  X.
>
> gcc/ChangeLog
> 	PR rtl-optimization/97249
> 	* simplify-rtx.c (simplify_binary_operation_1): Simplify
> 	vec_select of a subreg of X to a vec_select of X.
>
> gcc/testsuite/ChangeLog
>
> 	* gcc.target/i386/pr97249-1.c: New test.
> ---
>  gcc/simplify-rtx.c                        | 44 +++++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++
>  2 files changed, 74 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 869f0d11b2e..b1009837b2b 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>  		    return subop1;
>  		}
>  	    }
> +
> +	  /* Simplify vec_select of a subreg of X to just a vec_select of X
> +	     when X has same component mode as vec_select.  */
> +	  int l2;
> +	  unsigned HOST_WIDE_INT subreg_offset = 0;
> +	  if (GET_CODE (trueop0) == SUBREG
> +	      && GET_MODE_INNER (mode)
> +		 == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0)))
> +	      && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)

Nothing really relies on this last line, and nothing uses l0, so better
to drop it.

> +	      && (GET_MODE_NUNITS (mode)).is_constant (&l1)
> +	      && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))))
> +		  .is_constant (&l2)
> +	      && known_le (l1, l2)

I'm not sure the last two &&s are really the important condition.
I think we should drop them for the suggestion below.

> +	      && constant_multiple_p (SUBREG_BYTE (trueop0),
> +				      GET_MODE_UNIT_BITSIZE (mode),
> +				      &subreg_offset))
> +	    {
> +

Excess blank line.

> +	      gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));

This can just use ==.

> +	      bool success = true;
> +	      for (int i = 0; i != l1; i++)
> +		{
> +		  rtx idx  = XVECEXP (trueop1, 0, i);

Excess space.

> +		  if (!CONST_INT_P (idx))

Here I think we should check:

		      || maybe_ge (UINTVAL (idx) + subreg_offset, nunits))

where:

       poly_uint64 nunits
         = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))).

This makes sure that all indices are in range.  In particular, it's
valid for the SUBREG_REG to be narrower than mode, for appropriate
vec_select indices

Thanks,
Richard