[PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

Richard Sandiford richard.sandiford@arm.com
Fri Aug 21 09:28:13 GMT 2020


Alex Coplan <Alex.Coplan@arm.com> writes:
> Hi Richard,
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: 18 August 2020 09:35
>> To: Alex Coplan <Alex.Coplan@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <Richard.Earnshaw@arm.com>;
>> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
>> <Kyrylo.Tkachov@arm.com>
>> Subject: Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend
>> syntax
>> 
>> Alex Coplan <alex.coplan@arm.com> writes:
>> > Note that an obvious omission here is that this patch does not touch
>> the
>> > mult patterns such as *add_<optab><ALLX:mode>_mult_<GPI:mode>. I found
>> > that I couldn't hit these patterns with C code since multiplications by
>> > powers of two always get turned into shifts by earlier RTL passes. If
>> > there's a way to reliably hit these patterns, then perhaps these should
>> > be updated as well.
>> 
>> Hmm.  Feels like we should either update them or delete them.  E.g.:
>> 
>>   *adds_<optab><mode>_multp2
>>   *subs_<optab><mode>_multp2
>> 
>> were added alongside the adds3.c and subs3.c tests that you're updating,
>> so if the tests don't/no longer need the multp2 patterns to pass,
>> there's a good chance that the patterns are redundant.
>> 
>> For reasons I never understood, the canonical representation is to use
>> (mult …) for powers of 2 inside a (mem …) but shifts outside of (mem …)s.
>> So perhaps the patterns were originally for address calculations that had
>> been moved outside of a (mem …) and not updated to shifts instead of
>> mults.
>
> Thanks for the review, and for clarifying this. I tried removing these
> together with the *_mul_imm_ patterns (e.g. *adds_mul_imm_<mode>) and
> the only failure was gcc/testsuite/gcc.dg/torture/pr34330.c when
> compiled with -Os -ftree-vectorize which appears to depend on the
> *add_mul_imm_di pattern. Without this pattern, we ICE in LRA on this
> input.
>
> In this case, GCC appears to have done exactly what you described: we
> have the (plus (mult ...) ...) nodes inside (mem)s prior to register
> allocation, and then we end up pulling these out without converting them
> to shifts.
>
> Seeing this behaviour (and in particular seeing the ICE) makes me
> hesitant to just go ahead and remove the other patterns. That said, I
> have a patch to remove the following patterns:
>
>  *adds_<optab><mode>_multp2
>  *subs_<optab><mode>_multp2
>  *add_<optab><ALLX:mode>_mult_<GPI:mode>
>  *add_<optab><SHORT:mode>_mult_si_uxtw
>  *add_<optab><mode>_multp2
>  *add_<optab>si_multp2_uxtw
>  *add_uxt<mode>_multp2
>  *add_uxtsi_multp2_uxtw
>  *sub_<optab><mode>_multp2
>  *sub_<optab>si_multp2_uxtw
>  *sub_uxt<mode>_multp2
>  *sub_uxtsi_multp2_uxtw
>  *sub_uxtsi_multp2_uxtw
>
> (together with the predicate aarch64_pwr_imm3 which is only used in
> these patterns) and this bootstraps/regtests just fine.
>
> So, I have a couple of questions:
>
> (1) Should it be considered a bug if we pull (plus (mult (power of 2)
>    ...) ...) out of a (mem) RTX without re-writing the (mult) as a
>    shift?

IMO, yes.  But if we have an example in which it happens, we have
to fix it before removing the patterns.  That could end up being
a bit of a rabbit hole, and could affect other targets too.

If we keep the patterns, we should fix the [su]xtw problem in:

  *adds_<optab><mode>_multp2
  *subs_<optab><mode>_multp2
  *add_<optab><ALLX:mode>_mult_<GPI:mode>
  *add_uxt<mode>_multp2
  *sub_uxt<mode>_multp2

too.  (Plus any others I missed, if that isn't the full list.)

Thanks,
Richard


More information about the Gcc-patches mailing list