[PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]
Richard Sandiford
richard.sandiford@arm.com
Tue Sep 22 16:08:29 GMT 2020
Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi Alex,
>
> On Tue, Sep 22, 2020 at 08:40:07AM +0100, Alex Coplan wrote:
>> On 21/09/2020 18:35, Segher Boessenkool wrote:
>> Thanks for doing this testing. The results look good, then: no code size
>> changes and no build regressions.
>
> No *code* changes. I cannot test aarch64 like this.
>
>> > So, there is no difference for most targets (I checked some targets and
>> > there really is no difference). The only exception is aarch64 (which
>> > the kernel calls "arm64"): the unpatched compiler ICEs! (At least three
>> > times, even).
>>
>> Indeed, this is the intended purpose of the patch, see the PR (96998).
>
> You want to fix an ICE in LRA caused by an instruction created by LRA,
> with a patch to combine?! That doesn't sound right.
>
> If what you want to do is a) fix the backend bug, and then b) get some
> extra performance, then do *that*, and keep the patches separate.
This patch isn't supposed to be a performance optimisation. It's supposed
to be a canonicalisation improvement.
The situation as things stand is that aarch64 has a bug: it accepts
an odd sign_extract representation of addresses, but doesn't accept
that same odd form of address as an LEA. We have two options:
(a) add back instructions that recognise the odd form of LEA, or
(b) remove the code that accepts the odd addresses
I think (b) is the way to go here. But doing that on its own
would regress code quality. The reason we recognised the odd
addresses in the first place was because that was the rtl that
combine happened to generate for an important case.
Normal operating procedure is to add patterns and address-matching
code that accepts whatever combine happens to throw at the target,
regardless of how sensible the representation is. But sometimes I think
we should instead think about whether the representation that combine is
using is the right one. And IMO this is one such case.
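To make the case concrete, here is a sketch of the kind of source that
produces it (a hypothetical reconstruction from the dump below, not
necessarily the PR 96998 testcase): an int index is sign-extended,
scaled by 4 and added to a base pointer, and the resulting address is
used as an asm memory operand:

  /* Hypothetical sketch reconstructed from the combine dump below;
     not necessarily the exact PR 96998 testcase.  "foo" and "base" are
     made-up names; "g" and "i" match the names in the dump.  */
  void
  foo (int *base, int g)
  {
    int *i = &base[g];       /* base + (int64_t) g * 4 */
    asm ("" : "=Q" (*i));    /* "Q": single base register, no offset */
  }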
At the moment combine does this:
Trying 8 -> 9:
    8: r98:DI=sign_extend(r92:SI)
      REG_DEAD r92:SI
    9: [r98:DI*0x4+r96:DI]=asm_operands
      REG_DEAD r98:DI
Failed to match this instruction:
(set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
                    (const_int 4 [0x4]))
                (const_int 34 [0x22])
                (const_int 0 [0]))
            (reg/f:DI 96)) [3 *i_5+0 S4 A32])
    (asm_operands:SI ("") ("=Q") 0 []
         []
         [] /tmp/foo.c:13))
allowing combination of insns 8 and 9
original costs 4 + 4 = 8
replacement cost 4
and so that's one of the forms that the aarch64 address code accepts.
But a natural substitution would simply replace r98 with the rhs of
the set:
(set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 92))
                               (const_int 4))
                      (reg:DI 96)))
     ...)
The only reason we don't do that is that the substitution
and simplification go through the expand_compound_operation/
make_compound_operation process.
The corresponding (ashift ... (const_int 2)) *does* end up using
the natural sign_extend form rather than sign_extract form.
The only reason we get this (IMO) weird behaviour for mult is
the rule that shifts have to be mults in addresses. Some code
(like the code being patched) instead expects ashift to be the
canonical form in all situations.
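As a rough illustration of that rule (standard rtl canonicalisation,
not something this patch changes), scaling the same sign-extended value
by 4 is written with ashift in ordinary arithmetic, but with mult once
it appears inside a mem:

  ;; Outside an address: shift by a constant is the canonical form.
  (ashift:DI (sign_extend:DI (reg:SI 92)) (const_int 2))

  ;; Inside an address: the same scaling must be written as a mult.
  (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg:SI 92))
                            (const_int 4))
                   (reg:DI 96)))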
If we make the substitution work “naturally” for mult as well as
ashift, we can remove the address-matching code that has no
corresponding LEA pattern, and make the aarch64 address code
self-consistent that way instead.
Thanks,
Richard