[Bug rtl-optimization/96796] [9/10 Regression] aarch64: ICE during RTL pass: reload

Sat Apr 24 08:37:40 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Richard Sandiford
<rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:e720d3033f84802147d2da9e923bd862cdb73164

commit r10-9764-ge720d3033f84802147d2da9e923bd862cdb73164
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Sat Apr 24 09:37:26 2021 +0100

    lra: Avoid cycling on certain subreg reloads [PR96796]

    This PR is about LRA cycling for a reload of the form:

   
----------------------------------------------------------------------------
    Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
          Creating newreg=287, assigning class ALL_REGS to slow/invalid mem
r287
          Creating newreg=288, assigning class ALL_REGS to slow/invalid mem
r288
      103: r203:SI=r288:SI<<0x1+r196:DI#0
          REG_DEAD r196:DI
        Inserting slow/invalid mem reload before:
      316: r287:DI=[r105:DI*0x8+r140:DI]
      317: r288:SI=r287:DI#0
   
----------------------------------------------------------------------------

    The problem is with r287.  We rightly give it a broad starting class of
    POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class).
    However, we never make forward progress towards narrowing it down to
    a specific choice of class (POINTER_REGS or FP_REGS).

    I think in practice we rely on two things to narrow a reload pseudo's
    class down to a specific choice:

    (1) a restricted class is specified when the pseudo is created

        This happens for input address reloads, where the class is taken
        from the target's chosen base register class.  It also happens
        for simple REG reloads, where the class is taken from the chosen
        alternative's constraints.

    (2) uses of the reload pseudo as a direct input operand

        In this case get_reload_reg tries to reuse the existing register
        and narrow its class, instead of creating a new reload pseudo.

    However, neither occurs here.  As described above, r287 rightly
    starts out with a wide choice of class, ultimately derived from
    ALL_REGS, so we don't get (1).  And as the comments in the PR
    explain, r287 is never used as an input reload, only the subreg is,
    so we don't get (2):

   
----------------------------------------------------------------------------
             Choosing alt 13 in insn 317:  (0) r  (1) w {*movsi_aarch64}
          Creating newreg=291, assigning class FP_REGS to r291
      317: r288:SI=r291:SI
        Inserting insn reload before:
      320: r291:SI=r287:DI#0
   
----------------------------------------------------------------------------

    IMO, in this case we should rely on the reload of r316 to narrow
    down the class of r278.  Currently we do:

   
----------------------------------------------------------------------------
             Choosing alt 7 in insn 316:  (0) r  (1) m {*movdi_aarch64}
          Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to
r289
      316: r289:DI=[r105:DI*0x8+r140:DI]
        Inserting insn reload after:
      318: r287:DI=r289:DI
    ---------------------------------------------------

    i.e. we create a new pseudo register r289 and give *that* pseudo
    GENERAL_REGS instead.  This is because get_reload_reg only narrows
    down the existing class for OP_IN and OP_INOUT, not OP_OUT.

    But if we have a reload pseudo in a reload instruction and have chosen
    a specific class for the reload pseudo, I think we should simply install
    it for OP_OUT reloads too, if the class is a subset of the existing class.
    We will need to pick such a register whatever happens (for r289 in the
    example above).  And as explained in the PR, doing this actually avoids
    an unnecessary move via the FP registers too.

    This backport is less aggressive than the trunk version, in that the new
    code reuses the test for a reload move from in_class_p.  We will therefore
    only narrow OP_OUT classes if the instruction is a register move or memory
    load that was generated by LRA itself.

    gcc/
            PR rtl-optimization/96796
            * lra-constraints.c (in_class_p): Add a default-false
            allow_all_reload_class_changes_p parameter.  Do not treat
            reload moves specially when the parameter is true.
            (get_reload_reg): Try to narrow the class of an existing OP_OUT
            reload if we're reloading a reload pseudo in a reload instruction.

    gcc/testsuite/
            PR rtl-optimization/96796
            * gcc.c-torture/compile/pr96796.c: New test.