[Bug rtl-optimization/89575] New: LRA for msp430 - Max. number of generated reload insns - frame pointer subreg simplification

Mon Mar 4 12:07:00 GMT 2019

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89575

            Bug ID: 89575
           Summary: LRA for msp430 - Max. number of generated reload insns
                    - frame pointer subreg simplification
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jozef.l at mittosystems dot com
  Target Milestone: ---

Created attachment 45881
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45881&action=edit
testcase

When enabling LRA for msp430, libgcc fails to build, specifically _muldi3.o.

> gcc -S tester.i -O2

> during RTL pass: reload
> ../../../../libgcc/libgcc2.c: In function '__muldi3':
> ../../../../libgcc/libgcc2.c:558:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)
> 
>   558 | }
>       | ^
> 0xa2ab20 lra_constraints(bool)
>         ../../gcc/lra-constraints.c:4875
> 0xa12b84 lra(_IO_FILE*)
>         ../../gcc/lra.c:2461
> 0x9c68f1 do_reload
>         ../../gcc/ira.c:5516
> 0x9c68f1 execute
>         ../../gcc/ira.c:5700

The cycling reload occurs because IRA assigns hard register R4 (which is also
FRAME_POINTER_REGNUM, but is not fixed for this use) to a pseudo reg, and when
LRA then tries to simplify a subreg of that pseudo, the simplification is
disallowed.

Specifically, simplify_subreg_regno (rtlanal.c):

> /* We shouldn't simplify stack-related registers.  */
> if ((!reload_completed || frame_pointer_needed)
>     && xregno == FRAME_POINTER_REGNUM)
>   return -1;

This is in an output reload, so a new set of mov insns is generated to load
the value back into the original, problematic pseudo that was assigned R4.
simplify_subreg_regno is then called again to simplify the subreg of that
pseudo, the simplification is disallowed again, and the cycle continues.
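
The cycle described above can be modelled in isolation. This is only a minimal
sketch and not GCC code: MODEL_FP_REGNUM, model_guard_rejects and
model_count_reloads are hypothetical names, the value 4 stands in for R4, and
the cap of 90 is taken from the ICE message. It assumes reload_completed stays
false for the whole of LRA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for FRAME_POINTER_REGNUM on msp430 (R4 in this
   report).  Not taken from the GCC sources.  */
enum { MODEL_FP_REGNUM = 4 };

/* Models only the guard quoted from simplify_subreg_regno.  Returns true
   when simplifying a subreg of hard register XREGNO is refused.  */
bool
model_guard_rejects (unsigned int xregno, bool reload_completed,
                     bool frame_pointer_needed)
{
  return (!reload_completed || frame_pointer_needed)
         && xregno == MODEL_FP_REGNUM;
}

/* Toy model of the cycle: every refused simplification costs one more
   reload insn.  Returns how many are generated before the guard accepts
   or CAP is reached.  */
int
model_count_reloads (unsigned int xregno, bool frame_pointer_needed, int cap)
{
  assert (cap > 0);
  int generated = 0;
  /* Assumption: reload_completed is false for the duration of LRA.  */
  while (generated < cap
         && model_guard_rejects (xregno, false, frame_pointer_needed))
    generated++;
  return generated;
}
```

In this model, model_count_reloads (4, false, 90) runs into the cap even though
the frame pointer is not needed, while any other register number generates no
reloads at all.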

From the IRA dump:

> Disposition:
>     0:r28  l0     8    2:r30  l0     4    1:r31  l0     4
> ...
> (insn 2 6 3 2 (set (subreg:HI (reg/v:DI 30 [ arg1 ]) 0)
>         (reg:HI 12 R12 [ arg1 ])) "tester.c":16:1 12 {movhi}
>      (expr_list:REG_DEAD (reg:HI 12 R12 [ arg1 ])
>         (nil)))

From the reload dump:

>     Creating newreg=37 from oldreg=30, assigning class NO_REGS to subreg reg r37
>   2: r37:DI#0=R12:HI
>   ...
>   Inserting subreg reload after:
>  42: r30:DI#0=r37:DI#0
>  ...
>     Creating newreg=38 from oldreg=30, assigning class NO_REGS to subreg reg r38
>  42: r38:DI#0=r37:DI#0
>  ...
>   Inserting subreg reload after:
>  52: r30:DI#0=r38:DI#0
And so on.

Is it OK to allow simplification of a subreg of FRAME_POINTER_REGNUM
when lra_in_progress is true? After all, constraints on the allocation of hard
regs shouldn't get more restrictive as compilation progresses?
For example:

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3873b4098b0..9700928ff4e 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3971,7 +3971,7 @@ simplify_subreg_regno (unsigned int xregno, machine_mode xmode,
     return -1;

   /* We shouldn't simplify stack-related registers.  */
-  if ((!reload_completed || frame_pointer_needed)
+  if ((!(reload_completed || lra_in_progress) || frame_pointer_needed)
       && xregno == FRAME_POINTER_REGNUM)
     return -1;

This fixes the cycling reload for insn 2, as the frame pointer is not needed,
but there are further separate issues building the test case.
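
To check how narrowly the proposed condition differs from the existing one,
the two guards can be compared over every combination of the three flags. This
is a minimal sketch, not GCC code: SKETCH_FP_REGNUM and the sketch_* functions
are hypothetical names, and the flags are passed as plain parameters rather
than read from GCC's globals.

```c
#include <stdbool.h>

/* Hypothetical stand-in for FRAME_POINTER_REGNUM (R4 in this report).  */
enum { SKETCH_FP_REGNUM = 4 };

/* The existing guard: true means simplification is refused.  */
bool
sketch_old_rejects (unsigned int xregno, bool reload_completed,
                    bool lra_in_progress, bool frame_pointer_needed)
{
  (void) lra_in_progress;  /* Not consulted by the existing condition.  */
  return (!reload_completed || frame_pointer_needed)
         && xregno == SKETCH_FP_REGNUM;
}

/* The guard with the proposed change applied.  */
bool
sketch_new_rejects (unsigned int xregno, bool reload_completed,
                    bool lra_in_progress, bool frame_pointer_needed)
{
  return (!(reload_completed || lra_in_progress) || frame_pointer_needed)
         && xregno == SKETCH_FP_REGNUM;
}

/* Counts the flag combinations, for the frame pointer register, on which
   the two guards disagree.  */
int
sketch_count_differences (void)
{
  int differences = 0;
  for (int bits = 0; bits < 8; bits++)
    {
      bool rc = (bits & 1) != 0;   /* reload_completed */
      bool lra = (bits & 2) != 0;  /* lra_in_progress */
      bool fpn = (bits & 4) != 0;  /* frame_pointer_needed */
      if (sketch_old_rejects (SKETCH_FP_REGNUM, rc, lra, fpn)
          != sketch_new_rejects (SKETCH_FP_REGNUM, rc, lra, fpn))
        differences++;
    }
  return differences;
}
```

The guards disagree only when reload_completed is false, lra_in_progress is
true and frame_pointer_needed is false, which is the situation hit by insn 2.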

I've attached a reduced test case, and the IRA and reload dumps.

> gcc -v

> Target: msp430-elf
> Configured with: ../configure --target=msp430-elf --disable-nls --enable-languages=c,c++
> Thread model: single
> gcc version 9.0.1 20190301 (experimental) (GCC)