Bug 87600 - Fix for PRs 86939 and 87479 causes build issues for several targets
Summary: Fix for PRs 86939 and 87479 causes build issues for several targets
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 9.0
: P3 normal
Target Milestone: 9.0
Assignee: Peter Bergner
URL: https://gcc.gnu.org/ml/gcc-patches/20...
Keywords:
: 46164 (view as bug list)
Depends on:
Blocks:
 
Reported: 2018-10-12 15:52 UTC by Peter Bergner
Modified: 2024-03-26 22:23 UTC (History)
8 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-10-12 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Bergner 2018-10-12 15:52:27 UTC
The IRA/LRA fix (revision 264897) for PR86939 and PR87479 has caused build issues on several targets.  This bugzilla entry will be used to track the change(s) required to get the targets building again.

The affected targets seem to be: aarch64, alpha, arm, hppa, s390x and sh4.
Comment 1 Peter Bergner 2018-10-12 16:08:28 UTC
Pasting some edited commentary from the gcc-patches mailing list:

On 10/8/18 9:52 AM, Jeff Law wrote:
> My tester is showing a variety of problems as well.  hppa, sh4, aarch64,
> aarch64_be, alpha arm* and s390x bootstraps are all failing at various
> points.   Others may trickle in over time, but clearly something went
> wrong recently.  I'm going to start trying to pick them off after the
> morning meetings are wrapped up.

I looked into the PA-RISC test case issue Jeff sent me that is caused by my
patch that handles conflicts wrt reg copies and I _think_ I have a handle
on what the problem is, but don't yet know how to fix.  Jeff, I agree with
the analysis you gave in your email to me, but to get Vlad up to speed, here
it is again along with some additional info.

Compiling the test case, we have the following RTL during IRA.  I have also
annotated the pseudos in the RTL to include their hard reg assignment:

(insn 30 19 32 3 (set (reg/f:SI 112 "%r19") ....))
(insn 32 30 20 3 (set (reg:SI 26 %r26)
                      (reg/f:SI 101 "%r26")))
[snip]
(insn 23 22 49 3 (set (reg:SI 109 "%r28")
                 (mem:SI (reg/f:SI 101 "%r26"))))
(insn 49 23 31 3 (set (reg/f:SI 100 "%r28")
        (if_then_else:SI (eq (reg:SI 109 "%r28") (const_int 15 [0xf]))
            (reg/f:SI 101 "%r26")
            (const_int 0 [0])))
     (expr_list:REG_DEAD (reg:SI 109 "%r28")
        (expr_list:REG_DEAD (reg/f:SI 101 "%r26"))))
(insn 31 49 33 3 (set (mem/f:SI (reg/f:SI 112 "%r19"))
                      (reg/f:SI 100 "%r28"))
     (expr_list:REG_DEAD (reg/f:SI 112 "%r19")
        (expr_list:REG_DEAD (reg/f:SI 100 "%r28"))))
(call_insn 33 31 36 3 (parallel [
            (call (mem:SI (symbol_ref/v:SI ("@_Z3yowP11dw_cfi_node"))
                (const_int 16 [0x10]))
            (clobber (reg:SI 1 %r1))
            (clobber (reg:SI 2 %r2))
            (use (const_int 0 [0]))])
     (expr_list:REG_DEAD (reg:SI 26 %r26))
    (expr_list:SI (use (reg:SI 26 %r26)))))

Before my patch, hard reg %r26 and pseudo 101 conflicted, but with my patch
they now (correctly) do not.  IRA is able to assign hard regs to all of the
pseudos, but the constraints in the pattern for insn 49 forces op0 (pseudo 100)
and op1 (pseudo 101) to have the same hard reg.  They are not, so reload comes
along and spills insn 49 with the following reloads:

Reloads for insn # 49
Reload 0: reload_in (SI) = (reg/f:SI 26 %r26 [orig:101 _10 ] [101])
          reload_out (SI) = (reg/f:SI 28 %r28 [orig:100 iftmp.2_5 ] [100])
          GENERAL_REGS, RELOAD_OTHER (opnum = 0)
          reload_in_reg: (reg/f:SI 26 %r26 [orig:101 _10 ] [101])
          reload_out_reg: (reg/f:SI 28 %r28 [orig:100 iftmp.2_5 ] [100])
          reload_reg_rtx: (reg/f:SI 26 %r26 [orig:101 _10 ] [101])

..giving us the following updated insn:

(insn 49 23 58 3 (set (reg/f:SI 26 %r26 [101])
        (if_then_else:SI (eq (reg:SI 28 %r28 [109])
                (const_int 15))
            (reg/f:SI 26 %r26 [101])
            (const_int 0 [0]))))

The problem is, that hard reg %r26 is defined in insn 32, to be used in
insn 33, so using %r26 as the reload reg is wrong, because it will clobber
the value we set in insn 32.  HPPA is a reload (not LRA) target and looking
thru reload, it assumes that if any input pseudo dies in an insn, then the
hard reg assigned to the pseudo is free game for use with an output reload.
With my patch, that assumption is (now) wrong, because another simultaneously
live pseudo may be using that same hard reg or in this case, the hard reg
itself is still live.

Given no one wants to patch reload, the decision has been made to disable
the special reg copy handling my patch introduced whenever we're compiling
for a reload target via the following patch:

Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c     (revision 264897)
+++ gcc/ira-lives.c     (working copy)
@@ -1064,6 +1064,11 @@ find_call_crossed_cheap_reg (rtx_insn *i
 rtx
 non_conflicting_reg_copy_p (rtx_insn *insn)
 {
+  /* Reload has issues with overlapping pseudos being assigned to the
+     same hard register, so don't allow it.  See PR87600 for details.  */
+  if (!targetm.lra_p ())
+    return NULL_RTX;
+
   rtx set = single_set (insn);

   /* Disallow anything other than a simple register to register copy
Comment 2 Peter Bergner 2018-10-12 16:31:44 UTC
Author: bergner
Date: Fri Oct 12 16:31:11 2018
New Revision: 265113

URL: https://gcc.gnu.org/viewcvs?rev=265113&root=gcc&view=rev
Log:
	PR rtl-optimization/87600
	* ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-lives.c
Comment 3 Segher Boessenkool 2018-10-14 21:30:53 UTC
Here is an example for aarch64:

===
long f(long x)
{
        register long a asm("x0");
        asm("bla %0 %1" : "+&r"(a) : "r"(x));
        return a;
}
===

The first asm operand is a local register variable set to x0, so this one
should end up in x0.  But "x" is passed in x0 as function argument, and then
combined into the asm, making both asm operands hard register 0.  And then LRA
has the job of making things work (operand 0 is earlyclobber so not both args
can be hard reg 0), and LRA picks the wrong solution:

        bla x1 x0

Not letting combine combine the register moves that copy from the function
argument registers into pseudos fixes this.  But it costs 1%-5% of code size
on all targets (most are about 2%): most targets can usefully combine register
moves into other instructions, for example many targets have operations that
set a flags register as side effect.

I'm going to try to disallow combining the hard reg -> pseudo moves, because
that should help with the register alloc problems (it also gives better
register alloc!), but at the same time introducing an extra copy (from pseudo
to a new pseudo).  In theory this should be the best of both worlds.
Comment 4 Segher Boessenkool 2018-10-14 22:22:39 UTC
Oh btw, the #c3 problem isn't new at all, it happens with 4.8 already for example.
Comment 5 Segher Boessenkool 2018-10-22 20:24:12 UTC
Author: segher
Date: Mon Oct 22 20:23:39 2018
New Revision: 265398

URL: https://gcc.gnu.org/viewcvs?rev=265398&root=gcc&view=rev
Log:
combine: Do not combine moves from hard registers

On most targets every function starts with moves from the parameter
passing (hard) registers into pseudos.  Similarly, after every call
there is a move from the return register into a pseudo.  These moves
usually combine with later instructions (leaving pretty much the same
instruction, just with a hard reg instead of a pseudo).

This isn't a good idea.  Register allocation can get rid of unnecessary
moves just fine, and moving the parameter passing registers into many
later instructions tends to prevent good register allocation.  This
patch disallows combining moves from a hard (non-fixed) register.

This also avoid the problem mentioned in PR87600 #c3 (combining hard
registers into inline assembler is problematic).

Because the register move can often be combined with other instructions
*itself*, for example for setting some condition code, this patch adds
extra copies via new pseudos after every copy-from-hard-reg.

On some targets this reduces average code size.  On others it increases
it a bit, 0.1% or 0.2% or so.  (I tested this on all *-linux targets).


	PR rtl-optimization/87600
	* combine.c: Add include of expr.h.
	(cant_combine_insn_p): Do not combine moves from any hard non-fixed
	register to a pseudo.
	(make_more_copies): New function, add a copy to a new pseudo after
	the moves from hard registers into pseudos.
	(rest_of_handle_combine): Declare rebuild_jump_labels_after_combine
	later.  Call make_more_copies.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
Comment 6 Peter Bergner 2018-11-08 22:40:32 UTC
Author: bergner
Date: Thu Nov  8 22:39:45 2018
New Revision: 265942

URL: https://gcc.gnu.org/viewcvs?rev=265942&root=gcc&view=rev
Log:
gcc/
	PR rtl-optimization/87600
	* cfgexpand.c (expand_asm_stmt): Catch illegal asm constraint usage.
	* lra-constraints.c (process_alt_operands): Skip illegal hard
	register usage.  Prefer reloading non hard register operands.

gcc/testsuite/
	PR rtl-optimization/87600
	* gcc.dg/pr87600.h: New file.
	* gcc.dg/pr87600-1.c: New test.
	* gcc.dg/pr87600-2.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/pr87600-1.c
    trunk/gcc/testsuite/gcc.dg/pr87600-2.c
    trunk/gcc/testsuite/gcc.dg/pr87600.h
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/lra-constraints.c
    trunk/gcc/testsuite/ChangeLog
Comment 7 Peter Bergner 2018-11-08 22:45:55 UTC
Ok, I think this resolves all but one ARM issue which is being tracked in PR87899, so I'm going to mark this bugzilla as fixed.
Comment 8 Alexandre Oliva 2020-06-18 11:11:25 UTC
I've noticed regressions caused by make_more_copies, in scenarios that used subreg s for the low part of promoted incoming parms.  With hard regs, the substitution into a subreg became a reg, but with a pseudo, it remains a subreg, which prevents further combines in some cases, as in e.g. gcc_target/powerpc/20050603-3.c on ppc64.

I thought one way to go about it could be to scan for subregs of pseudos copied from hard regs before introducing the additional copies, and introduce the intermediate pseudo with the widest subreg mode if there aren't uses of the full pseudo.
Comment 9 Segher Boessenkool 2020-06-18 14:39:48 UTC
(In reply to Alexandre Oliva from comment #8)
> I've noticed regressions caused by make_more_copies, in scenarios that used
> subreg s for the low part of promoted incoming parms.  With hard regs, the
> substitution into a subreg became a reg, but with a pseudo, it remains a
> subreg, which prevents further combines in some cases, as in e.g.
> gcc_target/powerpc/20050603-3.c on ppc64.

That happens because of the zero_extract problems.  The only thing
make_more_copies has to do with this is that this change exposes this
problem on this testcase.

> I thought one way to go about it could be to scan for subregs of pseudos
> copied from hard regs before introducing the additional copies, and
> introduce the intermediate pseudo with the widest subreg mode if there
> aren't uses of the full pseudo.

Usually we want the *smallest* possible mode for pseudos.  This is
important to get good optimisation.
Comment 10 Andrew Pinski 2024-03-26 22:21:30 UTC
*** Bug 46164 has been marked as a duplicate of this bug. ***
Comment 11 Andrew Pinski 2024-03-26 22:23:58 UTC
(In reply to Segher Boessenkool from comment #4)
> Oh btw, the #c3 problem isn't new at all, it happens with 4.8 already for
> example.

Yes it was originally reported as PR 46164 against GCC 4.5.1 even.