The IRA/LRA fix (revision 264897) for PR86939 and PR87479 has caused build issues on several targets. This bugzilla entry will be used to track the change(s) required to get the targets building again. The affected targets seem to be: aarch64, alpha, arm, hppa, s390x and sh4.
Pasting some edited commentary from the gcc-patches mailing list: On 10/8/18 9:52 AM, Jeff Law wrote: > My tester is showing a variety of problems as well. hppa, sh4, aarch64, > aarch64_be, alpha arm* and s390x bootstraps are all failing at various > points. Others may trickle in over time, but clearly something went > wrong recently. I'm going to start trying to pick them off after the > morning meetings are wrapped up. I looked into the PA-RISC test case issue Jeff sent me that is caused by my patch that handles conflicts wrt reg copies and I _think_ I have a handle on what the problem is, but don't yet know how to fix. Jeff, I agree with the analysis you gave in your email to me, but to get Vlad up to speed, here it is again along with some additional info. Compiling the test case, we have the following RTL during IRA. I have also annotated the pseudos in the RTL to include their hard reg assignment: (insn 30 19 32 3 (set (reg/f:SI 112 "%r19") ....)) (insn 32 30 20 3 (set (reg:SI 26 %r26) (reg/f:SI 101 "%r26"))) [snip] (insn 23 22 49 3 (set (reg:SI 109 "%r28") (mem:SI (reg/f:SI 101 "%r26")))) (insn 49 23 31 3 (set (reg/f:SI 100 "%r28") (if_then_else:SI (eq (reg:SI 109 "%r28") (const_int 15 [0xf])) (reg/f:SI 101 "%r26") (const_int 0 [0]))) (expr_list:REG_DEAD (reg:SI 109 "%r28") (expr_list:REG_DEAD (reg/f:SI 101 "%r26")))) (insn 31 49 33 3 (set (mem/f:SI (reg/f:SI 112 "%r19")) (reg/f:SI 100 "%r28")) (expr_list:REG_DEAD (reg/f:SI 112 "%r19") (expr_list:REG_DEAD (reg/f:SI 100 "%r28")))) (call_insn 33 31 36 3 (parallel [ (call (mem:SI (symbol_ref/v:SI ("@_Z3yowP11dw_cfi_node")) (const_int 16 [0x10])) (clobber (reg:SI 1 %r1)) (clobber (reg:SI 2 %r2)) (use (const_int 0 [0]))]) (expr_list:REG_DEAD (reg:SI 26 %r26)) (expr_list:SI (use (reg:SI 26 %r26))))) Before my patch, hard reg %r26 and pseudo 101 conflicted, but with my patch they now (correctly) do not. IRA is able to assign hard regs to all of the pseudos, but the constraints in the pattern for insn 49 forces op0 (pseudo 100) and op1 (pseudo 101) to have the same hard reg. They are not, so reload comes along and spills insn 49 with the following reloads: Reloads for insn # 49 Reload 0: reload_in (SI) = (reg/f:SI 26 %r26 [orig:101 _10 ] [101]) reload_out (SI) = (reg/f:SI 28 %r28 [orig:100 iftmp.2_5 ] [100]) GENERAL_REGS, RELOAD_OTHER (opnum = 0) reload_in_reg: (reg/f:SI 26 %r26 [orig:101 _10 ] [101]) reload_out_reg: (reg/f:SI 28 %r28 [orig:100 iftmp.2_5 ] [100]) reload_reg_rtx: (reg/f:SI 26 %r26 [orig:101 _10 ] [101]) ..giving us the following updated insn: (insn 49 23 58 3 (set (reg/f:SI 26 %r26 [101]) (if_then_else:SI (eq (reg:SI 28 %r28 [109]) (const_int 15)) (reg/f:SI 26 %r26 [101]) (const_int 0 [0])))) The problem is, that hard reg %r26 is defined in insn 32, to be used in insn 33, so using %r26 as the reload reg is wrong, because it will clobber the value we set in insn 32. HPPA is a reload (not LRA) target and looking thru reload, it assumes that if any input pseudo dies in an insn, then the hard reg assigned to the pseudo is free game for use with an output reload. With my patch, that assumption is (now) wrong, because another simultaneously live pseudo may be using that same hard reg or in this case, the hard reg itself is still live. Given no one wants to patch reload, the decision has been made to disable the special reg copy handling my patch introduced whenever we're compiling for a reload target via the following patch: Index: gcc/ira-lives.c =================================================================== --- gcc/ira-lives.c (revision 264897) +++ gcc/ira-lives.c (working copy) @@ -1064,6 +1064,11 @@ find_call_crossed_cheap_reg (rtx_insn *i rtx non_conflicting_reg_copy_p (rtx_insn *insn) { + /* Reload has issues with overlapping pseudos being assigned to the + same hard register, so don't allow it. See PR87600 for details. */ + if (!targetm.lra_p ()) + return NULL_RTX; + rtx set = single_set (insn); /* Disallow anything other than a simple register to register copy
Author: bergner Date: Fri Oct 12 16:31:11 2018 New Revision: 265113 URL: https://gcc.gnu.org/viewcvs?rev=265113&root=gcc&view=rev Log: PR rtl-optimization/87600 * ira-lives (non_conflicting_reg_copy_p): Disable for non LRA targets. Modified: trunk/gcc/ChangeLog trunk/gcc/ira-lives.c
Here is an example for aarch64: === long f(long x) { register long a asm("x0"); asm("bla %0 %1" : "+&r"(a) : "r"(x)); return a; } === The first asm operand is a local register variable set to x0, so this one should end up in x0. But "x" is passed in x0 as function argument, and then combined into the asm, making both asm operands hard register 0. And then LRA has the job of making things work (operand 0 is earlyclobber so not both args can be hard reg 0), and LRA picks the wrong solution: bla x1 x0 Not letting combine combine the register moves that copy from the function argument registers into pseudos fixes this. But it costs 1%-5% of code size on all targets (most are about 2%): most targets can usefully combine register moves into other instructions, for example many targets have operations that set a flags register as side effect. I'm going to try to disallow combining the hard reg -> pseudo moves, because that should help with the register alloc problems (it also gives better register alloc!), but at the same time introducing an extra copy (from pseudo to a new pseudo). In theory this should be the best of both worlds.
Oh btw, the #c3 problem isn't new at all, it happens with 4.8 already for example.
Author: segher Date: Mon Oct 22 20:23:39 2018 New Revision: 265398 URL: https://gcc.gnu.org/viewcvs?rev=265398&root=gcc&view=rev Log: combine: Do not combine moves from hard registers On most targets every function starts with moves from the parameter passing (hard) registers into pseudos. Similarly, after every call there is a move from the return register into a pseudo. These moves usually combine with later instructions (leaving pretty much the same instruction, just with a hard reg instead of a pseudo). This isn't a good idea. Register allocation can get rid of unnecessary moves just fine, and moving the parameter passing registers into many later instructions tends to prevent good register allocation. This patch disallows combining moves from a hard (non-fixed) register. This also avoid the problem mentioned in PR87600 #c3 (combining hard registers into inline assembler is problematic). Because the register move can often be combined with other instructions *itself*, for example for setting some condition code, this patch adds extra copies via new pseudos after every copy-from-hard-reg. On some targets this reduces average code size. On others it increases it a bit, 0.1% or 0.2% or so. (I tested this on all *-linux targets). PR rtl-optimization/87600 * combine.c: Add include of expr.h. (cant_combine_insn_p): Do not combine moves from any hard non-fixed register to a pseudo. (make_more_copies): New function, add a copy to a new pseudo after the moves from hard registers into pseudos. (rest_of_handle_combine): Declare rebuild_jump_labels_after_combine later. Call make_more_copies. Modified: trunk/gcc/ChangeLog trunk/gcc/combine.c
Author: bergner Date: Thu Nov 8 22:39:45 2018 New Revision: 265942 URL: https://gcc.gnu.org/viewcvs?rev=265942&root=gcc&view=rev Log: gcc/ PR rtl-optimization/87600 * cfgexpand.c (expand_asm_stmt): Catch illegal asm constraint usage. * lra-constraints.c (process_alt_operands): Skip illegal hard register usage. Prefer reloading non hard register operands. gcc/testsuite/ PR rtl-optimization/87600 * gcc.dg/pr87600.h: New file. * gcc.dg/pr87600-1.c: New test. * gcc.dg/pr87600-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/pr87600-1.c trunk/gcc/testsuite/gcc.dg/pr87600-2.c trunk/gcc/testsuite/gcc.dg/pr87600.h Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/lra-constraints.c trunk/gcc/testsuite/ChangeLog
Ok, I think this resolves all but one ARM issue which is being tracked in PR87899, so I'm going to mark this bugzilla as fixed.
I've noticed regressions caused by make_more_copies, in scenarios that used subreg s for the low part of promoted incoming parms. With hard regs, the substitution into a subreg became a reg, but with a pseudo, it remains a subreg, which prevents further combines in some cases, as in e.g. gcc_target/powerpc/20050603-3.c on ppc64. I thought one way to go about it could be to scan for subregs of pseudos copied from hard regs before introducing the additional copies, and introduce the intermediate pseudo with the widest subreg mode if there aren't uses of the full pseudo.
(In reply to Alexandre Oliva from comment #8) > I've noticed regressions caused by make_more_copies, in scenarios that used > subreg s for the low part of promoted incoming parms. With hard regs, the > substitution into a subreg became a reg, but with a pseudo, it remains a > subreg, which prevents further combines in some cases, as in e.g. > gcc_target/powerpc/20050603-3.c on ppc64. That happens because of the zero_extract problems. The only thing make_more_copies has to do with this is that this change exposes this problem on this testcase. > I thought one way to go about it could be to scan for subregs of pseudos > copied from hard regs before introducing the additional copies, and > introduce the intermediate pseudo with the widest subreg mode if there > aren't uses of the full pseudo. Usually we want the *smallest* possible mode for pseudos. This is important to get good optimisation.
*** Bug 46164 has been marked as a duplicate of this bug. ***
(In reply to Segher Boessenkool from comment #4) > Oh btw, the #c3 problem isn't new at all, it happens with 4.8 already for > example. Yes it was originally reported as PR 46164 against GCC 4.5.1 even.