This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Eric Botcazou <ebotcazou at adacore dot com>, Joern Rennecke <joern dot rennecke at embecosm dot com>, Jeff Law <law at redhat dot com>, Richard Biener <rguenther at suse dot de>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Wed, 15 Jan 2014 22:55:03 +0100
- Subject: [PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
Hi!
As discussed in the PR, when combine combines something across a setter
of a likely spilled non-fixed hard register, we may get an RA ICE, because
the combined insn might need, during reload, a hard reg that is live at the
point where the combiner has placed the combined insn.
The following patch attempts to fix that by not creating LOG_LINKS in
certain cases. There are two cases which the patch handles separately:
1) a CALL_INSN and the preceding load_register_parameters sequence
(from the call insn up to the earliest insn in the sequence that
sets some likely spilled non-fixed hard register
2) other setters of likely spilled non-fixed hard registers
(loading of return value of a function into the return register(s),
explicit register vars, ...)
1) is handled specially, by disallowing LOG_LINKS from before the first
insn of the sequence into the sequence, because in between that first insn
and the CALL_INSN a likely spilled hard register is live. LOG_LINKS from
before the sequence to after the CALL_INSN, from within the sequence
to after the call, or from within the sequence to within the sequence
are allowed.
2) other setters are handled just like a basic block boundary for LOG_LINKS:
no LOG_LINKS are allowed to cross that boundary.
Bootstrapped/regtested on x86_64-linux and i686-linux.
For the i686-linux stage3 + target libraries, just 8 *.o files are affected
by the patch; for x86_64-linux, approx. 8% of *.o files are affected,
though I haven't checked in detail how much it affects each
file. So far I have only looked at the x86_64-linux builtins.o, where the
patch pessimized the code in a single spot:
- movq (%rax,%rdx), %rdi
+ addq %rax, %rdx
+ movq (%rdx), %rdi
call _ZL12validate_argPK9tree_node9tree_code
which comes from insn 112 not being combined with 114, because both si
and di are likely spilled non-fixed hard registers on x86_64, and thus we
disallowed LOG_LINKS from before insn 113 into the 113..114 range, for
fear that the combined pattern might during reload need the hard
register that is live across it:
(insn 112 111 113 17 (parallel [
(set (reg/f:DI 134)
(plus:DI (reg:DI 132)
(reg/v:DI 113 [ off ])))
(clobber (reg:CC 17 flags))
]) ../../gcc/gimple.h:2066 266 {*adddi_1}
(expr_list:REG_DEAD (reg:DI 132)
(expr_list:REG_DEAD (reg/v:DI 113 [ off ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))))
(insn 113 112 114 17 (set (reg:SI 4 si)
(reg:SI 94 [ D.97472 ])) ../../gcc/builtins.c:11417 90 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 94 [ D.97472 ])
(nil)))
(insn 114 113 115 17 (set (reg:DI 5 di)
(mem/f:DI (reg/f:DI 134) [8 *_43+0 S8 A64])) ../../gcc/builtins.c:11417 89 {*movdi_internal}
(expr_list:REG_DEAD (reg/f:DI 134)
(nil)))
(call_insn/c/i 115 114 116 17 (set (reg:QI 0 ax)
(call (mem:QI (symbol_ref:DI ("_ZL12validate_argPK9tree_node9tree_code") [flags 0x3] <function_decl 0x7f4045caf200 validate_arg>) [0 validate_arg S1 A8])
(const_int 0 [0]))) ../../gcc/builtins.c:11417 680 {*call_value}
(expr_list:REG_DEAD (reg:DI 5 di)
(expr_list:REG_DEAD (reg:SI 4 si)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil))))
(expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
(expr_list:REG_BR_PRED (use (reg:SI 4 si))
(nil))))
Thoughts on this? I can certainly investigate more tomorrow: look at more
random object files that differ and see how many changes the patch creates.
Unfortunately, because the patch works at LOG_LINKS creation time, it is
hard to gather statistics during compilation: while lots of
LOG_LINKS will be affected, a much smaller number of actual combinations
will fail to be performed because of the missing LOG_LINKS.
Typically the call argument load sequence contains just (set (reg:.. hard reg) (reg:.. pseudo))
insns, and we wouldn't combine into those anyway.
The patch removes the likely_spilled_retval stuff Joern wrote a few years
ago, because I believe this change should obsolete it. But I have tried to
reproduce the ICE using the sh-elf target and haven't managed to do so; 4.9
is quite a different compiler from the 2005-ish gcc.
2014-01-15 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/59477
* combine.c (create_log_links): For sequences of insns loading
hard register parameters for a call starting with load of some
non-fixed likely spilled hard register don't record LOG_LINKS
from before the first load to any insn in between the first
load and the call insn. For setters of non-fixed likely
spilled hard registers other than in calls and the above sequences
disallow recording of any LOG_LINKS crossing such insns.
(struct likely_spilled_retval_info): Remove.
(likely_spilled_retval_1, likely_spilled_retval): Remove.
(try_combine): Don't call likely_spilled_retval.
* g++.dg/torture/pr59477.C: New test.
--- gcc/combine.c.jj 2014-01-15 08:11:22.636430002 +0100
+++ gcc/combine.c 2014-01-15 15:45:01.588819713 +0100
@@ -985,20 +985,54 @@ create_log_links (void)
basic_block bb;
rtx *next_use, insn;
df_ref *def_vec, *use_vec;
+ int *undo_list;
next_use = XCNEWVEC (rtx, max_reg_num ());
+ undo_list = XCNEWVEC (int, max_reg_num ());
/* Pass through each block from the end, recording the uses of each
register and establishing log links when def is encountered.
Note that we do not clear next_use array in order to save time,
so we have to test whether the use is in the same basic block as def.
+ Avoid creating log links across instructions that set hard registers
+ that are likely to be spilled, otherwise reload might die if some
+ instruction that might need some particular hard register is combined
+ into the area where that hard register is already live. Usually
+ likely spilled non-fixed hard registers should occur in the IL
+ for function call parameter passing and call return value, return
+ value of the whole function and then for explicit register variables.
+
+ For calls expansion in calls.c usually emits instructions setting the
+ hard registers immediately before the call, with simple rhs. For this
+ case the code below attempts to identify the block of insns between the
+ first insn loading parameter into a hard register where the hard register
+ is likely spilled and last insn loading parameter into (any) hard
+ register (call_block_start and call_block_end). We allow adding log links
+ to instructions in that range only from instructions in that range, when
+ in the backward walk we reach the call_block_start insn, we remove all
+ next_use elements for insns in that block, using undo_list array
+ and undo_list_head. That way it is e.g. possible to have log links
+ from before a call argument setup sequence to insns after the call.
+ In the call_block_start .. call_block_end range we don't set
+ likely_spilled_setter_luid though.
+
+ For all other setters of likely spilled non-fixed hard registers, we just
+ disallow any log links across those insns, by setting
+ likely_spilled_setter_luid and not creating any log links pointing to
+ insns with higher luid than that.
+
There are a few cases below when we do not consider the definition or
usage -- these are taken from original flow.c did. Don't ask me why it is
done this way; I don't know and if it works, I don't want to know. */
FOR_EACH_BB_FN (bb, cfun)
{
+ int likely_spilled_setter_luid = -1;
+ int call_block_start = -1;
+ int call_block_end = -1;
+ int undo_list_head = 0;
+
FOR_BB_INSNS_REVERSE (bb, insn)
{
if (!NONDEBUG_INSN_P (insn))
@@ -1007,12 +1041,86 @@ create_log_links (void)
/* Log links are created only once. */
gcc_assert (!LOG_LINKS (insn));
+ int luid = DF_INSN_LUID (insn);
+
+ if (CALL_P (insn))
+ {
+ rtx last_reg = NULL_RTX;
+ gcc_assert (call_block_start == -1 && call_block_end == -1);
+ for (rtx link = CALL_INSN_FUNCTION_USAGE (insn);
+ link != NULL_RTX;
+ link = XEXP (link, 1))
+ if (GET_CODE (XEXP (link, 0)) == USE
+ && REG_P (XEXP (XEXP (link, 0), 0))
+ && HARD_REGISTER_P (XEXP (XEXP (link, 0), 0)))
+ {
+ rtx reg = XEXP (XEXP (link, 0), 0);
+ int nregs = hard_regno_nregs[REGNO (reg)][GET_MODE (reg)];
+ while (nregs-- > 0)
+ if (! TEST_HARD_REG_BIT (fixed_reg_set,
+ REGNO (reg) + nregs)
+ && targetm.class_likely_spilled_p
+ (REGNO_REG_CLASS (REGNO (reg) + nregs)))
+ break;
+ if (nregs >= 0)
+ last_reg = reg;
+ }
+ if (last_reg)
+ {
+ for (rtx prev = PREV_INSN (insn);
+ prev; prev = PREV_INSN (prev))
+ {
+ if (!NONDEBUG_INSN_P (prev))
+ continue;
+ if (BLOCK_FOR_INSN (prev) != bb
+ || !NONJUMP_INSN_P (prev))
+ break;
+ rtx set = single_set (prev);
+ if (! set)
+ break;
+ rtx dest = SET_DEST (set);
+ if (GET_CODE (dest) == SUBREG)
+ dest = SUBREG_REG (dest);
+ if (!REG_P (dest) || !HARD_REGISTER_P (dest))
+ break;
+ if (reg_overlap_mentioned_p (dest, last_reg))
+ {
+ if (call_block_end != -1)
+ call_block_start = DF_INSN_LUID (prev);
+ break;
+ }
+ else if (call_block_end == -1)
+ call_block_end = DF_INSN_LUID (prev);
+ }
+ if (call_block_start == -1)
+ call_block_end = -1;
+ }
+ }
+
for (def_vec = DF_INSN_DEFS (insn); *def_vec; def_vec++)
{
df_ref def = *def_vec;
int regno = DF_REF_REGNO (def);
rtx use_insn;
+ if (HARD_REGISTER_NUM_P (regno)
+ && !CALL_P (insn)
+ && !(luid >= call_block_start && luid <= call_block_end)
+ && !(DF_REF_FLAGS (def)
+ & (DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER)))
+ {
+ enum machine_mode mode = GET_MODE (DF_REF_REAL_REG (def));
+ int nregs = hard_regno_nregs[regno][mode];
+ while (nregs-- > 0)
+ if (! TEST_HARD_REG_BIT (fixed_reg_set, regno + nregs)
+ && targetm.class_likely_spilled_p
+ (REGNO_REG_CLASS (regno + nregs)))
+ {
+ likely_spilled_setter_luid = luid;
+ break;
+ }
+ }
+
if (!next_use[regno])
continue;
@@ -1034,7 +1142,10 @@ create_log_links (void)
continue;
use_insn = next_use[regno];
- if (BLOCK_FOR_INSN (use_insn) == bb)
+ if (BLOCK_FOR_INSN (use_insn) == bb
+ && (likely_spilled_setter_luid == -1
+ || (DF_INSN_LUID (use_insn)
+ <= likely_spilled_setter_luid)))
{
/* flow.c claimed:
@@ -1060,6 +1171,33 @@ create_log_links (void)
next_use[regno] = NULL_RTX;
}
+ if (luid == call_block_start)
+ {
+ /* At the start of the call argument block, remove
+ all next_use array values pointing to instructions
+ within the block. The undo_list array normally contains
+ all zeros, non-zero values form a chain of what needs
+ to be cleared in next_use array. Value of -1 in the array
+ means tail of the chain, other non-zero values are next
+ regno that need to be cleared plus 1 (as zero is a valid
+ register number). */
+ if (undo_list_head)
+ {
+ int r = undo_list_head;
+ do
+ {
+ int regno = r - 1;
+ r = undo_list[regno];
+ undo_list[regno] = 0;
+ next_use[regno] = NULL_RTX;
+ }
+ while (r != -1);
+ undo_list_head = 0;
+ }
+ call_block_start = -1;
+ call_block_end = -1;
+ }
+
for (use_vec = DF_INSN_USES (insn); *use_vec; use_vec++)
{
df_ref use = *use_vec;
@@ -1070,11 +1208,23 @@ create_log_links (void)
if (DF_REF_FLAGS (use) & DF_REF_CALL_STACK_USAGE)
continue;
+ if (luid >= call_block_start
+ && luid <= call_block_end
+ && undo_list[regno] == 0)
+ {
+ if (undo_list_head == 0)
+ undo_list[regno] = -1;
+ else
+ undo_list[regno] = undo_list_head;
+ undo_list_head = regno + 1;
+ }
+
next_use[regno] = insn;
}
}
}
+ free (undo_list);
free (next_use);
}
@@ -2222,87 +2372,6 @@ cant_combine_insn_p (rtx insn)
return 0;
}
-struct likely_spilled_retval_info
-{
- unsigned regno, nregs;
- unsigned mask;
-};
-
-/* Called via note_stores by likely_spilled_retval_p. Remove from info->mask
- hard registers that are known to be written to / clobbered in full. */
-static void
-likely_spilled_retval_1 (rtx x, const_rtx set, void *data)
-{
- struct likely_spilled_retval_info *const info =
- (struct likely_spilled_retval_info *) data;
- unsigned regno, nregs;
- unsigned new_mask;
-
- if (!REG_P (XEXP (set, 0)))
- return;
- regno = REGNO (x);
- if (regno >= info->regno + info->nregs)
- return;
- nregs = hard_regno_nregs[regno][GET_MODE (x)];
- if (regno + nregs <= info->regno)
- return;
- new_mask = (2U << (nregs - 1)) - 1;
- if (regno < info->regno)
- new_mask >>= info->regno - regno;
- else
- new_mask <<= regno - info->regno;
- info->mask &= ~new_mask;
-}
-
-/* Return nonzero iff part of the return value is live during INSN, and
- it is likely spilled. This can happen when more than one insn is needed
- to copy the return value, e.g. when we consider to combine into the
- second copy insn for a complex value. */
-
-static int
-likely_spilled_retval_p (rtx insn)
-{
- rtx use = BB_END (this_basic_block);
- rtx reg, p;
- unsigned regno, nregs;
- /* We assume here that no machine mode needs more than
- 32 hard registers when the value overlaps with a register
- for which TARGET_FUNCTION_VALUE_REGNO_P is true. */
- unsigned mask;
- struct likely_spilled_retval_info info;
-
- if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != USE || insn == use)
- return 0;
- reg = XEXP (PATTERN (use), 0);
- if (!REG_P (reg) || !targetm.calls.function_value_regno_p (REGNO (reg)))
- return 0;
- regno = REGNO (reg);
- nregs = hard_regno_nregs[regno][GET_MODE (reg)];
- if (nregs == 1)
- return 0;
- mask = (2U << (nregs - 1)) - 1;
-
- /* Disregard parts of the return value that are set later. */
- info.regno = regno;
- info.nregs = nregs;
- info.mask = mask;
- for (p = PREV_INSN (use); info.mask && p != insn; p = PREV_INSN (p))
- if (INSN_P (p))
- note_stores (PATTERN (p), likely_spilled_retval_1, &info);
- mask = info.mask;
-
- /* Check if any of the (probably) live return value registers is
- likely spilled. */
- nregs --;
- do
- {
- if ((mask & 1 << nregs)
- && targetm.class_likely_spilled_p (REGNO_REG_CLASS (regno + nregs)))
- return 1;
- } while (nregs--);
- return 0;
-}
-
/* Adjust INSN after we made a change to its destination.
Changing the destination can invalidate notes that say something about
@@ -2512,8 +2581,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx
if (cant_combine_insn_p (i3)
|| cant_combine_insn_p (i2)
|| (i1 && cant_combine_insn_p (i1))
- || (i0 && cant_combine_insn_p (i0))
- || likely_spilled_retval_p (i3))
+ || (i0 && cant_combine_insn_p (i0)))
return 0;
combine_attempts++;
--- gcc/testsuite/g++.dg/torture/pr59477.C.jj 2014-01-15 09:38:18.348864654 +0100
+++ gcc/testsuite/g++.dg/torture/pr59477.C 2014-01-15 09:38:18.348864654 +0100
@@ -0,0 +1,25 @@
+// PR rtl-optimization/59477
+// { dg-do compile }
+
+struct A
+{
+ unsigned *a;
+ unsigned long b;
+ A (unsigned long x) : a (), b (x) {}
+};
+
+struct B
+{
+ B (int);
+ B (const B &) {}
+};
+
+B bar (B, B, A);
+int v;
+
+void
+foo ()
+{
+ B c = 0;
+ bar (c, c, A (1ULL << v));
+}
Jakub