[PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)


Hi!

As discussed in the PR, when combine combines something across a setter
of a likely spilled non-fixed hard register, we may get an RA ICE, as the
combined insn might need for reload a hard reg that is live at the point
where the combiner has placed the combined insn.

The following patch attempts to fix that by not creating LOG_LINKS in
certain cases.  There are two cases which the patch handles separately:
1) a CALL_INSN and the preceding load_register_parameters sequence
   (from the call insn up to the earliest insn in the sequence that
   sets some likely spilled non-fixed hard register)
2) other setters of likely spilled non-fixed hard registers
   (loading of the return value of a function into the return
   register(s), explicit register vars, ...)

1) is handled specially, by disallowing LOG_LINKS from before the first
   insn of the sequence into the sequence, because in between that first
   insn and the CALL_INSN a likely spilled hard register is live.
   LOG_LINKS from before the sequence to after the CALL_INSN, from within
   the sequence to after the call, or between insns within the sequence
   are allowed.
2) other setters are handled just like a basic block boundary for
   LOG_LINKS; no LOG_LINKS are allowed to cross that boundary.
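
In both cases a setter counts when any of the hard registers it sets is
non-fixed and belongs to a register class for which
targetm.class_likely_spilled_p is true.  A minimal sketch of that test
(the helper name is mine for illustration only; the patch below performs
the same test inline in create_log_links):

static bool
sets_likely_spilled_non_fixed_p (unsigned int regno, enum machine_mode mode)
{
  /* A value in MODE occupies hard_regno_nregs consecutive hard
     registers starting at REGNO; check each of them.  */
  int nregs = hard_regno_nregs[regno][mode];
  while (nregs-- > 0)
    /* Fixed registers are never reallocated, so only non-fixed
       registers in a likely spilled class can cause the RA ICE.  */
    if (! TEST_HARD_REG_BIT (fixed_reg_set, regno + nregs)
	&& targetm.class_likely_spilled_p (REGNO_REG_CLASS (regno + nregs)))
      return true;
  return false;
}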

Bootstrapped/regtested on x86_64-linux and i686-linux.

For i686-linux stage3 + target libraries, just 8 *.o files are affected by
the patch; for x86_64-linux approx. 8% of *.o files are affected, though I
haven't checked in detail how much it affects each file.  So far I've only
looked at x86_64-linux builtins.o, where the patch pessimized the code in
a single spot:
-       movq    (%rax,%rdx), %rdi
+       addq    %rax, %rdx
+       movq    (%rdx), %rdi
        call    _ZL12validate_argPK9tree_node9tree_code
which comes from insn 112 not being combined with 114, because on x86_64
both si and di are likely spilled non-fixed hard registers and thus we
disallowed LOG_LINKS from before insn 113 into the 113..114 range, out of
fear that the combined pattern might during reload need the hard register
that is live across it:
(insn 112 111 113 17 (parallel [
            (set (reg/f:DI 134)
                (plus:DI (reg:DI 132)
                    (reg/v:DI 113 [ off ])))
            (clobber (reg:CC 17 flags))
        ]) ../../gcc/gimple.h:2066 266 {*adddi_1}
     (expr_list:REG_DEAD (reg:DI 132)
        (expr_list:REG_DEAD (reg/v:DI 113 [ off ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
(insn 113 112 114 17 (set (reg:SI 4 si)
        (reg:SI 94 [ D.97472 ])) ../../gcc/builtins.c:11417 90 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 94 [ D.97472 ])
        (nil)))
(insn 114 113 115 17 (set (reg:DI 5 di)
        (mem/f:DI (reg/f:DI 134) [8 *_43+0 S8 A64])) ../../gcc/builtins.c:11417 89 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:DI 134)
        (nil)))
(call_insn/c/i 115 114 116 17 (set (reg:QI 0 ax)
        (call (mem:QI (symbol_ref:DI ("_ZL12validate_argPK9tree_node9tree_code") [flags 0x3]  <function_decl 0x7f4045caf200 validate_arg>) [0 validate_arg S1 A8])
            (const_int 0 [0]))) ../../gcc/builtins.c:11417 680 {*call_value}
     (expr_list:REG_DEAD (reg:DI 5 di)
        (expr_list:REG_DEAD (reg:SI 4 si)
            (expr_list:REG_EH_REGION (const_int 0 [0])
                (nil))))
    (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
        (expr_list:REG_BR_PRED (use (reg:SI 4 si))
            (nil))))

Thoughts on this?  I can surely investigate more tomorrow, look at more
random object files that differ and see how many changes the patch creates.
Unfortunately, because the patch works at LOG_LINKS creation time, it is
hard to gather statistics during compilation: while lots of LOG_LINKS will
be affected, only a much smaller number of actual combinations will fail
because of the missing LOG_LINKS.  Typically the call argument load
sequence contains just (set (reg:.. hard reg) (reg:.. pseudo)) insns and
we wouldn't combine into those anyway.
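
For example, for a trivial wrapper like the following (a made-up example,
not from the PR), the whole argument setup on x86_64 should be just two
such moves into di and si, so nothing is lost by refusing to combine into
it:

/* Hypothetical testcase; each argument load expands to a plain
   (set (reg:.. hard reg) (reg:.. pseudo)) move.  */
extern int callee (int a, long b);

int
caller (int x, long y)
{
  return callee (x, y);
}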

The patch removes the likely_spilled_retval stuff Joern wrote a few years
ago, because I believe this change should obsolete it.  But I have tried
to reproduce the original ICE using the sh-elf target and haven't managed
to do so; 4.9 is quite a different compiler from 2005-ish GCC.

2014-01-15  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/59477
	* combine.c (create_log_links): For sequences of insns loading
	hard register parameters for a call, starting with a load of some
	non-fixed likely spilled hard register, don't record LOG_LINKS
	from before the first load to any insn in between the first
	load and the call insn.  For setters of non-fixed likely
	spilled hard registers other than in calls and the above
	sequences, disallow recording of any LOG_LINKS crossing such insns.
	(struct likely_spilled_retval_info): Remove.
	(likely_spilled_retval_1, likely_spilled_retval): Remove.
	(try_combine): Don't call likely_spilled_retval.

	* g++.dg/torture/pr59477.C: New test.

--- gcc/combine.c.jj	2014-01-15 08:11:22.636430002 +0100
+++ gcc/combine.c	2014-01-15 15:45:01.588819713 +0100
@@ -985,20 +985,54 @@ create_log_links (void)
   basic_block bb;
   rtx *next_use, insn;
   df_ref *def_vec, *use_vec;
+  int *undo_list;
 
   next_use = XCNEWVEC (rtx, max_reg_num ());
+  undo_list = XCNEWVEC (int, max_reg_num ());
 
   /* Pass through each block from the end, recording the uses of each
      register and establishing log links when def is encountered.
      Note that we do not clear next_use array in order to save time,
      so we have to test whether the use is in the same basic block as def.
 
+     Avoid creating log links across instructions that set hard registers
+     that are likely to be spilled; otherwise reload might die if some
+     instruction that might need some particular hard register is combined
+     into the area where that hard register is already live.  Likely
+     spilled non-fixed hard registers usually occur in the IL for
+     function call parameter passing, for the call return value, for the
+     return value of the whole function and for explicit register variables.
+
+     For calls, expansion in calls.c usually emits instructions setting
+     the hard registers immediately before the call, with simple rhs.
+     For this case the code below attempts to identify the block of
+     insns between the first insn loading a parameter into a likely
+     spilled hard register and the last insn loading a parameter into
+     (any) hard register (call_block_start and call_block_end).  We
+     allow adding log links to instructions in that range only from
+     instructions in that range; when the backward walk reaches the
+     call_block_start insn, we remove all next_use elements for insns
+     in that block, using the undo_list array and undo_list_head.  That
+     way it is e.g. possible to have log links from before a call
+     argument setup sequence to insns after the call.  In that range
+     we don't set likely_spilled_setter_luid though.
+
+     For all other setters of likely spilled non-fixed hard registers, we just
+     disallow any log links across those insns, by setting
+     likely_spilled_setter_luid and not creating any log links pointing to
+     insns with higher luid than that.
+
      There are a few cases below when we do not consider the definition or
      usage -- these are taken from original flow.c did. Don't ask me why it is
      done this way; I don't know and if it works, I don't want to know.  */
 
   FOR_EACH_BB_FN (bb, cfun)
     {
+      int likely_spilled_setter_luid = -1;
+      int call_block_start = -1;
+      int call_block_end = -1;
+      int undo_list_head = 0;
+
       FOR_BB_INSNS_REVERSE (bb, insn)
         {
           if (!NONDEBUG_INSN_P (insn))
@@ -1007,12 +1041,86 @@ create_log_links (void)
 	  /* Log links are created only once.  */
 	  gcc_assert (!LOG_LINKS (insn));
 
+	  int luid = DF_INSN_LUID (insn);
+
+	  if (CALL_P (insn))
+	    {
+	      rtx last_reg = NULL_RTX;
+	      gcc_assert (call_block_start == -1 && call_block_end == -1);
+	      for (rtx link = CALL_INSN_FUNCTION_USAGE (insn);
+		   link != NULL_RTX;
+		   link = XEXP (link, 1))
+		if (GET_CODE (XEXP (link, 0)) == USE
+		    && REG_P (XEXP (XEXP (link, 0), 0))
+		    && HARD_REGISTER_P (XEXP (XEXP (link, 0), 0)))
+		  {
+		    rtx reg = XEXP (XEXP (link, 0), 0);
+		    int nregs = hard_regno_nregs[REGNO (reg)][GET_MODE (reg)];
+		    while (nregs-- > 0)
+		      if (! TEST_HARD_REG_BIT (fixed_reg_set,
+					       REGNO (reg) + nregs)
+			  && targetm.class_likely_spilled_p
+					(REGNO_REG_CLASS (REGNO (reg) + nregs)))
+			break;
+		    if (nregs >= 0)
+		      last_reg = reg;
+		  }
+	      if (last_reg)
+		{
+		  for (rtx prev = PREV_INSN (insn);
+		       prev; prev = PREV_INSN (prev))
+		    {
+		      if (!NONDEBUG_INSN_P (prev))
+			continue;
+		      if (BLOCK_FOR_INSN (prev) != bb
+			  || !NONJUMP_INSN_P (prev))
+			break;
+		      rtx set = single_set (prev);
+		      if (! set)
+			break;
+		      rtx dest = SET_DEST (set);
+		      if (GET_CODE (dest) == SUBREG)
+			dest = SUBREG_REG (dest);
+		      if (!REG_P (dest) || !HARD_REGISTER_P (dest))
+			break;
+		      if (reg_overlap_mentioned_p (dest, last_reg))
+			{
+			  if (call_block_end != -1)
+			    call_block_start = DF_INSN_LUID (prev);
+			  break;
+			}
+		      else if (call_block_end == -1)
+			call_block_end = DF_INSN_LUID (prev);
+		    }
+		  if (call_block_start == -1)
+		    call_block_end = -1;
+		}
+	    }
+
           for (def_vec = DF_INSN_DEFS (insn); *def_vec; def_vec++)
             {
 	      df_ref def = *def_vec;
               int regno = DF_REF_REGNO (def);
               rtx use_insn;
 
+	      if (HARD_REGISTER_NUM_P (regno)
+		  && !CALL_P (insn)
+		  && !(luid >= call_block_start && luid <= call_block_end)
+		  && !(DF_REF_FLAGS (def)
+		       & (DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER)))
+		{
+		  enum machine_mode mode = GET_MODE (DF_REF_REAL_REG (def));
+		  int nregs = hard_regno_nregs[regno][mode];
+		  while (nregs-- > 0)
+		    if (! TEST_HARD_REG_BIT (fixed_reg_set, regno + nregs)
+			&& targetm.class_likely_spilled_p
+				    (REGNO_REG_CLASS (regno + nregs)))
+		      {
+			likely_spilled_setter_luid = luid;
+			break;
+		      }
+		}
+
               if (!next_use[regno])
                 continue;
 
@@ -1034,7 +1142,10 @@ create_log_links (void)
                 continue;
 
               use_insn = next_use[regno];
-              if (BLOCK_FOR_INSN (use_insn) == bb)
+	      if (BLOCK_FOR_INSN (use_insn) == bb
+		  && (likely_spilled_setter_luid == -1
+		      || (DF_INSN_LUID (use_insn)
+			  <= likely_spilled_setter_luid)))
                 {
                   /* flow.c claimed:
 
@@ -1060,6 +1171,33 @@ create_log_links (void)
               next_use[regno] = NULL_RTX;
             }
 
+	  if (luid == call_block_start)
+	    {
+	      /* At the start of the call argument block, remove
+		 all next_use array values pointing to instructions
+		 within the block.  The undo_list array normally contains
+		 all zeros; non-zero values form a chain of what needs
+		 to be cleared in the next_use array.  A value of -1 in
+		 the array means tail of the chain, other non-zero values
+		 are the next regno that needs to be cleared, plus 1 (as
+		 zero is a valid register number).  */
+	      if (undo_list_head)
+		{
+		  int r = undo_list_head;
+		  do
+		    {
+		      int regno = r - 1;
+		      r = undo_list[regno];
+		      undo_list[regno] = 0;
+		      next_use[regno] = NULL_RTX;
+		    }
+		  while (r != -1);
+		  undo_list_head = 0;
+		}
+	      call_block_start = -1;
+	      call_block_end = -1;
+	    }
+
           for (use_vec = DF_INSN_USES (insn); *use_vec; use_vec++)
             {
 	      df_ref use = *use_vec;
@@ -1070,11 +1208,23 @@ create_log_links (void)
               if (DF_REF_FLAGS (use) & DF_REF_CALL_STACK_USAGE)
                 continue;
 
+	      if (luid >= call_block_start
+		  && luid <= call_block_end
+		  && undo_list[regno] == 0)
+		{
+		  if (undo_list_head == 0)
+		    undo_list[regno] = -1;
+		  else
+		    undo_list[regno] = undo_list_head;
+		  undo_list_head = regno + 1;
+		}
+
               next_use[regno] = insn;
             }
         }
     }
 
+  free (undo_list);
   free (next_use);
 }
 
@@ -2222,87 +2372,6 @@ cant_combine_insn_p (rtx insn)
   return 0;
 }
 
-struct likely_spilled_retval_info
-{
-  unsigned regno, nregs;
-  unsigned mask;
-};
-
-/* Called via note_stores by likely_spilled_retval_p.  Remove from info->mask
-   hard registers that are known to be written to / clobbered in full.  */
-static void
-likely_spilled_retval_1 (rtx x, const_rtx set, void *data)
-{
-  struct likely_spilled_retval_info *const info =
-    (struct likely_spilled_retval_info *) data;
-  unsigned regno, nregs;
-  unsigned new_mask;
-
-  if (!REG_P (XEXP (set, 0)))
-    return;
-  regno = REGNO (x);
-  if (regno >= info->regno + info->nregs)
-    return;
-  nregs = hard_regno_nregs[regno][GET_MODE (x)];
-  if (regno + nregs <= info->regno)
-    return;
-  new_mask = (2U << (nregs - 1)) - 1;
-  if (regno < info->regno)
-    new_mask >>= info->regno - regno;
-  else
-    new_mask <<= regno - info->regno;
-  info->mask &= ~new_mask;
-}
-
-/* Return nonzero iff part of the return value is live during INSN, and
-   it is likely spilled.  This can happen when more than one insn is needed
-   to copy the return value, e.g. when we consider to combine into the
-   second copy insn for a complex value.  */
-
-static int
-likely_spilled_retval_p (rtx insn)
-{
-  rtx use = BB_END (this_basic_block);
-  rtx reg, p;
-  unsigned regno, nregs;
-  /* We assume here that no machine mode needs more than
-     32 hard registers when the value overlaps with a register
-     for which TARGET_FUNCTION_VALUE_REGNO_P is true.  */
-  unsigned mask;
-  struct likely_spilled_retval_info info;
-
-  if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != USE || insn == use)
-    return 0;
-  reg = XEXP (PATTERN (use), 0);
-  if (!REG_P (reg) || !targetm.calls.function_value_regno_p (REGNO (reg)))
-    return 0;
-  regno = REGNO (reg);
-  nregs = hard_regno_nregs[regno][GET_MODE (reg)];
-  if (nregs == 1)
-    return 0;
-  mask = (2U << (nregs - 1)) - 1;
-
-  /* Disregard parts of the return value that are set later.  */
-  info.regno = regno;
-  info.nregs = nregs;
-  info.mask = mask;
-  for (p = PREV_INSN (use); info.mask && p != insn; p = PREV_INSN (p))
-    if (INSN_P (p))
-      note_stores (PATTERN (p), likely_spilled_retval_1, &info);
-  mask = info.mask;
-
-  /* Check if any of the (probably) live return value registers is
-     likely spilled.  */
-  nregs --;
-  do
-    {
-      if ((mask & 1 << nregs)
-	  && targetm.class_likely_spilled_p (REGNO_REG_CLASS (regno + nregs)))
-	return 1;
-    } while (nregs--);
-  return 0;
-}
-
 /* Adjust INSN after we made a change to its destination.
 
    Changing the destination can invalidate notes that say something about
@@ -2512,8 +2581,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx
   if (cant_combine_insn_p (i3)
       || cant_combine_insn_p (i2)
       || (i1 && cant_combine_insn_p (i1))
-      || (i0 && cant_combine_insn_p (i0))
-      || likely_spilled_retval_p (i3))
+      || (i0 && cant_combine_insn_p (i0)))
     return 0;
 
   combine_attempts++;
--- gcc/testsuite/g++.dg/torture/pr59477.C.jj	2014-01-15 09:38:18.348864654 +0100
+++ gcc/testsuite/g++.dg/torture/pr59477.C	2014-01-15 09:38:18.348864654 +0100
@@ -0,0 +1,25 @@
+// PR rtl-optimization/59477
+// { dg-do compile }
+
+struct A
+{
+  unsigned *a;
+  unsigned long b;
+  A (unsigned long x) : a (), b (x) {}
+};
+
+struct B
+{
+  B (int);
+  B (const B &) {}
+};
+
+B bar (B, B, A);
+int v;
+
+void
+foo ()
+{
+  B c = 0;
+  bar (c, c, A (1ULL << v));
+}

	Jakub

