This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[committed][PR rtl-optimization/87761] Limited iteration in regcprop to pick up secondary opportunities


This is final bits to address 87761.

Originally this patch tried to detect when it could delete a feeding
insn when hard register copy propagation was performed and immediately
delete the (dead) feeding insn.

For that to work we had to have accurate REG_DEAD notes in the stream.
So I twiddled things to make sure those notes were accurate.

But in doing so, the need to actually detect and internally delete a
feeding insn when copy propagation was performed isn't necessary.  When
we reprocess a block, if a particular reg->reg copy becomes dead it will
have a REG_UNUSED note attached by the DF framework.

Thus, the prior change in this space to remove simple const/copy
initializations where the destination has a REG_UNUSED note "just works"
and is sufficient to address the fix-r4000-10 regression (as well as
fix-r4000-5 which isn't mentioned in the BZ).

So the only real issue left was the iteration step.  I pondered leaving
in the code which tracked when we would likely be able to delete a copy
and use that to trigger re-processing a block.  That works, but I think
it's more complexity than we really want/need.

Other prototypes iterated on a block until nothing changed.  I didn't
really like that either.  I strongly suspected that iteration was rare
and iterating more than once was really rare, there's just not a lot of
benefit here to justify an iterate until nothing changes algorithm --
particularly since the we've got to rescan the whole block each
iteration and possibly propagate DF info as well.  It is clearly safe to
stop iterating, it just potentially leaves the code less optimized than
it could have been.

So it's of course time to do some instrumentation.  I instrumented an
x86_64 compiler to print iteration counts for every block we processed
then did a bootstrap.

Before giving you the raw data, a few notes.  An iteration count of zero
means regcprop processed the block, but found no optimization
opportunities at all.  That case is by far the most common.

An iteration count of one indicates regcprop  was able to do useful
propagations on the first pass over the block, but that no secondary
opportunities were identified.

An iteration count of two indicates the first pass propagations resulted
in insn deletions which in turn exposed additional propagations.  This
is the case we're looking for.

Counts higher than indicate that we found even more opportunities on
subsequent iterations.  These are exceedingly rare.


Iterations    Count
    0         3584456    No opts found
    1         183165     Single level propagation
    2         1364       Two level propagation
    3         43
    4         5
    5+        none

The case we're chasing is the two level propagation.  ie, the first pass
does some propagation which in turn makes some insn(s) dead and removal
of those dead insn(s) exposes new propagation opportunities.

The data make it pretty clear we want to avoid as much overhead as
possible in the common case of nothing changes.  That's over 95% of the
time.

Additionally, there's just so so little to gain with each iteration that
an "iterate until nothing changes" approach just seems like a waste.

So we cater to those components in the distribution.  ie, avoid as much
overhead as possible in the most common case, and don't bother with a
"repeat until nothing changes" kind of algorithm.  We allow a single
reprocessing of the block if something changed.

I seriously considered bubbling up a "hint" that we'd likely be able to
remove an insn and using that to help decide when to allow that single
reprocessing of the current block.  The hint would have to be a bit of
an optimistic test because of the inaccuracies in the notes.  Ultimately
I decided it was unlikely that the hint would really filter out that
much work.

This has been bootstrapped and regression tested on x86_64. aarch64,
ppc64le, ppc64 & sparc64.  It's also bootstrapped on various platforms
vi qemu.  It's also gone through a build/test cycle on mips64-linux-gnu
and mips64el-linux-gnu where it fixed fix-r4000-10.c and fix-4000-5.c.
Finally it's gone through a variety of *-elf builds, mostly to make sure
we didn't trip over anything nasty in the target specific tests like we
saw for the fpr-moves-5 fixes.

The patch looks bigger than it really is due to an indention change.
I'll be installing into on the trunk momentarily.

Phew.

On a non-technical note, I'm happy I dug into this, mostly because the
regcprop fix can help other targets and there's some easy stage1 things
which likely help in other cases that I haven't discussed much here.

However, I'm increasingly of the opinion that MIPS targets need to drop
off the priority platform list.  Given the trajectory I see for MIPS
based processors in industry, it's really hard to justify spending this
much time on them, particularly for low priority code quality issues.


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6b9f1c4a2b6..4a5b122eedd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2019-03-26  Jeff Law  <law@redhat.com>
+
+	PR rtl-optimization/87761
+	* regcprop.c (copyprop_hardreg_forward_1): Check may_trap_p on SET,
+	not INSN.  Also check RTX_FRAME_RELATED_P.  Queue insns for DF rescan
+	as needed.
+	(pass_cprop_hardreg::execute): Add df note problem and defer insn
+	rescans.  Reprocess blocks as needed, calling df_analyze before
+	reprocessing.  Always call df_analyze before fixing up debug bind
+	insns.
+	
 2019-03-23  Segher Boessenkool  <segher@kernel.crashing.org>
 
 	* config/rs6000/xmmintrin.h (_mm_movemask_pi8): Implement for 32-bit
@@ -8,7 +19,7 @@
 	* config/aarch64/aarch64.md (zero_extendsidi2_aarch64): Fix type
 	attrribute for uxtw.
 
-2019-02-26  Jeff Law  <law@redhat.com>
+2019-03-26  Jeff Law  <law@redhat.com>
 
 	PR rtl-optimization/87761
 	* config/mips/mips-protos.h (mips_split_move): Add new argument.
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 926df40f954..8ca523ffe23 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -800,8 +800,9 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
 
       /* Detect obviously dead sets (via REG_UNUSED notes) and remove them.  */
       if (set
+	  && !may_trap_p (set)
+	  && !RTX_FRAME_RELATED_P (insn)
 	  && INSN_P (insn)
-	  && !may_trap_p (insn)
 	  && find_reg_note (insn, REG_UNUSED, SET_DEST (set))
 	  && !side_effects_p (SET_SRC (set))
 	  && !side_effects_p (SET_DEST (set)))
@@ -1034,6 +1035,7 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
 	     DEBUG_INSNs can be applied.  */
 	  if (vd->n_debug_insn_changes)
 	    note_uses (&PATTERN (insn), cprop_find_used_regs, vd);
+	  df_insn_rescan (insn);
 	}
 
       ksvd.vd = vd;
@@ -1112,7 +1114,10 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
 
 	  /* Notice copies.  */
 	  if (copy_p)
-	    copy_value (SET_DEST (set), SET_SRC (set), vd);
+	    {
+	      df_insn_rescan (insn);
+	      copy_value (SET_DEST (set), SET_SRC (set), vd);
+	    }
 	}
 
       if (insn == BB_END (bb))
@@ -1282,47 +1287,68 @@ pass_cprop_hardreg::execute (function *fun)
 {
   struct value_data *all_vd;
   basic_block bb;
-  bool analyze_called = false;
 
   all_vd = XNEWVEC (struct value_data, last_basic_block_for_fn (fun));
 
   auto_sbitmap visited (last_basic_block_for_fn (fun));
   bitmap_clear (visited);
 
+  df_note_add_problem ();
+
+  /* It is tempting to set DF_LR_RUN_DCE, but DCE may choose to delete
+     an insn and this pass would not have visibility into the removal.
+     This pass would then potentially use the source of that
+     INSN for propagation purposes, generating invalid code.
+
+     So we just ask for updated notes and handle trivial deletions
+     within this pass where we can update this passes internal
+     data structures appropriately.  */
+  df_set_flags (DF_DEFER_INSN_RESCAN);
+
   FOR_EACH_BB_FN (bb, fun)
     {
       bitmap_set_bit (visited, bb->index);
 
-      /* If a block has a single predecessor, that we've already
-	 processed, begin with the value data that was live at
-	 the end of the predecessor block.  */
-      /* ??? Ought to use more intelligent queuing of blocks.  */
-      if (single_pred_p (bb)
-	  && bitmap_bit_p (visited, single_pred (bb)->index)
-	  && ! (single_pred_edge (bb)->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)))
+      for (int pass = 0; pass < 2; pass++)
 	{
-	  all_vd[bb->index] = all_vd[single_pred (bb)->index];
-	  if (all_vd[bb->index].n_debug_insn_changes)
+	  /* If a block has a single predecessor, that we've already
+	     processed, begin with the value data that was live at
+	    the end of the predecessor block.  */
+	  /* ??? Ought to use more intelligent queuing of blocks.  */
+	  if (single_pred_p (bb)
+	      && bitmap_bit_p (visited, single_pred (bb)->index)
+	      && ! (single_pred_edge (bb)->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)))
 	    {
-	      unsigned int regno;
-
-	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+	      all_vd[bb->index] = all_vd[single_pred (bb)->index];
+	      if (all_vd[bb->index].n_debug_insn_changes)
 		{
-		  if (all_vd[bb->index].e[regno].debug_insn_changes)
+		  unsigned int regno;
+
+		  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 		    {
-		      all_vd[bb->index].e[regno].debug_insn_changes = NULL;
-		      if (--all_vd[bb->index].n_debug_insn_changes == 0)
-			break;
+		      if (all_vd[bb->index].e[regno].debug_insn_changes)
+			{
+			  all_vd[bb->index].e[regno].debug_insn_changes = NULL;
+			  if (--all_vd[bb->index].n_debug_insn_changes == 0)
+			    break;
+			}
 		    }
 		}
 	    }
-	}
-      else
-	init_value_data (all_vd + bb->index);
+	  else
+	    init_value_data (all_vd + bb->index);
 
-      copyprop_hardreg_forward_1 (bb, all_vd + bb->index);
+	  /* If we were unable to propagate, then break the loop.  */
+	  if (!copyprop_hardreg_forward_1 (bb, all_vd + bb->index))
+	    break;
+	  df_analyze ();
+	}
     }
 
+  /* We must call df_analyze here unconditionally to ensure that the
+     REG_UNUSED and REG_DEAD notes are consistent with and without -g.  */
+  df_analyze ();
+
   if (MAY_HAVE_DEBUG_BIND_INSNS)
     {
       FOR_EACH_BB_FN (bb, fun)
@@ -1332,11 +1358,6 @@ pass_cprop_hardreg::execute (function *fun)
 	    unsigned int regno;
 	    bitmap live;
 
-	    if (!analyze_called)
-	      {
-		df_analyze ();
-		analyze_called = true;
-	      }
 	    live = df_get_live_out (bb);
 	    for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 	      if (all_vd[bb->index].e[regno].debug_insn_changes)

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]