This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PR81165] discount killed stmts when sizing blocks for threading


On 12/11/2017 10:17 PM, Alexandre Oliva wrote:
> On Dec 11, 2017, Jeff Law <law@redhat.com> wrote:
> 
>>> +  gcc_assert (path->length () == 0 || path->last ()->e == e);
>>> +  if (path->length () == 0)
>>> +    return estimate_threading_killed_stmts (e->dest);
>>> +
>>> +  int total = 0;
>>> +  int i = 0;
>>> +  jump_thread_edge *je;
>>> +  FOR_EACH_VEC_ELT (*path, i, je)
>>> +    switch (je->type)
>>> +      {
>>> +      case EDGE_START_JUMP_THREAD:
>>> +      case EDGE_COPY_SRC_BLOCK:
>>> +      case EDGE_COPY_SRC_JOINER_BLOCK:
>>> +	total += estimate_threading_killed_stmts (je->e->dest);
>>> +	break;
>>> +
>>> +      default:
>>> +	gcc_unreachable ();
>>> +
>>> +      case EDGE_FSM_THREAD:
>>> +      case EDGE_NO_COPY_SRC_BLOCK:
>>> +	break;
>>> +      }
>>> +
>>> +  return total;
>>> +}
> 
>> So I'd mark je->e->src rather than je->e->dest.
> Ah, but then we have to mark e->dest regardless of path.  Which does
> indeed make it all look nicer.
> 
>> And you only need this
>> handling for EDGE_COPY_SRC_BLOCK.  So I don't think you need a switch,
>> just a simple je->type == EDGE_COPY_SRC_BLOCK.
>> While we do make a copy for EDGE_COPY_SRC_JOINER_BLOCK, the conditional
>> at the end of that block is not dead, so we can't really do anything
>> special with it.
> I see, thanks.
> 
> I've updated it according to richi's and your feedback.  Regstrapped on
> {x86_64,i686}-linux-gnu.  Ok to install?
> 
> 
> We limit the amount of copying for jump threading based on counting
> stmts.  This counting is overly pessimistic, because we will very
> often delete stmts as a consequence of jump threading: when the final
> conditional jump of a block is removed, earlier SSA names computed
> exclusively for use in that conditional are killed.  Furthermore, PHI
> nodes in blocks with only two predecessors are trivially replaced with
> their now-single values after threading.
> 
> This patch scans blocks to be copied in the path constructed so far
> and estimates the number of stmts that will be removed in the copies,
> bumping up the stmt count limit.
> 
> for  gcc/ChangeLog
> 
> 	PR tree-optimization/81165
> 	* tree-ssa-threadedge.c (uses_in_bb): New.
> 	(estimate_threading_killed_stmts): New.
> 	(estimate_threading_killed_stmts): New overload.
> 	(record_temporary_equivalences_from_stmts_at_dest): Add path
> 	parameter; adjust caller.  Expand limit when it's hit.
> 
> for  gcc/testsuite/ChangeLog
> 
> 	PR tree-optimization/81165
> 	* gcc.dg/pr81165.c: New.
Attached is what I actually committed for you (so I can wrap 36550
today).  It incorporates my most recent comments.

Namely, I fixed the failure to account for the control statement being
dead, moved the new bits into tree-ssa-threadupdate.[ch], and removed
the unnecessary overload and the corresponding changes in the caller.

Thanks for taking care of this!  36550 is trivial to deal with on top of
your infrastructure :-)

Jeff
commit a4b4e317d94937c9f858dd0581358cd43957fbbd
Author: Jeff Law <law@torsion.usersys.redhat.com>
Date:   Fri Dec 15 15:42:28 2017 -0500

            PR tree-optimization/81165
            * tree-ssa-threadupdate.c (uses_in_bb): New.
            (estimate_threading_killed_stmts): New.
            * tree-ssa-threadupdate.h (estimate_threading_killed_stmts): Prototype.
            * tree-ssa-threadedge.c
            (record_temporary_equivalences_from_stmts_at_dest): Expand limit
            when it's hit.
    
            PR tree-optimization/81165
            * gcc.dg/pr81165.c: New.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2d53e24b4c1..e5ab1b1a4c7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,4 +1,14 @@
-2017-12-12  Jeff Law  <law@redhat.com>
+2017-12-15  Alexandre Oliva  <aoliva@redhat.com>
+
+	PR tree-optimization/81165
+	* tree-ssa-threadupdate.c (uses_in_bb): New.
+	(estimate_threading_killed_stmts): New.
+	* tree-ssa-threadupdate.h (estimate_threading_killed_stmts): Prototype.
+	* tree-ssa-threadedge.c 
+	(record_temporary_equivalences_from_stmts_at_dest): Expand limit
+	when it's hit.
+
+2017-12-15  Jeff Law  <law@redhat.com>
 
 	PR tree-optimization/83410
 	* tree-ssa-threadupdate.c (thread_block_1): Avoid certain jump
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 7bbf0e9f09e..bc005b8de26 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-12-15  Alexandre Oliva  <aoliva@redhat.com>
+
+	PR tree-optimization/81165
+	* gcc.dg/pr81165.c: New.
+
 2017-12-15  Jakub Jelinek  <jakub@redhat.com>
 
 	PR c++/83205
diff --git a/gcc/testsuite/gcc.dg/pr81165.c b/gcc/testsuite/gcc.dg/pr81165.c
new file mode 100644
index 00000000000..8508d893bed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr81165.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not " \[/%\] " "optimized" } } */
+
+/* Testcase submitted for PR81165, with its main function removed as
+   it's turned into a compile test.  We want to make sure that all of
+   the divide/remainder computations are removed by tree optimizers.
+
+   We can figure out that we don't need to compute at runtime even the
+   condition to enter the loop: the initial i==0 would have to be
+   greater than the sum of two small unsigned values: 1U>>t1 is in the
+   range 0..1, whereas the char value is bounded by the range 0..127,
+   being 128 % a positive number (zero would invoke undefined
+   behavior, so we can assume it doesn't happen).  (We know it's
+   nonnegative because it's 10 times a number that has no more than
+   the bits for 16, 8 and 1 set.)
+
+   We don't realize that the loop is useless right away: jump
+   threading helps remove some of the complexity, particularly of the
+   computation within the loop: t1 is compared with 1, but it can
+   never be 1.  (We could assume as much, since its being 1 would
+   divide by zero, but we don't.)
+
+   If we don't enter the conditional block, t1 remains at 2; if we do,
+   it's set to -1.  If we jump thread at the end of the
+   conditional block, we can figure out the ranges exclude 1 and the
+   jump body is completely optimized out.  However, we used to fail to
+   consider the block for jump threading due to the amount of
+   computation in it, without realizing most of it would die in
+   consequence of the threading.
+
+   We now take the dying code into account when deciding whether or
+   not to try jump threading.  That might enable us to optimize the
+   function into { if (x2 != 0 || (x1 & 1) == 0) abort (); }.  At the
+   time of this writing, with the patch, we get close, but the test on
+   x2 only gets as far as ((1 >> x2) == 0).  Without the patch, some
+   of the loop remains.  */
+
+short x0 = 15;
+
+void func (){
+  volatile int x1 = 1U;
+  volatile char x2 = 0;
+  char t0 = 0;
+  unsigned long t1 = 2LU;
+  int i = 0;
+  
+  if(1>>x2) {
+    t0 = -1;
+    t1 = (1&(short)(x1^8U))-1;
+  }
+
+  while(i > (int)((1U>>t1)+(char)(128%(10*(25LU&(29%x0)))))) {
+    i += (int)(12L/(1!=(int)t1));
+  }
+
+  if (t0 != -1) __builtin_abort();
+  if (t1 != 0L) __builtin_abort();
+}
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index b7781dc7d60..1fafd7b5932 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -244,7 +244,22 @@ record_temporary_equivalences_from_stmts_at_dest (edge e,
 	 expansion, then do not thread through this block.  */
       stmt_count++;
       if (stmt_count > max_stmt_count)
-	return NULL;
+	{
+	  /* If any of the stmts in the PATH's dests are going to be
+	     killed due to threading, grow the max count
+	     accordingly.  */
+	  if (max_stmt_count
+	      == PARAM_VALUE (PARAM_MAX_JUMP_THREAD_DUPLICATION_STMTS))
+	    {
+	      max_stmt_count += estimate_threading_killed_stmts (e->dest);
+	      if (dump_file)
+		fprintf (dump_file, "threading bb %i up to %i stmts\n",
+			 e->dest->index, max_stmt_count);
+	    }
+	  /* If we're still past the limit, we're done.  */
+	  if (stmt_count > max_stmt_count)
+	    return NULL;
+	}
 
       /* These are temporary ranges, do not reflect them back into
 	 the global range data.  */
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 63ad8f9c953..b29ffe195c8 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2431,3 +2431,139 @@ register_jump_thread (vec<jump_thread_edge *> *path)
 
   paths.safe_push (path);
 }
+
+/* Return how many uses of T there are within BB, as long as there
+   aren't any uses outside BB.  If there are any uses outside BB,
+   return -1 if there's at most one use within BB, or -2 if there is
+   more than one use within BB.  */
+
+static int
+uses_in_bb (tree t, basic_block bb)
+{
+  int uses = 0;
+  bool outside_bb = false;
+
+  imm_use_iterator iter;
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_FAST (use_p, iter, t)
+    {
+      if (is_gimple_debug (USE_STMT (use_p)))
+	continue;
+
+      if (gimple_bb (USE_STMT (use_p)) != bb)
+	outside_bb = true;
+      else
+	uses++;
+
+      if (outside_bb && uses > 1)
+	return -2;
+    }
+
+  if (outside_bb)
+    return -1;
+
+  return uses;
+}
+
+/* Starting from the final control flow stmt in BB, assuming it will
+   be removed, follow uses in to-be-removed stmts back to their defs
+   and count how many defs are to become dead and be removed as
+   well.  */
+
+unsigned int
+estimate_threading_killed_stmts (basic_block bb)
+{
+  int killed_stmts = 0;
+  hash_map<tree, int> ssa_remaining_uses;
+  auto_vec<gimple *, 4> dead_worklist;
+
+  /* If the block has only two predecessors, threading will turn phi
+     dsts into either src, so count them as dead stmts.  */
+  bool drop_all_phis = EDGE_COUNT (bb->preds) == 2;
+
+  if (drop_all_phis)
+    for (gphi_iterator gsi = gsi_start_phis (bb);
+	 !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gphi *phi = gsi.phi ();
+	tree dst = gimple_phi_result (phi);
+
+	/* We don't count virtual PHIs as stmts in
+	   record_temporary_equivalences_from_phis.  */
+	if (virtual_operand_p (dst))
+	  continue;
+
+	killed_stmts++;
+      }
+
+  if (gsi_end_p (gsi_last_bb (bb)))
+    return killed_stmts;
+
+  gimple *stmt = gsi_stmt (gsi_last_bb (bb));
+  if (gimple_code (stmt) != GIMPLE_COND
+      && gimple_code (stmt) != GIMPLE_GOTO
+      && gimple_code (stmt) != GIMPLE_SWITCH)
+    return killed_stmts;
+
+  /* The control statement is always dead.  */
+  killed_stmts++;
+  dead_worklist.quick_push (stmt);
+  while (!dead_worklist.is_empty ())
+    {
+      stmt = dead_worklist.pop ();
+
+      ssa_op_iter iter;
+      use_operand_p use_p;
+      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+	{
+	  tree t = USE_FROM_PTR (use_p);
+	  gimple *def = SSA_NAME_DEF_STMT (t);
+
+	  if (gimple_bb (def) == bb
+	      && (gimple_code (def) != GIMPLE_PHI
+		  || !drop_all_phis)
+	      && !gimple_has_side_effects (def))
+	    {
+	      int *usesp = ssa_remaining_uses.get (t);
+	      int uses;
+
+	      if (usesp)
+		uses = *usesp;
+	      else
+		uses = uses_in_bb (t, bb);
+
+	      gcc_assert (uses);
+
+	      /* Don't bother recording the expected use count if we
+		 won't find any further uses within BB.  */
+	      if (!usesp && (uses < -1 || uses > 1))
+		{
+		  usesp = &ssa_remaining_uses.get_or_insert (t);
+		  *usesp = uses;
+		}
+
+	      if (uses < 0)
+		continue;
+
+	      --uses;
+	      if (usesp)
+		*usesp = uses;
+
+	      if (!uses)
+		{
+		  killed_stmts++;
+		  if (usesp)
+		    ssa_remaining_uses.remove (t);
+		  if (gimple_code (def) != GIMPLE_PHI)
+		    dead_worklist.safe_push (def);
+		}
+	    }
+	}
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "threading bb %i kills %i stmts\n",
+	     bb->index, killed_stmts);
+
+  return killed_stmts;
+}
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 296eff29ff4..8a3b41d82cd 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -47,6 +47,7 @@ extern void remove_jump_threads_including (edge);
 extern void delete_jump_thread_path (vec <class jump_thread_edge *> *);
 extern void remove_ctrl_stmt_and_useless_edges (basic_block, basic_block);
 extern void free_dom_edge_info (edge);
+extern unsigned int estimate_threading_killed_stmts (basic_block);
 
 enum bb_dom_status
 {
