[PATCH] [PR tree-optimization/69196] [PR tree-optimization/68398] Reorganize profitibility testing for FSM jump threading

Jeff Law law@redhat.com
Mon Jan 25 19:20:00 GMT 2016


This is the first of a few patches to address 69196 (code size 
regression with jump threading) and 68398 (coremark regression due to 
FSM changes).

While these are caused by distinct issues, they hit the same hunk of 
code.  I could address them independently, but I believe in the end 
it'll just make the whole process more difficult.

This hunk of work is a bit of reorganization.  Right now 
valid_jump_thread_path searches the path for particular nuggets of 
information (did we thread through a multiway branch, did we thread a 
multiway branch, did we thread through the loop latch, did we thread the 
loop latch, etc).

We really want that code in the FSM detection side so that we can make 
better decisions about whether or not a particular FSM path is likely to 
be profitable to thread.  So this patch moves that code into the 
detection side.

Second, the limiters for the FSM code had some minor bugs.  For example, 
it didn't count one of the blocks when determining how many blocks where 
in the FSM path.  It didn't account for PHIs when determining how many 
statements we'd copy, etc.  Before the limiters are re-tuned, the basic 
accounting needs to be more accurate.

This turns out to have a tiny positive impact of 69196, but the primary 
purpose of this patch is putting bits in the right place and fixing dumb 
accounting errors.  The real work to address 69196 and 68398 will come 
in follow-up patches.

The testsuite changes are totally an artifact of changing how we detect 
the actions of the jump threader.

Bootstrapped & regression tested on x86_64.  Installed on the trunk.

Now to actually fix the regressions :-)

Jeff
-------------- next part --------------
commit e25c808d9975556443d1bf90f968f0fd567a5de6
Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Jan 25 19:19:09 2016 +0000

    	PR tree-optimization/69196
    	PR tree-optimization/68398
    	* tree-ssa-threadupdate.h (enum bb_dom_status): Moved here from
    	tree-ssa-threadupdate.c.
    	(determine_bb_domination_status): Prototype
    	* tree-ssa-threadupdate.c (enum bb_dom_status): Remove
    	(determine_bb_domination_status): No longer static.
    	(valid_jump_thread_path): Remove code to detect characteristics
    	of the jump thread path not associated with correctness.
    	* tree-ssa-threadbackward.c (fsm_find_control_statment_thread_paths):
    	Correct test for thread path length.  Count PHIs for real operands as
    	statements that need to be copied.  Do not count ASSERT_EXPRs.
    	Look at all the blocks in the thread path.  Compute and selectively
    	filter thread paths based on threading through the latch, threading
    	a multiway branch or crossing a multiway branch.
    
    	PR tree-optimization/69196
    	PR tree-optimization/68398
    	* gcc.dg/tree-ssa/pr66752-3.c: Update expected output
    	* gcc.dg/tree-ssa/pr68198.c: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232802 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6d51578..d9d59d7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,21 @@
+2016-01-25  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/69196
+	PR tree-optimization/68398
+	* tree-ssa-threadupdate.h (enum bb_dom_status): Moved here from
+	tree-ssa-threadupdate.c.
+	(determine_bb_domination_status): Prototype
+	* tree-ssa-threadupdate.c (enum bb_dom_status): Remove
+	(determine_bb_domination_status): No longer static.
+	(valid_jump_thread_path): Remove code to detect characteristics
+	of the jump thread path not associated with correctness.
+	* tree-ssa-threadbackward.c (fsm_find_control_statment_thread_paths):
+	Correct test for thread path length.  Count PHIs for real operands as
+	statements that need to be copied.  Do not count ASSERT_EXPRs.
+	Look at all the blocks in the thread path.  Compute and selectively
+	filter thread paths based on threading through the latch, threading
+	a multiway branch or crossing a multiway branch.
+
 2016-01-25  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
 
 	* config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled):  Add
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 763ceac..7e5daa9 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2016-01-25  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/69196
+	PR tree-optimization/68398
+	* gcc.dg/tree-ssa/pr66752-3.c: Update expected output
+	* gcc.dg/tree-ssa/pr68198.c: Likewise.
+
 2016-01-25  David Edelsohn  <dje.gcc@gmail.com>
 
 	PR target/69469
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index 1f27b1a..6eeaca5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -33,9 +33,9 @@ foo (int N, int c, int b, int *a)
 }
 
 /* There are 3 FSM jump threading opportunities, one of which will
-   get cancelled.  */
-/* { dg-final { scan-tree-dump-times "Registering FSM" 3 "vrp1"} } */
-/* { dg-final { scan-tree-dump-times "Cancelling FSM" 1 "vrp1"} } */
+   get filtered.  */
+/* { dg-final { scan-tree-dump-times "Registering FSM" 2 "vrp1"} } */
+/* { dg-final { scan-tree-dump-times "FSM would create irreducible loop" 1 "vrp1"} } */
 
 /* There should be no assignments or references to FLAG.  */
 /* { dg-final { scan-tree-dump-not "flag" "optimized"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
index ddd3c76..227ffeb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
@@ -38,6 +38,6 @@ c_finish_omp_clauses (tree clauses)
 }
 
 /* There are 3 FSM jump threading opportunities, two of which will
-  get cancelled.  */
-/* { dg-final { scan-tree-dump-times "Registering FSM" 3 "vrp1"} } */
-/* { dg-final { scan-tree-dump-times "Cancelling FSM" 2 "vrp1"} } */
+  get filtered out.  */
+/* { dg-final { scan-tree-dump-times "Registering FSM" 1 "vrp1"} } */
+/* { dg-final { scan-tree-dump-times "FSM Thread through multiway branch without threading a multiway branch" 2 "vrp1"} } */
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 8be57a0..131630e 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -223,8 +223,10 @@ fsm_find_control_statement_thread_paths (tree name,
       if (TREE_CODE (arg) != INTEGER_CST)
 	continue;
 
+      /* Note BBI is not in the path yet, hence the +1 in the test below
+	 to make sure BBI is accounted for in the path length test.  */
       int path_length = path->length ();
-      if (path_length > PARAM_VALUE (PARAM_MAX_FSM_THREAD_LENGTH))
+      if (path_length + 1 > PARAM_VALUE (PARAM_MAX_FSM_THREAD_LENGTH))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "FSM jump-thread path not considered: "
@@ -251,32 +253,113 @@ fsm_find_control_statement_thread_paths (tree name,
       int j;
       loop_p loop = (*path)[0]->loop_father;
       bool path_crosses_loops = false;
+      bool threaded_through_latch = false;
+      bool multiway_branch_in_path = false;
+      bool threaded_multiway_branch = false;
 
       /* Count the number of instructions on the path: as these instructions
 	 will have to be duplicated, we will not record the path if there are
 	 too many instructions on the path.  Also check that all the blocks in
 	 the path belong to a single loop.  */
-      for (j = 1; j < path_length - 1; j++)
+      for (j = 0; j < path_length; j++)
 	{
 	  basic_block bb = (*path)[j];
 
-	  if (bb->loop_father != loop)
+	  /* Remember, blocks in the path are stored in opposite order
+	     in the PATH array.  The last entry in the array reprensents
+	     the block with an outgoing edge that we will redirect to the
+	     jump threading path.  Thus we don't care about that block's
+	     loop father, nor how many statements are in that block because
+	     it will not be copied or whether or not it ends in a multiway
+	     branch.  */
+	  if (j < path_length - 1)
 	    {
-	      path_crosses_loops = true;
-	      break;
+	      if (bb->loop_father != loop)
+		{
+		  path_crosses_loops = true;
+		  break;
+		}
+
+	      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+		{
+		  gimple *stmt = gsi_stmt (gsi);
+		  /* Do not count empty statements and labels.  */
+		  if (gimple_code (stmt) != GIMPLE_NOP
+		      && gimple_code (stmt) != GIMPLE_LABEL
+		      && !(gimple_code (stmt) == GIMPLE_ASSIGN
+			   && gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
+		      && !is_gimple_debug (stmt))
+		    ++n_insns;
+		}
+
+	      gphi_iterator gsip;
+	      for (gsip = gsi_start_phis (bb);
+		   !gsi_end_p (gsip);
+		   gsi_next (&gsip))
+		{
+		  gphi *phi = gsip.phi ();
+		  tree dst = gimple_phi_result (phi);
+
+		  /* We consider any non-virtual PHI as a statement since it
+		     count result in a constant assignment or copy
+		     operation.  */
+		  if (!virtual_operand_p (dst))
+		    ++n_insns;
+		}
+
+	      /* We do not look at the block with the threaded branch
+		 in this loop.  So if any block with a last statement that
+		 is a GIMPLE_SWITCH or GIMPLE_GOTO is seen, then we have a
+		 multiway branch on our path.
+
+		 The block in PATH[0] is special, it's the block were we're
+		 going to be able to eliminate its branch.  */
+	      gimple *last = last_stmt (bb);
+	      if (last && (gimple_code (last) == GIMPLE_SWITCH
+			   || gimple_code (last) == GIMPLE_GOTO))
+		{
+		  if (j == 0)
+		    threaded_multiway_branch = true;
+		  else
+		    multiway_branch_in_path = true;
+		}
 	    }
 
-	  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	    {
-	      gimple *stmt = gsi_stmt (gsi);
-	      /* Do not count empty statements and labels.  */
-	      if (gimple_code (stmt) != GIMPLE_NOP
-		  && gimple_code (stmt) != GIMPLE_LABEL
-		  && !is_gimple_debug (stmt))
-		++n_insns;
-	    }
+	  /* Note if we thread through the latch, we will want to include
+	     the last entry in the array when determining if we thread
+	     through the loop latch.  */
+	  if (loop->latch == bb)
+	    threaded_through_latch = true;
+	}
+
+      gimple *stmt = get_gimple_control_stmt ((*path)[0]);
+      gcc_assert (stmt);
+      /* We have found a constant value for ARG.  For GIMPLE_SWITCH
+	 and GIMPLE_GOTO, we use it as-is.  However, for a GIMPLE_COND
+	 we need to substitute, fold and simplify so we can determine
+	 the edge taken out of the last block.  */
+      if (gimple_code (stmt) == GIMPLE_COND)
+	{
+	  enum tree_code cond_code = gimple_cond_code (stmt);
+
+	  /* We know the underyling format of the condition.  */
+	  arg = fold_binary (cond_code, boolean_type_node,
+			     arg, gimple_cond_rhs (stmt));
 	}
 
+      /* If this path threaded through the loop latch back into the
+	 same loop and the destination does not dominate the loop
+	 latch, then this thread would create an irreducible loop.
+
+	 We have to know the outgoing edge to figure this out.  */
+      edge taken_edge = find_taken_edge ((*path)[0], arg);
+      bool creates_irreducible_loop = false;
+      if (threaded_through_latch
+	  && loop == taken_edge->dest->loop_father
+	  && (determine_bb_domination_status (loop, taken_edge->dest)
+	      == DOMST_NONDOMINATING))
+	creates_irreducible_loop = true;
+
       if (path_crosses_loops)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -296,6 +379,33 @@ fsm_find_control_statement_thread_paths (tree name,
 	  continue;
 	}
 
+      /* We avoid creating irreducible loops unless we thread through
+	 a multiway branch, in which case we have deemed it worth losing other
+	 loop optimizations later.  */
+      if (!threaded_multiway_branch && creates_irreducible_loop)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "FSM would create irreducible loop without threading "
+		     "multiway branch.\n");
+	  path->pop ();
+	  continue;
+	}
+
+      /* When there is a multi-way branch on the path, then threading can
+	 explode the CFG due to duplicating the edges for that multi-way
+	 branch.  So like above, only allow a multi-way branch on the path
+	 if we actually thread a multi-way branch.  */
+      if (!threaded_multiway_branch && multiway_branch_in_path)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "FSM Thread through multiway branch without threading "
+		     "a multiway branch.\n");
+	  path->pop ();
+	  continue;
+	}
+
       vec<jump_thread_edge *> *jump_thread_path
 	= new vec<jump_thread_edge *> ();
 
@@ -309,22 +419,7 @@ fsm_find_control_statement_thread_paths (tree name,
 	  jump_thread_path->safe_push (x);
 	}
 
-      gimple *stmt = get_gimple_control_stmt ((*path)[0]);
-      gcc_assert (stmt);
-      /* We have found a constant value for ARG.  For GIMPLE_SWITCH
-	 and GIMPLE_GOTO, we use it as-is.  However, for a GIMPLE_COND
-	 we need to substitute, fold and simplify.  */
-      if (gimple_code (stmt) == GIMPLE_COND)
-	{
-	  enum tree_code cond_code = gimple_cond_code (stmt);
-
-	  /* We know the underyling format of the condition.  */
-	  arg = fold_binary (cond_code, boolean_type_node,
-			     arg, gimple_cond_rhs (stmt));
-	}
-
       /* Add the edge taken when the control variable has value ARG.  */
-      edge taken_edge = find_taken_edge ((*path)[0], arg);
       jump_thread_edge *x
 	= new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK);
       jump_thread_path->safe_push (x);
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 4783c4b..620948c 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1663,16 +1663,6 @@ dbds_continue_enumeration_p (const_basic_block bb, const void *stop)
    returns the state.  */
 
 enum bb_dom_status
-{
-  /* BB does not dominate latch of the LOOP.  */
-  DOMST_NONDOMINATING,
-  /* The LOOP is broken (there is no path from the header to its latch.  */
-  DOMST_LOOP_BROKEN,
-  /* BB dominates the latch of the LOOP.  */
-  DOMST_DOMINATING
-};
-
-static enum bb_dom_status
 determine_bb_domination_status (struct loop *loop, basic_block bb)
 {
   basic_block *bblocks;
@@ -2389,73 +2379,14 @@ static bool
 valid_jump_thread_path (vec<jump_thread_edge *> *path)
 {
   unsigned len = path->length ();
-  bool threaded_multiway_branch = false;
-  bool multiway_branch_in_path = false;
-  bool threaded_through_latch = false;
 
-  /* Check that the path is connected and see if there's a multi-way
-     branch on the path and whether or not a multi-way branch
-     is threaded.  */
+  /* Check that the path is connected.  */
   for (unsigned int j = 0; j < len - 1; j++)
     {
       edge e = (*path)[j]->e;
-      struct loop *loop = e->dest->loop_father;
-
       if (e->dest != (*path)[j+1]->e->src)
 	return false;
-
-      /* If we're threading through the loop latch back into the
-	 same loop and the destination does not dominate the loop
-	 latch, then this thread would create an irreducible loop.  */
-      if (loop->latch
-	  && loop_latch_edge (loop) == e
-	  && loop == path->last()->e->dest->loop_father
-	  && (determine_bb_domination_status (loop, path->last ()->e->dest)
-	       == DOMST_NONDOMINATING))
-	threaded_through_latch = true;
-
-      gimple *last = last_stmt (e->dest);
-      if (j == len - 2)
-	threaded_multiway_branch
-	  |= (last && gimple_code (last) == GIMPLE_SWITCH);
-      else
-	multiway_branch_in_path
-	  |= (last && gimple_code (last) == GIMPLE_SWITCH);
     }
-
-  /* If we are trying to thread through the loop latch to a block in the
-     loop that does not dominate the loop latch, then that will create an
-     irreducible loop.  We avoid that unless the jump thread has a multi-way
-     branch, in which case we have deemed it worth losing other
-     loop optimizations later if we can eliminate the multi-way branch.  */
-  if (!threaded_multiway_branch && threaded_through_latch)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	{
-	  fprintf (dump_file,
-		   "Thread through latch without threading a multiway "
-		   "branch.\n");
-	  dump_jump_thread_path (dump_file, *path, false);
-	}
-      return false;
-    }
-
-  /* When there is a multi-way branch on the path, then threading can
-     explode the CFG due to duplicating the edges for that multi-way
-     branch.  So like above, only allow a multi-way branch on the path
-     if we actually thread a multi-way branch.  */
-  if (!threaded_multiway_branch && multiway_branch_in_path)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	{
-	  fprintf (dump_file,
-		   "Thread through multiway branch without threading "
-		   "a multiway branch.\n");
-	  dump_jump_thread_path (dump_file, *path, false);
-	}
-      return false;
-    }
-
   return true;
 }
 
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 087f281..fbad9f6 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -47,4 +47,17 @@ extern void remove_jump_threads_including (edge);
 extern void delete_jump_thread_path (vec <class jump_thread_edge *> *);
 extern void remove_ctrl_stmt_and_useless_edges (basic_block, basic_block);
 extern void free_dom_edge_info (edge);
+
+enum bb_dom_status
+{
+  /* BB does not dominate latch of the LOOP.  */
+  DOMST_NONDOMINATING,
+  /* The LOOP is broken (there is no path from the header to its latch.  */
+  DOMST_LOOP_BROKEN,
+  /* BB dominates the latch of the LOOP.  */
+  DOMST_DOMINATING
+};
+
+enum bb_dom_status determine_bb_domination_status (struct loop *, basic_block);
+
 #endif


More information about the Gcc-patches mailing list