This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH]: Improve ability of PRE to detect redundancies


Steven Bosscher asked me to submit this patch for 4.0.

I'm not sure whether it should be in 4.0 or 4.1 (i lean towards 4.1
personally), so i'm curious to see what others think.

One long-standing problem in GVN-PRE has been that we weren't doing
constant folding at any point, so that we didn't discover that things
were equal along all edges, and we missed some redundancies because we
thought the expressions weren't available along any edge.

This patch changes this by folding during insertion so that we can
detect these redundancies.
This makes us detect and eliminate 3x more redundancies than we used to.

In addition, the optimization now subsumes partial constant propagation.

We will now transform:

int main (void)
{
  int x, c, y;
  x = 3;
  if (c)
    x = 2;
  y = x + 1;
  return y;
}

into
int main (void)
{
  int c, y;
  if (c)
     y = 3;
  else
     y = 4;
  return y;
}
(it will also transform the version where only one side is constant)

and 
int foo (int i)
{
  int a, b;
  if (i)
    a = 3, b = 2;
  else
    a = 2, b = 3;
  return a + b;
}

into

int foo (int i)
{
  return 5;
}

The only real problem with this is that it becomes a little too smart
around loops.

It often detects that an iv bumper (ie the i = i + 1 part) is constant
along the initial loop edge, and thusis partially redundant, so that it
inserts a phi of (<bumper initial value>, the bumper) in the loop,
removing one computation of the iv.
This confuses the scalar evolutions code currently (even though we
haven't actually changed the value), so that it can't analyze the
variable.
I've reported this to sebastian, who said he will look into it.

In the meantime, i've stopped PRE from creating anything that looks like
an induction variable to avoid this problem.

3 new PRE tests are included to test this new functionality.
This patch is currently SPEC and compile time neutral on ppc and x86
(which is one reason i've been leaning towards 4.1).
The loop test in it should go away once scalar evolutions has no
problems.

Bootstrapped and regtested (yesterday, when things were still
building :P) on i686-pc-linux-gnu.
Comments welcome on whether to put this in now or wait.

2004-11-25  Daniel Berlin  <dberlin@dberlin.org>

	* Makefile.in (tree-ssa-pre.o): Add $(CFGLOOP_H) dependency.
	* tree-ssa-pre.c: Remove splay-tree.h include.  Add cfgloop.h
	include.
        (pre_stats): Add constified.
	(fully_constant_expression): New function.
	(insert_aux): Note constified expressions.
	Factor out insertion code from here.
	(insert_into_preds_of_block): To here.
	Don't create obvious induction variables.
	(init_pre): Call loop_optimizer_init if we are doing PRE.
	(fini_pre): Call loop_optimizer_finalize if we are doing PRE.

Index: tree-ssa-pre.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-pre.c,v
retrieving revision 2.56
diff -u -p -r2.56 tree-ssa-pre.c
--- tree-ssa-pre.c	25 Nov 2004 10:31:38 -0000	2.56
+++ tree-ssa-pre.c	26 Nov 2004 02:07:21 -0000
@@ -41,9 +41,9 @@ Boston, MA 02111-1307, USA.  */
 #include "alloc-pool.h"
 #include "tree-pass.h"
 #include "flags.h"
-#include "splay-tree.h"
 #include "bitmap.h"
 #include "langhooks.h"
+#include "cfgloop.h"
 
 /* TODO:
    
@@ -56,13 +56,6 @@ Boston, MA 02111-1307, USA.  */
       a new value every time we see a statement with a vuse.
    3. Strength reduction can be performed by anticipating expressions
       we can repair later on.
-   4. Our canonicalization of expressions during lookups don't take
-      constants into account very well.  In particular, we don't fold
-      anywhere, so we can get situations where we stupidly think
-      something is a new value (a + 1 + 1 vs a + 2).  This is somewhat
-      expensive to fix, but it does expose a lot more eliminations.
-      It may or not be worth it, depending on how critical you
-      consider PRE vs just plain GRE.
 */   
 
 /* For ease of terminology, "expression node" in the below refers to
@@ -280,6 +273,10 @@ static struct
 
   /* The number of new PHI nodes added by PRE.  */
   int phis;
+  
+  /* The number of values found constant.  */
+  int constified;
+  
 } pre_stats;
 
 
@@ -1384,7 +1381,140 @@ create_expression_by_pieces (basic_block
     }
   return name;
 }
-      
+
+/* Return the folded version of T if T, when folded, is a gimple
+   min_invariant.  Otherwise, return T. */ 
+
+static tree
+fully_constant_expression (tree t)
+{  
+  tree folded;
+  switch (TREE_CODE_CLASS (TREE_CODE (t)))
+    {
+    case tcc_comparison:
+    case tcc_binary:
+      folded = fold_binary_to_constant (TREE_CODE (t), TREE_TYPE (t),
+					TREE_OPERAND (t, 0),
+					TREE_OPERAND (t, 1));
+      break;
+    case tcc_unary:
+      folded = fold_unary_to_constant (TREE_CODE (t), TREE_TYPE (t),
+				       TREE_OPERAND (t, 0));
+      break;
+    default:
+      folded = t;
+    }
+  if (folded && is_gimple_min_invariant (folded))
+    return folded;
+
+  return t;
+}
+
+/* Insert the to-be-made-available values of NODE for each predecessor, stored
+   in AVAIL, into the predecessors of BLOCK, and merge the result with a phi
+   node, given the same value handle as NODE.  The prefix of the phi node is
+   given with TMPNAME*/
+
+static bool
+insert_into_preds_of_block (basic_block block, value_set_node_t node,
+			    tree *avail, const char *tmpname)
+{
+  tree val = get_value_handle (node->expr);
+  edge pred;
+  basic_block bprime;
+  tree eprime;
+  edge_iterator ei;
+  tree type = TREE_TYPE (avail[EDGE_PRED (block, 0)->src->index]);
+  tree temp;
+  
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Found partial redundancy for expression ");
+      print_generic_expr (dump_file, node->expr, 0);
+      fprintf (dump_file, "\n");
+    }
+
+  /* Make sure we aren't creating an induction variable.  */
+  if (block->loop_depth > 0 && EDGE_COUNT (block->preds) == 2)
+    {
+      bool firstinsideloop = false;
+      basic_block insideloopblock = NULL;
+      bool secondinsideloop = false;
+      basic_block outsideloopblock = NULL;
+      if (flow_bb_inside_loop_p (block->loop_father, 
+				 EDGE_PRED (block, 0)->src))
+	{
+	  firstinsideloop = true;
+	  insideloopblock = EDGE_PRED (block, 0)->src;
+	  outsideloopblock = EDGE_PRED (block, 1)->src;
+	}
+      if (flow_bb_inside_loop_p (block->loop_father,
+				 EDGE_PRED (block, 1)->src))
+	{
+	  secondinsideloop = true;
+	  insideloopblock = EDGE_PRED (block, 1)->src;
+	  outsideloopblock = EDGE_PRED (block, 0)->src;
+	}
+      /* Induction variables only have one edge inside the loop.  */
+      if (firstinsideloop ^ secondinsideloop
+	  && is_gimple_min_invariant (avail[outsideloopblock->index])
+	  && !is_gimple_min_invariant (avail[insideloopblock->index]))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "Skipping partial redundancy: Looks like an induction variable\n");
+	  return false;
+	}
+    }
+	  
+
+  /* Make the necessary insertions.  */
+  FOR_EACH_EDGE (pred, ei, block->preds)
+    {
+      tree stmts = alloc_stmt_list ();
+      tree builtexpr;
+      bprime = pred->src;
+      eprime = avail[bprime->index];
+      if (BINARY_CLASS_P (eprime)
+	  || UNARY_CLASS_P (eprime))
+	{
+	  builtexpr = create_expression_by_pieces (bprime,
+						   eprime,
+						   stmts);
+	  bsi_insert_on_edge_immediate (pred, stmts);
+	  avail[bprime->index] = builtexpr;
+	}			      
+    }
+  /* Now build a phi for the new variable.  */
+  temp = create_tmp_var (type, tmpname);
+  add_referenced_tmp_var (temp);
+  temp = create_phi_node (temp, block);
+ 
+  FOR_EACH_EDGE (pred, ei, block->preds)
+    {
+      add_phi_arg (&temp, avail[pred->src->index],
+		   pred);
+    }
+  
+  vn_add (PHI_RESULT (temp), val, NULL);
+  
+  bitmap_value_replace_in_set (AVAIL_OUT (block), 
+			       PHI_RESULT (temp));
+  bitmap_insert_into_set (NEW_SETS (block),
+			  PHI_RESULT (temp));
+  bitmap_insert_into_set (PHI_GEN (block),
+			  PHI_RESULT (temp));  
+  
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Created phi ");
+      print_generic_expr (dump_file, temp, 0);
+      fprintf (dump_file, " in block %d\n", block->index);
+    }
+  pre_stats.phis++;
+  return true;
+}
+
+
 /* Perform insertion of partially redundant values.
    For BLOCK, do the following:
    1.  Propagate the NEW_SETS of the dominator into the current block.
@@ -1399,6 +1529,7 @@ create_expression_by_pieces (basic_block
    3. Recursively call ourselves on the dominator children of BLOCK.
 
 */
+
 static bool
 insert_aux (basic_block block)
 {
@@ -1438,7 +1569,7 @@ insert_aux (basic_block block)
 		      tree first_s = NULL;
 		      edge pred;
 		      basic_block bprime;
-		      tree eprime;
+		      tree eprime = NULL_TREE;
 		      edge_iterator ei;
 
 		      val = get_value_handle (node->expr);
@@ -1484,7 +1615,7 @@ insert_aux (basic_block block)
 			      cant_insert = true;
 			      break;
 			    }
-
+			  eprime = fully_constant_expression (eprime);
 			  vprime = get_value_handle (eprime);
 			  gcc_assert (vprime);
 			  edoubleprime = bitmap_find_leader (AVAIL_OUT (bprime),
@@ -1500,78 +1631,39 @@ insert_aux (basic_block block)
 			      by_some = true; 
 			      if (first_s == NULL)
 				first_s = edoubleprime;
-			      else if (first_s != edoubleprime)
+			      else if (!operand_equal_p (first_s, edoubleprime, 0))
 				all_same = false;
-			      gcc_assert (first_s == edoubleprime 
-					  || !operand_equal_p
-					      (first_s, edoubleprime, 0));
 			    }
 			}
+		      
 		      /* If we can insert it, it's not the same value
 			 already existing along every predecessor, and
 			 it's defined by some predecessor, it is
 			 partially redundant.  */
 		      if (!cant_insert && !all_same && by_some)
 			{
-			  tree type = TREE_TYPE (avail[EDGE_PRED (block, 0)->src->index]);
-			  tree temp;
-			  if (dump_file && (dump_flags & TDF_DETAILS))
-			    {
-			      fprintf (dump_file, "Found partial redundancy for expression ");
-			      print_generic_expr (dump_file, node->expr, 0);
-			      fprintf (dump_file, "\n");
-			    }
-
-			  /* Make the necessary insertions.  */
-			  FOR_EACH_EDGE (pred, ei, block->preds)
-			    {
-			      tree stmts = alloc_stmt_list ();
-			      tree builtexpr;
-			      bprime = pred->src;
-			      eprime = avail[bprime->index];
-			      if (BINARY_CLASS_P (eprime)
-				  || UNARY_CLASS_P (eprime))
-				{
-				  builtexpr = create_expression_by_pieces (bprime,
-									   eprime,
-									   stmts);
-				  bsi_insert_on_edge (pred, stmts);
-				  avail[bprime->index] = builtexpr;
-				}			      
-			    }
-			  /* Now build a phi for the new variable.  */
-			  temp = create_tmp_var (type, "prephitmp");
-			  add_referenced_tmp_var (temp);
-			  temp = create_phi_node (temp, block);
-			  vn_add (PHI_RESULT (temp), val, NULL);
-
-#if 0
-			  if (!set_contains_value (AVAIL_OUT (block), val))
-			    insert_into_set (AVAIL_OUT (block), 
-					     PHI_RESULT (temp));
-			  else
-#endif
-			    bitmap_value_replace_in_set (AVAIL_OUT (block), 
-							 PHI_RESULT (temp));
-			  FOR_EACH_EDGE (pred, ei, block->preds)
-			    {
-			      add_phi_arg (&temp, avail[pred->src->index],
-					   pred);
-			    }
-			  if (dump_file && (dump_flags & TDF_DETAILS))
+			  if (insert_into_preds_of_block (block, node, avail, 
+							  "prephitmp"))
+			    new_stuff = true;
+			}
+		      
+		      /* If all edges produce the same value and that value is
+			 an invariant, then the PHI has the same value on all
+			 edges.  Note this.  */
+		      else if (all_same && is_gimple_min_invariant (eprime)
+			       && !is_gimple_min_invariant (val))
+			{
+			  value_set_t exprset = VALUE_HANDLE_EXPR_SET (val);
+			  value_set_node_t node;
+			  for (node = exprset->head; node; node = node->next)
 			    {
-			      fprintf (dump_file, "Created phi ");
-			      print_generic_expr (dump_file, temp, 0);
-			      fprintf (dump_file, " in block %d\n", block->index);
+			      if (TREE_CODE (node->expr) == SSA_NAME)
+				{				  
+				  vn_add (node->expr, eprime, NULL);
+				  pre_stats.constified++;
+				}
 			    }
-			  pre_stats.phis++;
-			  new_stuff = true;
-			  bitmap_insert_into_set (NEW_SETS (block),
-						  PHI_RESULT (temp));
-			  bitmap_insert_into_set (PHI_GEN (block),
-						  PHI_RESULT (temp));
 			}
-
 		      free (avail);
 		    }
 		}
@@ -1606,6 +1698,7 @@ insert (void)
       new_stuff = false;
       new_stuff = insert_aux (ENTRY_BLOCK_PTR);
     }
+
   if (num_iterations > 2 && dump_file && (dump_flags & TDF_STATS))
     fprintf (dump_file, "insert required %d iterations\n", num_iterations);
 }
@@ -1899,12 +1992,15 @@ eliminate (void)
 /* Initialize data structures used by PRE.  */
 
 static void
-init_pre (void)
+init_pre (bool do_fre)
 {
   basic_block bb;
 
   connect_infinite_loops_to_exit ();
   vn_init ();
+  if (!do_fre)
+    current_loops = loop_optimizer_init (dump_file);
+
   memset (&pre_stats, 0, sizeof (pre_stats));
 
   /* If block 0 has more than one predecessor, it means that its PHI
@@ -1953,7 +2049,7 @@ init_pre (void)
 /* Deallocate data structures used by PRE.  */
 
 static void
-fini_pre (void)
+fini_pre (bool do_fre)
 {
   basic_block bb;
   unsigned int i;
@@ -2000,6 +2096,11 @@ fini_pre (void)
 	  && TREE_CODE (SSA_NAME_VALUE (name)) == VALUE_HANDLE)
 	SSA_NAME_VALUE (name) = NULL;
     }
+  if (!do_fre && current_loops)
+    {
+      loop_optimizer_finalize (current_loops, dump_file);
+      current_loops = NULL;
+    }
 }
 
 
@@ -2009,7 +2110,7 @@ fini_pre (void)
 static void
 execute_pre (bool do_fre)
 {
-  init_pre ();
+  init_pre (do_fre);
 
   /* Collect and value number expressions computed in each basic
      block.  */
@@ -2042,15 +2143,16 @@ execute_pre (bool do_fre)
 
   /* Remove all the redundant expressions.  */
   eliminate ();
-  
+
   if (dump_file && (dump_flags & TDF_STATS))
     {
       fprintf (dump_file, "Insertions:%d\n", pre_stats.insertions);
       fprintf (dump_file, "New PHIs:%d\n", pre_stats.phis);
       fprintf (dump_file, "Eliminated:%d\n", pre_stats.eliminations);
+      fprintf (dump_file, "Constified:%d\n", pre_stats.constified);
     }
-
-  fini_pre ();
+  
+  fini_pre (do_fre);
 }
 
 
Index: testsuite/gcc.dg/tree-ssa/ssa-pre-3.c
===================================================================
RCS file: testsuite/gcc.dg/tree-ssa/ssa-pre-3.c
diff -N testsuite/gcc.dg/tree-ssa/ssa-pre-3.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ testsuite/gcc.dg/tree-ssa/ssa-pre-3.c	26 Nov 2004 02:07:23 -0000
@@ -0,0 +1,14 @@
+/* { dg-do compile } */ 
+/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+int main(void)
+{
+	int x, c, y;
+	x = 3;
+	if (c)
+		x = 2;
+	y = x + 1;
+	return y;
+}
+/* We should eliminate the x+1 computation from this routine, replacing
+   it with a phi of 3, 4 */
+/* { dg-final { scan-tree-dump-times "Eliminated:1" 1 "pre"} } */
Index: testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
===================================================================
RCS file: testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
diff -N testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ testsuite/gcc.dg/tree-ssa/ssa-pre-4.c	26 Nov 2004 02:07:23 -0000
@@ -0,0 +1,15 @@
+/* { dg-do compile } */ 
+/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+int 
+foo (int i)
+{
+	int a, b;
+	if (i)
+		a = 3, b = 2;
+	else
+		a = 2, b = 3;
+	return a + b;
+}
+/* We should detect that a+b is the same along both edges, and replace it with
+   5  */
+/* { dg-final { scan-tree-dump-times "Constified:1" 1 "pre"} } */
Index: testsuite/gcc.dg/tree-ssa/ssa-pre-5.c
===================================================================
RCS file: testsuite/gcc.dg/tree-ssa/ssa-pre-5.c
diff -N testsuite/gcc.dg/tree-ssa/ssa-pre-5.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ testsuite/gcc.dg/tree-ssa/ssa-pre-5.c	26 Nov 2004 02:07:23 -0000
@@ -0,0 +1,13 @@
+/* { dg-do compile } */ 
+/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+int main(int x)
+{
+	int c, y;
+	if (c)
+		x = 2;
+	y = x + 1;
+	return y;
+}
+/* We should eliminate one evaluation of x + 1 along the x = 2 path,
+   causing one elimination.  */
+/* { dg-final { scan-tree-dump-times "Eliminated:1" 1 "pre"} } */

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]