This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH]: Improve ability of PRE to detect redundancies


Hi, 

On Thu, Nov 25, 2004 at 09:32:46PM -0500, Daniel Berlin wrote:
> The only real problem with this is that it becomes a little too smart
> around loops.
> 
> It often detects that an iv bumper (ie the i = i + 1 part) is constant
> along the initial loop edge, and thusis partially redundant, so that it
> inserts a phi of (<bumper initial value>, the bumper) in the loop,
> removing one computation of the iv.
> This confuses the scalar evolutions code currently (even though we
> haven't actually changed the value), so that it can't analyze the
> variable.
> I've reported this to sebastian, who said he will look into it.
> 

After a little investigation with the patch and testcase that Danny
has sent me, I found that PRE is now creating wrap-around variables
for which I have removed the support from mainline: these were
represented as PEELED_CHREC nodes.

In some cases it is possible to represent a wrap-around variables
using the polynomial representation: for example, when the entry value
is 0, and then for all the remaining iteration of loop_3, the
evolution function is {1, +, 1}_3, it is possible to represent this
wrap-around behavior by just saying {0, +, 1}_3, in other words it is
possible to attach the first value in continuation to the subsequent
values of the variable.

This case occurs quite frequently: during a bootstrap (stage1 +
stage2) with BOOT_CFLAGS="-g -O2 -ftree-loop-linear", I have detected
7160 transformations from this class of wrap-around to a polynomial
representation.  The behavior of the current mainline is safe, but not
optimal, by answering "don't know" for all the wrap-around variables.
So even without my patch, Danny's patch does not produces erroneous
answers, but just adds more inefficiency in the loop optimizers.

With the patch attached, the loop optimizers based on the information
provided by the scalar evolution algorithm catch more cases, and
consequently they are executed on a wider class of programs.  The
ivopts catches a case in which a part of its analyzer does not handle
correctly the bsi.  I can propose the following patch that solves the
problem, but I would ask Zdenek for a better solution:

Index: tree-ssa-loop-ivopts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-loop-ivopts.c,v
retrieving revision 2.33
diff -d -u -p -r2.33 tree-ssa-loop-ivopts.c
--- tree-ssa-loop-ivopts.c	25 Nov 2004 22:31:09 -0000	2.33
+++ tree-ssa-loop-ivopts.c	28 Nov 2004 13:09:51 -0000
@@ -578,7 +578,7 @@ stmt_after_ip_normal_pos (struct loop *l
   return stmt == last_stmt (bb);
 }
 
-/* Returns true if STMT if after the place where the original induction
+/* Returns true if STMT is after the place where the original induction
    variable CAND is incremented.  */
 
 static bool
@@ -596,13 +596,17 @@ stmt_after_ip_original_pos (struct iv_ca
 
   /* Scan the block from the end, since the original ivs are usually
      incremented at the end of the loop body.  */
-  for (bsi = bsi_last (stmt_bb); ; bsi_prev (&bsi))
+  for (bsi = bsi_last (stmt_bb); !bsi_end_p (bsi); bsi_prev (&bsi))
     {
       if (bsi_stmt (bsi) == cand->incremented_at)
 	return false;
       if (bsi_stmt (bsi) == stmt)
 	return true;
     }
+
+  /* FIXME: What happens if there are no more stmts in the BB, and
+     none of the above exited?  */
+  return false;
 }
 
 /* Returns true if STMT if after the place where the induction variable



During the build of the libjava, PRE goes into an error.  Because PRE
is executed before the scev algorithm, I think that it is linked to
the version of the patch that Danny has sent me earlier this week, and
on which I was working (patch attached below).  I didn't tried yet the
newer version that he sent to gcc-patches.

Index: tree-ssa-pre.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-pre.c,v
retrieving revision 2.57
diff -d -u -p -r2.57 tree-ssa-pre.c
--- tree-ssa-pre.c	25 Nov 2004 22:31:09 -0000	2.57
+++ tree-ssa-pre.c	28 Nov 2004 13:09:51 -0000
@@ -41,9 +41,9 @@ Boston, MA 02111-1307, USA.  */
 #include "alloc-pool.h"
 #include "tree-pass.h"
 #include "flags.h"
-#include "splay-tree.h"
 #include "bitmap.h"
 #include "langhooks.h"
+#include "cfgloop.h"
 
 /* TODO:
    
@@ -280,6 +280,10 @@ static struct
 
   /* The number of new PHI nodes added by PRE.  */
   int phis;
+  
+  /* The number of values found constant.  */
+  int constified;
+  
 } pre_stats;
 
 
@@ -1384,7 +1388,142 @@ create_expression_by_pieces (basic_block
     }
   return name;
 }
-      
+
+/* Return the folded version of T if T, when folded, is a gimple
+   min_invariant.  Otherwise, return T. */ 
+
+static tree
+fully_constant_expression (tree t)
+{  
+  tree folded;
+  switch (TREE_CODE_CLASS (TREE_CODE (t)))
+    {
+    case tcc_comparison:
+    case tcc_binary:
+      folded = fold_binary_to_constant (TREE_CODE (t),
+					TREE_TYPE (t),
+					TREE_OPERAND (t, 0),
+					TREE_OPERAND (t, 1));
+      break;
+    case tcc_unary:
+      folded = fold_unary_to_constant (TREE_CODE (t),
+				       TREE_TYPE (t),
+				       TREE_OPERAND (t, 0));
+      break;
+    default:
+      folded = t;
+    }
+  if (folded && is_gimple_min_invariant (folded))
+    return folded;
+
+  return t;
+}
+
+/* Insert the to-be-made-available values of NODE for each predecessor, stored
+   in AVAIL, into the predecessors of BLOCK, and merge the result with a phi
+   node, given the same value handle as NODE.  The prefix of the phi node is
+   given with TMPNAME*/
+
+static bool
+insert_into_preds_of_block (basic_block block, value_set_node_t node,
+			    tree *avail, const char *tmpname)
+{
+  tree val = get_value_handle (node->expr);
+  edge pred;
+  basic_block bprime;
+  tree eprime;
+  edge_iterator ei;
+  tree type = TREE_TYPE (avail[EDGE_PRED (block, 0)->src->index]);
+  tree temp;
+  
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Found partial redundancy for expression ");
+      print_generic_expr (dump_file, node->expr, 0);
+      fprintf (dump_file, "\n");
+    }
+
+  /* Make sure we aren't creating an induction variable.  */
+  if (0 && block->loop_depth > 0 && EDGE_COUNT (block->preds) == 2)
+    {
+      bool firstinsideloop = false;
+      basic_block insideloopblock = NULL;
+      bool secondinsideloop = false;
+      basic_block outsideloopblock = NULL;
+      if (flow_bb_inside_loop_p (block->loop_father, 
+				 EDGE_PRED (block, 0)->src))
+	{
+	  firstinsideloop = true;
+	  insideloopblock = EDGE_PRED (block, 0)->src;
+	  outsideloopblock = EDGE_PRED (block, 1)->src;
+	}
+      if (flow_bb_inside_loop_p (block->loop_father,
+				 EDGE_PRED (block, 1)->src))
+	{
+	  secondinsideloop = true;
+	  insideloopblock = EDGE_PRED (block, 1)->src;
+	  outsideloopblock = EDGE_PRED (block, 0)->src;
+	}
+      /* Induction variables only have one edge inside the loop.  */
+      if (firstinsideloop ^ secondinsideloop
+	  && is_gimple_min_invariant (avail[outsideloopblock->index])
+	  && !is_gimple_min_invariant (avail[insideloopblock->index]))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "Skipping partial redundancy: Looks like an induction variable\n");
+	  return false;
+	}
+    }
+	  
+
+  /* Make the necessary insertions.  */
+  FOR_EACH_EDGE (pred, ei, block->preds)
+    {
+      tree stmts = alloc_stmt_list ();
+      tree builtexpr;
+      bprime = pred->src;
+      eprime = avail[bprime->index];
+      if (BINARY_CLASS_P (eprime)
+	  || UNARY_CLASS_P (eprime))
+	{
+	  builtexpr = create_expression_by_pieces (bprime,
+						   eprime,
+						   stmts);
+	  bsi_insert_on_edge_immediate (pred, stmts);
+	  avail[bprime->index] = builtexpr;
+	}			      
+    }
+  /* Now build a phi for the new variable.  */
+  temp = create_tmp_var (type, tmpname);
+  add_referenced_tmp_var (temp);
+  temp = create_phi_node (temp, block);
+ 
+  FOR_EACH_EDGE (pred, ei, block->preds)
+    {
+      add_phi_arg (temp, avail[pred->src->index],
+		   pred);
+    }
+  
+  vn_add (PHI_RESULT (temp), val, NULL);
+  
+  bitmap_value_replace_in_set (AVAIL_OUT (block), 
+			       PHI_RESULT (temp));
+  bitmap_insert_into_set (NEW_SETS (block),
+			  PHI_RESULT (temp));
+  bitmap_insert_into_set (PHI_GEN (block),
+			  PHI_RESULT (temp));  
+  
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Created phi ");
+      print_generic_expr (dump_file, temp, 0);
+      fprintf (dump_file, " in block %d\n", block->index);
+    }
+  pre_stats.phis++;
+  return true;
+}
+
+
 /* Perform insertion of partially redundant values.
    For BLOCK, do the following:
    1.  Propagate the NEW_SETS of the dominator into the current block.
@@ -1399,6 +1538,7 @@ create_expression_by_pieces (basic_block
    3. Recursively call ourselves on the dominator children of BLOCK.
 
 */
+
 static bool
 insert_aux (basic_block block)
 {
@@ -1438,7 +1578,7 @@ insert_aux (basic_block block)
 		      tree first_s = NULL;
 		      edge pred;
 		      basic_block bprime;
-		      tree eprime;
+		      tree eprime = (tree)0xafafafaf;
 		      edge_iterator ei;
 
 		      val = get_value_handle (node->expr);
@@ -1484,7 +1624,7 @@ insert_aux (basic_block block)
 			      cant_insert = true;
 			      break;
 			    }
-
+			  eprime = fully_constant_expression (eprime);
 			  vprime = get_value_handle (eprime);
 			  gcc_assert (vprime);
 			  edoubleprime = bitmap_find_leader (AVAIL_OUT (bprime),
@@ -1497,84 +1637,45 @@ insert_aux (basic_block block)
 			  else
 			    {
 			      avail[bprime->index] = edoubleprime;
-			      by_some = true; 
-			      if (first_s == NULL)
-				first_s = edoubleprime;
-			      else if (first_s != edoubleprime)
-				all_same = false;
-			      gcc_assert (first_s == edoubleprime 
-					  || !operand_equal_p
-					      (first_s, edoubleprime, 0));
-			    }
+ 			      by_some = true; 
+ 			      if (first_s == NULL)
+ 				first_s = edoubleprime;
+			      else if (!operand_equal_p (first_s, edoubleprime, 0))
+ 				all_same = false;
+ 			    }
+ 			}
+		      
+ 		      /* If we can insert it, it's not the same value
+ 			 already existing along every predecessor, and
+ 			 it's defined by some predecessor, it is
+ 			 partially redundant.  */
+ 		      if (!cant_insert && !all_same && by_some)
+ 			{
+			  if (insert_into_preds_of_block (block, node, avail, 
+							  "prephitmp"))
+			    new_stuff = true;
 			}
-		      /* If we can insert it, it's not the same value
-			 already existing along every predecessor, and
-			 it's defined by some predecessor, it is
-			 partially redundant.  */
-		      if (!cant_insert && !all_same && by_some)
+		      
+		      /* If all edges produce the same value and that value is
+			 an invariant, then the PHI has the same value on all
+			 edges.  Note this.  */
+		      else if (all_same && is_gimple_min_invariant (eprime)
+			       && !is_gimple_min_invariant (val))
 			{
-			  tree type = TREE_TYPE (avail[EDGE_PRED (block, 0)->src->index]);
-			  tree temp;
-			  if (dump_file && (dump_flags & TDF_DETAILS))
-			    {
-			      fprintf (dump_file, "Found partial redundancy for expression ");
-			      print_generic_expr (dump_file, node->expr, 0);
-			      fprintf (dump_file, "\n");
-			    }
-
-			  /* Make the necessary insertions.  */
-			  FOR_EACH_EDGE (pred, ei, block->preds)
-			    {
-			      tree stmts = alloc_stmt_list ();
-			      tree builtexpr;
-			      bprime = pred->src;
-			      eprime = avail[bprime->index];
-			      if (BINARY_CLASS_P (eprime)
-				  || UNARY_CLASS_P (eprime))
-				{
-				  builtexpr = create_expression_by_pieces (bprime,
-									   eprime,
-									   stmts);
-				  bsi_insert_on_edge (pred, stmts);
-				  avail[bprime->index] = builtexpr;
-				}			      
-			    }
-			  /* Now build a phi for the new variable.  */
-			  temp = create_tmp_var (type, "prephitmp");
-			  add_referenced_tmp_var (temp);
-			  temp = create_phi_node (temp, block);
-			  vn_add (PHI_RESULT (temp), val, NULL);
-
-#if 0
-			  if (!set_contains_value (AVAIL_OUT (block), val))
-			    insert_into_set (AVAIL_OUT (block), 
-					     PHI_RESULT (temp));
-			  else
-#endif
-			    bitmap_value_replace_in_set (AVAIL_OUT (block), 
-							 PHI_RESULT (temp));
-			  FOR_EACH_EDGE (pred, ei, block->preds)
-			    {
-			      add_phi_arg (temp, avail[pred->src->index],
-					   pred);
-			    }
-			  if (dump_file && (dump_flags & TDF_DETAILS))
-			    {
-			      fprintf (dump_file, "Created phi ");
-			      print_generic_expr (dump_file, temp, 0);
-			      fprintf (dump_file, " in block %d\n", block->index);
-			    }
-			  pre_stats.phis++;
-			  new_stuff = true;
-			  bitmap_insert_into_set (NEW_SETS (block),
-						  PHI_RESULT (temp));
-			  bitmap_insert_into_set (PHI_GEN (block),
-						  PHI_RESULT (temp));
-			}
-
-		      free (avail);
-		    }
-		}
+			  value_set_t exprset = VALUE_HANDLE_EXPR_SET (val);
+			  value_set_node_t node;
+			  for (node = exprset->head; node; node = node->next)
+ 			    {
+			      if (TREE_CODE (node->expr) == SSA_NAME)
+				{				  
+				  vn_add (node->expr, eprime, NULL);
+				  pre_stats.constified++;
+				}
+ 			    }
+ 			}
+ 		      free (avail);
+ 		    }
+ 		}
 	    }
 	}
     }
@@ -1606,6 +1707,7 @@ insert (void)
       new_stuff = false;
       new_stuff = insert_aux (ENTRY_BLOCK_PTR);
     }
+
   if (num_iterations > 2 && dump_file && (dump_flags & TDF_STATS))
     fprintf (dump_file, "insert required %d iterations\n", num_iterations);
 }
@@ -1899,12 +2001,15 @@ eliminate (void)
 /* Initialize data structures used by PRE.  */
 
 static void
-init_pre (void)
+init_pre (bool do_fre)
 {
   basic_block bb;
 
   connect_infinite_loops_to_exit ();
   vn_init ();
+  if (!do_fre)
+    current_loops = loop_optimizer_init (dump_file);
+
   memset (&pre_stats, 0, sizeof (pre_stats));
 
   /* If block 0 has more than one predecessor, it means that its PHI
@@ -1923,7 +2028,7 @@ init_pre (void)
 
   bitmap_obstack_initialize (&grand_bitmap_obstack);
   phi_translate_table = htab_create (511, expr_pred_trans_hash,
-				     expr_pred_trans_eq, free);
+					      expr_pred_trans_eq, free);
   value_set_pool = create_alloc_pool ("Value sets",
 				      sizeof (struct value_set), 30);
   bitmap_set_pool = create_alloc_pool ("Bitmap sets",
@@ -1953,7 +2058,7 @@ init_pre (void)
 /* Deallocate data structures used by PRE.  */
 
 static void
-fini_pre (void)
+fini_pre (bool do_fre)
 {
   basic_block bb;
   unsigned int i;
@@ -2000,6 +2105,11 @@ fini_pre (void)
 	  && TREE_CODE (SSA_NAME_VALUE (name)) == VALUE_HANDLE)
 	SSA_NAME_VALUE (name) = NULL;
     }
+  if (!do_fre && current_loops)
+    {
+      loop_optimizer_finalize (current_loops, dump_file);
+      current_loops = NULL;
+    }
 }
 
 
@@ -2009,7 +2119,7 @@ fini_pre (void)
 static void
 execute_pre (bool do_fre)
 {
-  init_pre ();
+  init_pre (do_fre);
 
   /* Collect and value number expressions computed in each basic
      block.  */
@@ -2043,14 +2153,16 @@ execute_pre (bool do_fre)
   /* Remove all the redundant expressions.  */
   eliminate ();
   
+  fini_pre (do_fre);
+
   if (dump_file && (dump_flags & TDF_STATS))
     {
       fprintf (dump_file, "Insertions:%d\n", pre_stats.insertions);
       fprintf (dump_file, "New PHIs:%d\n", pre_stats.phis);
       fprintf (dump_file, "Eliminated:%d\n", pre_stats.eliminations);
+      fprintf (dump_file, "Constified:%d\n", pre_stats.constified);
     }
 
-  fini_pre ();
 }
 
 

Finally here is my contribution for solving the problem reported by
Danny.  This patch translates a part of the class of wrap-around
variables into their polynomial form.

Sebastian

2004-11-28  Sebastian Pop  <pop@cri.ensmp.fr>

	* tree-scalar-evolution.c (already_unified, unify_peeled_chrec): New.
	(analyze_evolution_in_loop): Use unify_peeled_chrec.
	(scev_initialize, scev_finalize): Initialize and finalize 
	already_unified bitmap.

*** tree-scalar-evolution.c.~2.13.~	Tue Nov 23 22:33:18 2004
--- tree-scalar-evolution.c	Sun Nov 28 14:49:28 2004
*************** tree chrec_dont_know;
*** 284,289 ****
--- 284,290 ----
  tree chrec_known;
  
  static bitmap already_instantiated;
+ static bitmap already_unified;
  
  static htab_t scalar_evolution_info;
  
*************** follow_ssa_edge (struct loop *loop, 
*** 1457,1462 ****
--- 1458,1518 ----
  
  
  
+ /* A and B are the arguments of a loop phi node, and consequently they
+    are either scalars or symbolic names.  The function returns
+    chrec_dont_know if the operation fails, otherwise returns the
+    unification of the scalar value A to the beginning of the scalar
+    evolution B.  For example, it is possible to unify 0 with {1, +,
+    1}_3, into {0, +, 1}_3.  */
+ 
+ static tree
+ unify_peeled_chrec (tree a, tree b, struct loop *loop)
+ {
+   tree chrec, extrapolate, difference, type;
+ 
+   if (a == NULL_TREE 
+       || b == NULL_TREE || TREE_CODE (b) != SSA_NAME
+       || bitmap_bit_p (already_unified, SSA_NAME_VERSION (b)))
+     return chrec_dont_know;
+ 
+   if (TREE_CODE (a) == SSA_NAME)
+     {
+       tree chrec_a;
+ 
+       if (bitmap_bit_p (already_unified, SSA_NAME_VERSION (a)))
+ 	return chrec_dont_know;
+ 
+       bitmap_set_bit (already_unified, SSA_NAME_VERSION (a));
+       chrec_a = instantiate_parameters (loop, analyze_scalar_evolution (loop, a));
+       bitmap_clear_bit (already_unified, SSA_NAME_VERSION (a));
+       a = chrec_a;
+     }
+   
+   if (TREE_CODE (a) != INTEGER_CST)
+     /* FIXME: The multivariate case is not handled yet.  */
+     return chrec_dont_know;
+ 
+   bitmap_set_bit (already_unified, SSA_NAME_VERSION (b));
+   chrec = instantiate_parameters (loop, analyze_scalar_evolution (loop, b));
+   bitmap_clear_bit (already_unified, SSA_NAME_VERSION (b));
+   
+   if (chrec == NULL_TREE 
+       || TREE_CODE (chrec) != POLYNOMIAL_CHREC
+       || TREE_CODE (CHREC_LEFT (chrec)) != INTEGER_CST
+       || TREE_CODE (CHREC_RIGHT (chrec)) != INTEGER_CST)
+     return chrec_dont_know;
+ 
+   type = chrec_type (chrec);
+   extrapolate = fold (build2 (MINUS_EXPR, type, CHREC_LEFT (chrec),
+ 			      CHREC_RIGHT (chrec)));
+   difference = fold (build2 (MINUS_EXPR, integer_type_node, a, extrapolate));
+ 
+   if (integer_zerop (difference))
+     return build_polynomial_chrec (CHREC_VARIABLE (chrec), fold_convert (type, a),
+ 				   CHREC_RIGHT (chrec));
+   return chrec_dont_know;
+ }
+ 
  /* Given a LOOP_PHI_NODE, this function determines the evolution
     function from LOOP_PHI_NODE to LOOP_PHI_NODE in the loop.  */
  
*************** analyze_evolution_in_loop (tree loop_phi
*** 1499,1512 ****
        else
  	res = false;
  	      
!       /* When it is impossible to go back on the same
! 	 loop_phi_node by following the ssa edges, the
! 	 evolution is represented by a peeled chrec, i.e. the
! 	 first iteration, EV_FN has the value INIT_COND, then
! 	 all the other iterations it has the value of ARG.  
! 	 For the moment, PEELED_CHREC nodes are not built.  */
        if (!res)
! 	ev_fn = chrec_dont_know;
        
        /* When there are multiple back edges of the loop (which in fact never
  	 happens currently, but nevertheless), merge their evolutions.  */
--- 1555,1571 ----
        else
  	res = false;
  	      
!       /* When it is impossible to go back on the same loop_phi_node by
! 	 following the ssa edges, the evolution can be represented by
! 	 a peeled chrec, i.e. the first iteration, EV_FN has the value
! 	 INIT_COND, then all the other iterations it has the value of
! 	 ARG.  For the moment, PEELED_CHREC nodes are not built.  We
! 	 try to produce a simple chrec by unifying the initial value
! 	 to the rest of the evolution function.  This is possible when
! 	 for example we have to unify 0 with {1, +, 1}_3, that gives
! 	 the simple chrec {0, +, 1}_3.  */
        if (!res)
! 	ev_fn = unify_peeled_chrec (init_cond, arg, loop);
        
        /* When there are multiple back edges of the loop (which in fact never
  	 happens currently, but nevertheless), merge their evolutions.  */
*************** scev_initialize (struct loops *loops)
*** 2387,2392 ****
--- 2446,2452 ----
    scalar_evolution_info = htab_create (100, hash_scev_info,
  				       eq_scev_info, del_scev_info);
    already_instantiated = BITMAP_XMALLOC ();
+   already_unified = BITMAP_XMALLOC ();
    
    initialize_scalar_evolutions_analyzer ();
  
*************** scev_finalize (void)
*** 2482,2486 ****
--- 2542,2547 ----
  {
    htab_delete (scalar_evolution_info);
    BITMAP_XFREE (already_instantiated);
+   BITMAP_XFREE (already_unified);
  }
  


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]