This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Improve unrolled size estimates


Hi,
as discussed earlier, this patch tries to solve the problem of the unrolling
heuristic being a bit too random, where actually fixing bugs in
estimate_num_insns causes testsuite regressions all around the place, mostly
because we then miss complete unrolling where we previously didn't.

The original code simply estimated unrolled size as 3/4 of nunroll *
(rolled_size - 4) where constant of 4 counts for conditional and IV
computation.

This is wrong for several reasons:
  1) the last iteration of the loop is reachable only up to the exit condition.
     Since the exit condition tends to be at the front of the loop and we are
     speaking of a loop rolling few times (typically fewer than 4), this is
     actually an important portion.
  2) When the IV renders to be constant, some expressions simplify.

This patch adds tree_estimate_loop_size that tries to take these factors
into account.  It uses dominance info to figure out BBs after the exit
conditional, and it uses IV info to prove constantness of expressions.  It
also special-cases accesses to constant-initialized arrays and string
constants.  With my const patch I ran into issues here previously; in the
testsuite we have some testcases that actually exercise this scenario well.

This code could be extended, for instance we can keep bitmap of constant SSA
name and propagate the knowledge as we walk the body, but I hope this will work
well enough in practice.

So the new formula is
(nunroll - 1) * (body_size - optimized_out_stmts)
+ (last_iteration_size - optimized_out_stmts_in_last_iteration)
Still 1/3 of the loop is accounted for future optimization.  I broke the actual
computation out to tree_estimate_loop_size since it could be used by peeling
and other transforms too.  Complete unrolling is just the most important case.

Bootstrapped/regtested i686-linux, OK?
Honza

	* tree-ssa-loop-ivcanon.c (struct loop_size): New structure.
	(constant_after_peeling): New predicate.
	(tree_estimate_loop_size): New function.
	(estimated_unrolled_size): Rewrite for new estimates.
	(try_unroll_loop_completely): Use new estimates.

	* testsuite/gcc.dg/tree-ssa/pr21829.c: Simplify matching since
	we now optimize better.
	* testsuite/gcc.dg/Wunreachable-8.c: Bogus warnings now come
	out at different places.

Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 147349)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -118,7 +118,7 @@ tree_num_loop_insns (struct loop *loop, 
 {
   basic_block *body = get_loop_body (loop);
   gimple_stmt_iterator gsi;
-  unsigned size = 1, i;
+  unsigned size = 0, i;
 
   for (i = 0; i < loop->num_nodes; i++)
     for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
@@ -128,28 +128,184 @@ tree_num_loop_insns (struct loop *loop, 
   return size;
 }
 
-/* Estimate number of insns of completely unrolled loop.  We assume
-   that the size of the unrolled loop is decreased in the
-   following way (the numbers of insns are based on what
-   estimate_num_insns returns for appropriate statements):
-
-   1) exit condition gets removed (2 insns)
-   2) increment of the control variable gets removed (2 insns)
-   3) All remaining statements are likely to get simplified
-      due to constant propagation.  Hard to estimate; just
-      as a heuristics we decrease the rest by 1/3.
+/* Describe size of loop as detected by tree_estimate_loop_size.  */
+struct loop_size
+{
+  /* Number of instructions in the loop.  */
+  int overall;
+  /* Number of instructions that will be likely optimized out in
+     peeled iterations of loop  (i.e. computation based on induction
+     variable where induction variable starts at known constant.)  */
+  int eliminated_by_peeling;
+  /* Same statistics for last iteration of loop: it is smaller because
+     instructions after exit are not executed.  */
+  int last_iteration;
+  int last_iteration_eliminated_by_peeling;
+};
+
+/* Return true if OP in STMT will be constant after peeling LOOP.  */
+
+static bool
+constant_after_peeling (tree op, gimple stmt, struct loop *loop)
+{
+  affine_iv iv;
+  if (is_gimple_min_invariant (op))
+    return true;
+  
+  /* We can still fold accesses to constant arrays when index is known.  */
+  if (TREE_CODE (op) != SSA_NAME)
+    {
+      tree base = op;
+
+      /* First make fast look if we see constant array inside.  */
+      while (handled_component_p (base) || TREE_CODE (base) == ARRAY_REF)
+	base = TREE_OPERAND (base, 0);
+      if ((DECL_P (base) && TREE_CONSTANT (base) && DECL_INITIAL (base))
+	  || TREE_CODE (base) == STRING_CST)
+	{
+	  /* If so, see if we understand all the indices.  */
+	  base = op;
+	  while (handled_component_p (base)
+		 || (TREE_CODE (base) == ARRAY_REF
+		     && constant_after_peeling (TREE_OPERAND (base, 1), stmt, loop)))
+	    base = TREE_OPERAND (base, 0);
+	  return (DECL_P (base) || TREE_CODE (base) == STRING_CST);
+	}
+      return false;
+    }
+
+  /* Induction variables are constants.  */
+  if (!simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false))
+    return false;
+  if (!is_gimple_min_invariant (iv.base))
+    return false;
+  if (!is_gimple_min_invariant (iv.step))
+    return false;
+  return true;
+}
+
+/* Computes an estimated number of insns in LOOP, weighted by WEIGHTS.
+   Return results in SIZE, estimate benefits of complete unrolling exiting by EXIT.  */
 
-   NINSNS is the number of insns in the loop before unrolling.
-   NUNROLL is the number of times the loop is unrolled.  */
+static void
+tree_estimate_loop_size (struct loop *loop, edge exit, struct loop_size *size)
+{
+  basic_block *body = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+  unsigned int i;
+  bool after_exit;
+
+  size->overall = 0;
+  size->eliminated_by_peeling = 0;
+  size->last_iteration = 0;
+  size->last_iteration_eliminated_by_peeling = 0;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num);
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      if (exit && body[i] != exit->src
+	  && dominated_by_p (CDI_DOMINATORS, body[i], exit->src))
+	after_exit = true;
+      else
+	after_exit = false;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, " BB: %i, after_exit: %i\n", body[i]->index, after_exit);
+
+      for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple stmt = gsi_stmt (gsi);
+	  int num = estimate_num_insns (stmt, &eni_size_weights);
+	  bool likely_eliminated = false;
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "  size: %3i ", num);
+	      print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0);
+	    }
+
+	  /* Look for reasons why we might optimize this stmt away. */
+
+	  /* Exit conditional.  */
+	  if (body[i] == exit->src && stmt == last_stmt (exit->src))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
+	      likely_eliminated = true;
+	    }
+	  /* Sets of IV variables  */
+	  else if (gimple_code (stmt) == GIMPLE_ASSIGN
+	      && constant_after_peeling (gimple_assign_lhs (stmt), stmt, loop))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+	        fprintf (dump_file, "   Induction variable computation will"
+			 " be folded away.\n");
+	      likely_eliminated = true;
+	    }
+	  /* Assignments of IV variables.  */
+	  else if (gimple_code (stmt) == GIMPLE_ASSIGN
+		   && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
+		   && constant_after_peeling (gimple_assign_rhs1 (stmt), stmt,loop)
+		   && (gimple_assign_rhs_class (stmt) != GIMPLE_BINARY_RHS
+		       || constant_after_peeling (gimple_assign_rhs2 (stmt),
+		       				  stmt, loop)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+	        fprintf (dump_file, "   Constant expression will be folded away.\n");
+	      likely_eliminated = true;
+	    }
+	  /* Conditionals.  */
+	  else if (gimple_code (stmt) == GIMPLE_COND
+		   && constant_after_peeling (gimple_cond_lhs (stmt), stmt, loop)
+		   && constant_after_peeling (gimple_cond_rhs (stmt), stmt, loop))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+	        fprintf (dump_file, "   Constant conditional.\n");
+	      likely_eliminated = true;
+	    }
+
+	  size->overall += num;
+	  if (likely_eliminated)
+	    size->eliminated_by_peeling += num;
+	  if (!after_exit)
+	    {
+	      size->last_iteration += num;
+	      if (likely_eliminated)
+		size->last_iteration_eliminated_by_peeling += num;
+	    }
+	}
+    }
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "size: %i-%i, last_iteration: %i-%i\n", size->overall,
+    	     size->eliminated_by_peeling, size->last_iteration,
+	     size->last_iteration_eliminated_by_peeling);
+  
+  free (body);
+}
+
+/* Estimate number of insns of completely unrolled loop.
+   It is (NUNROLL + 1) * size of loop body, taking into account
+   the fact that in the last copy everything after the exit conditional
+   is dead and that some instructions will be eliminated after
+   peeling.
+
+   Loop body is likely going to simplify further; this is difficult
+   to guess, we just decrease the result by 1/3.  */
 
 static unsigned HOST_WIDE_INT
-estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns,
+estimated_unrolled_size (struct loop_size *size,
 			 unsigned HOST_WIDE_INT nunroll)
 {
-  HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3;
+  HOST_WIDE_INT unr_insns = ((nunroll)
+  			     * (HOST_WIDE_INT) (size->overall
+			     			- size->eliminated_by_peeling));
+  if (!nunroll)
+    unr_insns = 0;
+  unr_insns += size->last_iteration - size->last_iteration_eliminated_by_peeling;
+
+  unr_insns = unr_insns * 2 / 3;
   if (unr_insns <= 0)
     unr_insns = 1;
-  unr_insns *= (nunroll + 1);
 
   return unr_insns;
 }
@@ -165,6 +321,7 @@ try_unroll_loop_completely (struct loop 
 {
   unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
   gimple cond;
+  struct loop_size size;
 
   if (loop->inner)
     return false;
@@ -182,9 +339,10 @@ try_unroll_loop_completely (struct loop 
       if (ul == UL_SINGLE_ITER)
 	return false;
 
-      ninsns = tree_num_loop_insns (loop, &eni_size_weights);
+      tree_estimate_loop_size (loop, exit, &size);
+      ninsns = size.overall;
 
-      unr_insns = estimated_unrolled_size (ninsns, n_unroll);
+      unr_insns = estimated_unrolled_size (&size, n_unroll);
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "  Loop size: %d\n", (int) ninsns);
Index: testsuite/gcc.dg/tree-ssa/pr21829.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr21829.c	(revision 147349)
+++ testsuite/gcc.dg/tree-ssa/pr21829.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized -fdump-tree-cddce2" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
 
 int test(int v)
 {
@@ -16,33 +16,7 @@ int test(int v)
   return x;
 }
 
-/* This should be optimized to
+/* This should be unrolled and optimized into conditional set of return value "v < 0".  */
 
-    if (v <= 0) goto <L1>; else goto <L3>;
-
-   <L1>:;
-
-    # x_1 = PHI <0(3), 1(1)>;
-   <L3>:;
-    return x_1;
-
-   retaining only a single conditional.  This doesn't work as nobody
-   combines the two tests
-
-    if (v < 0) goto <bb 4>; else goto <bb 3>;
-
-   <bb 3>:
-
-    if (v <= 0) goto <bb 4>; else goto <bb 5>;
-
-   this late in the game.  tree-ssa-ifcombine.c would do it if we would
-   unroll the loop during early loop unrolling though.
-
-   For now vrp2 does all the needed folding and threading and cddce2
-   provides a nice IL to scan.  */
-
-/* { dg-final { scan-tree-dump-times "if " 1 "optimized" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "if " 2 "cddce2" } } */
-/* { dg-final { scan-tree-dump "x_. = PHI <0\\\(.\\\), 1\\\(.\\\)>" "cddce2" } } */
-/* { dg-final { cleanup-tree-dump "cddce2" } } */
+/* { dg-final { scan-tree-dump-not "if \\(" "optimized" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
Index: testsuite/gcc.dg/Wunreachable-8.c
===================================================================
--- testsuite/gcc.dg/Wunreachable-8.c	(revision 147349)
+++ testsuite/gcc.dg/Wunreachable-8.c	(working copy)
@@ -4,9 +4,9 @@ float Factorial(float X)
 {
   float val = 1.0;
   int k,j;
-  for (k=1; k < 5; k++)
+  for (k=1; k < 5; k++) /* { dg-bogus "will never be executed" "" { xfail *-*-* } } */
     {
-      val += 1.0; /* { dg-bogus "will never be executed" "" { xfail *-*-* } } */
+      val += 1.0;
     }
   return (val); /* { dg-bogus "will never be executed" } */
 }


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]