This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Improve unrolled size estimates


On Mon, 11 May 2009, Jan Hubicka wrote:

> Hi,
> as discussed earlier, this patch tries to solve the problem of the unrolling
> heuristic being a bit too random: actually fixing bugs in estimate_num_insns
> causes testsuite regressions all over the place, mostly because we then miss
> complete unrolling where we previously didn't.
> 
> The original code simply estimated the unrolled size as 3/4 * nunroll *
> (rolled_size - 4), where the constant 4 accounts for the exit conditional
> and the IV computation.
> 
> This is wrong for several reasons:
>   1) The last iteration of the loop only executes up to the exit condition.
>      Since the exit condition tends to be at the front of the loop and we
>      are speaking of loops rolling only a few times (typically fewer than 4),
>      this is actually an important portion.
>   2) When the IV turns out to be constant, some expressions simplify.
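
To make the effect concrete (numbers purely for illustration): a 12-insn body
that rolls 3 times is estimated as 3/4 * 3 * (12 - 4) = 18 insns by the old
formula, regardless of whether the last iteration stops at the exit test after
executing only part of the body, and regardless of how many expressions fold
once the IV value is known.
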
> 
> This patch adds tree_estimate_loop_size, which tries to take these factors
> into account.  It uses dominance info to figure out which BBs lie after the
> exit conditional, and it uses IV info to prove constantness of expressions.
> It also special-cases accesses to constant-initialized arrays and string
> constants.  I ran into issues here previously with my const patch; the
> testsuite has some testcases that exercise this scenario well.
> 
> This code could be extended; for instance, we could keep a bitmap of constant
> SSA names and propagate the knowledge as we walk the body, but I hope this
> will work well enough in practice.
> 
> So the new formula is
>   (nunroll - 1) * (body_size - optimized_out_stmts)
>   + (last_iteration_size - optimized_out_stmts_in_last_iteration)
> and 1/3 of the loop size is still accounted to future optimization.  I broke
> the actual computation out into tree_estimate_loop_size since it could be
> used by peeling and other transforms too.  Complete unrolling is just the
> most important case.
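
As a minimal standalone sketch (this is not the patch's estimated_unrolled_size;
the function name and the example numbers are made up, only the struct loop_size
field names come from the patch below), the new formula translates roughly to:

  #include <stdio.h>

  struct loop_size
  {
    int overall;
    int eliminated_by_peeling;
    int last_iteration;
    int last_iteration_eliminated_by_peeling;
  };

  /* Estimate the size of the loop body copied NUNROLL times, counting the
     partial last iteration separately and still discounting 1/3 of what
     remains for future optimization (hence the 2/3 factor).  */
  static int
  sketch_estimated_unrolled_size (struct loop_size *size, unsigned nunroll)
  {
    int unr_insns = (int) (nunroll - 1)
                    * (size->overall - size->eliminated_by_peeling)
                    + (size->last_iteration
                       - size->last_iteration_eliminated_by_peeling);
    unr_insns = unr_insns * 2 / 3;
    return unr_insns > 0 ? unr_insns : 1;
  }

  int
  main (void)
  {
    /* A 12-insn body, 4 insns of which fold after peeling; the last
       iteration executes only 8 insns, 3 of which fold.  */
    struct loop_size size = { 12, 4, 8, 3 };
    printf ("estimated unrolled size: %d\n",
            sketch_estimated_unrolled_size (&size, 4));
    return 0;
  }

With these illustrative numbers the sketch prints 19, versus
3/4 * 4 * (12 - 4) = 24 from the old formula.
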
> 
> Bootstrapped/regtested i686-linux, OK?

Hum.  I find 1/3 future optimization a bit gross, but read on...

> Honza
> 
> 	* tree-ssa-loop-ivcanon.c (struct loop_size): New structure.
> 	(constant_after_peeling): New predicate.
> 	(tree_estimate_loop_size): New function.
> 	(estimated_unrolled_size): Rewrite for new estimates.
> 	(try_unroll_loop_completely): Use new estimates.
> 
> 	* testsuite/gcc.dg/tree-ssa/pr21829.c: Simplify matching since
> 	we now optimize better.
> 	* testsuite/gcc.dg/Wunreachable-8.c: Bogus warnings now come
> 	out at different places.
> 
> Index: tree-ssa-loop-ivcanon.c
> ===================================================================
> --- tree-ssa-loop-ivcanon.c	(revision 147349)
> +++ tree-ssa-loop-ivcanon.c	(working copy)
> @@ -118,7 +118,7 @@ tree_num_loop_insns (struct loop *loop, 
>  {
>    basic_block *body = get_loop_body (loop);
>    gimple_stmt_iterator gsi;
> -  unsigned size = 1, i;
> +  unsigned size = 0, i;
>  
>    for (i = 0; i < loop->num_nodes; i++)
>      for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
> @@ -128,28 +128,184 @@ tree_num_loop_insns (struct loop *loop, 
>    return size;
>  }
>  
> -/* Estimate number of insns of completely unrolled loop.  We assume
> -   that the size of the unrolled loop is decreased in the
> -   following way (the numbers of insns are based on what
> -   estimate_num_insns returns for appropriate statements):
> -
> -   1) exit condition gets removed (2 insns)
> -   2) increment of the control variable gets removed (2 insns)
> -   3) All remaining statements are likely to get simplified
> -      due to constant propagation.  Hard to estimate; just
> -      as a heuristics we decrease the rest by 1/3.
> +/* Describe size of loop as detected by tree_estimate_loop_size.  */
> +struct loop_size
> +{
> +  /* Number of instructions in the loop.  */
> +  int overall;
> +  /* Number of instructions that will likely be optimized out in
> +     peeled iterations of the loop (i.e. computations based on an induction
> +     variable that starts at a known constant).  */
> +  int eliminated_by_peeling;
> +  /* Same statistics for last iteration of loop: it is smaller because
> +     instructions after exit are not executed.  */
> +  int last_iteration;
> +  int last_iteration_eliminated_by_peeling;
> +};
> +
> +/* Return true if OP in STMT will be constant after peeling LOOP.  */
> +
> +static bool
> +constant_after_peeling (tree op, gimple stmt, struct loop *loop)
> +{
> +  affine_iv iv;
> +  if (is_gimple_min_invariant (op))
> +    return true;
> +  
> +  /* We can still fold accesses to constant arrays when index is known.  */
> +  if (TREE_CODE (op) != SSA_NAME)
> +    {
> +      tree base = op;
> +
> +      /* First take a fast look whether we see a constant array inside.  */
> +      while (handled_component_p (base) || TREE_CODE (base) == ARRAY_REF)

ARRAY_REF is in handled_component_p.

> +	base = TREE_OPERAND (base, 0);
> +      if ((DECL_P (base) && TREE_CONSTANT (base) && DECL_INITIAL (base))
> +	  || TREE_CODE (base) == STRING_CST)
> +	{
> +	  /* If so, see if we understand all the indices.  */
> +	  base = op;
> +	  while (handled_component_p (base)
> +		 || (TREE_CODE (base) == ARRAY_REF
> +		     && constant_after_peeling (TREE_OPERAND (base, 1), stmt, loop)))
> +	    base = TREE_OPERAND (base, 0);
> +	  return (DECL_P (base) || TREE_CODE (base) == STRING_CST);
> +	}
> +      return false;
> +    }

Is it worth handling this very special case?  Note that you are 
recursively asking for stuff again, so this may become quadratic, 
especially as simple_iv isn't exactly very cheap.

> +
> +  /* Induction variables are constants.  */
> +  if (!simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false))
> +    return false;
> +  if (!is_gimple_min_invariant (iv.base))
> +    return false;
> +  if (!is_gimple_min_invariant (iv.step))
> +    return false;
> +  return true;
> +}
> +
> +/* Compute an estimated number of insns in LOOP using eni_size_weights.
> +   Return results in SIZE; estimate benefits of complete unrolling exiting by EXIT.  */
>  
> -   NINSNS is the number of insns in the loop before unrolling.
> -   NUNROLL is the number of times the loop is unrolled.  */
> +static void
> +tree_estimate_loop_size (struct loop *loop, edge exit, struct loop_size *size)
> +{
> +  basic_block *body = get_loop_body (loop);
> +  gimple_stmt_iterator gsi;
> +  unsigned int i;
> +  bool after_exit;
> +
> +  size->overall = 0;
> +  size->eliminated_by_peeling = 0;
> +  size->last_iteration = 0;
> +  size->last_iteration_eliminated_by_peeling = 0;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num);
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      if (exit && body[i] != exit->src
> +	  && dominated_by_p (CDI_DOMINATORS, body[i], exit->src))
> +	after_exit = true;
> +      else
> +	after_exit = false;
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file, " BB: %i, after_exit: %i\n", body[i]->index, after_exit);
> +
> +      for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  gimple stmt = gsi_stmt (gsi);
> +	  int num = estimate_num_insns (stmt, &eni_size_weights);
> +	  bool likely_eliminated = false;
> +
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    {
> +	      fprintf (dump_file, "  size: %3i ", num);
> +	      print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0);
> +	    }
> +
> +	  /* Look for reasons why we might optimize this stmt away. */
> +
> +	  /* Exit conditional.  */
> +	  if (body[i] == exit->src && stmt == last_stmt (exit->src))
> +	    {
> +	      if (dump_file && (dump_flags & TDF_DETAILS))
> +	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
> +	      likely_eliminated = true;
> +	    }
> +	  /* Sets of IV variables.  */
> +	  else if (gimple_code (stmt) == GIMPLE_ASSIGN
> +	      && constant_after_peeling (gimple_assign_lhs (stmt), stmt, loop))
> +	    {
> +	      if (dump_file && (dump_flags & TDF_DETAILS))
> +	        fprintf (dump_file, "   Induction variable computation will"
> +			 " be folded away.\n");
> +	      likely_eliminated = true;
> +	    }
> +	  /* Assignments of IV variables.  */
> +	  else if (gimple_code (stmt) == GIMPLE_ASSIGN
> +		   && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
> +		   && constant_after_peeling (gimple_assign_rhs1 (stmt), stmt,loop)
> +		   && (gimple_assign_rhs_class (stmt) != GIMPLE_BINARY_RHS
> +		       || constant_after_peeling (gimple_assign_rhs2 (stmt),
> +		       				  stmt, loop)))
> +	    {
> +	      if (dump_file && (dump_flags & TDF_DETAILS))
> +	        fprintf (dump_file, "   Constant expression will be folded away.\n");
> +	      likely_eliminated = true;
> +	    }
> +	  /* Conditionals.  */
> +	  else if (gimple_code (stmt) == GIMPLE_COND
> +		   && constant_after_peeling (gimple_cond_lhs (stmt), stmt, loop)
> +		   && constant_after_peeling (gimple_cond_rhs (stmt), stmt, loop))
> +	    {
> +	      if (dump_file && (dump_flags & TDF_DETAILS))
> +	        fprintf (dump_file, "   Constant conditional.\n");
> +	      likely_eliminated = true;
> +	    }
> +
> +	  size->overall += num;
> +	  if (likely_eliminated)
> +	    size->eliminated_by_peeling += num;
> +	  if (!after_exit)
> +	    {
> +	      size->last_iteration += num;
> +	      if (likely_eliminated)
> +		size->last_iteration_eliminated_by_peeling += num;
> +	    }
> +	}
> +    }
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "size: %i-%i, last_iteration: %i-%i\n", size->overall,
> +    	     size->eliminated_by_peeling, size->last_iteration,
> +	     size->last_iteration_eliminated_by_peeling);
> +  
> +  free (body);
> +}

So you are basically doing constant propagation on the loop body with
initial values from the entry edge.  Why not use CCP for this if you
go through all this hassle?

IMHO the right thing to do is to enhance the SSA propagator to be
able to work on SESE or SEME regions.  You should be able to seed
SSA names before iterating (in this case with the entry edge values)
and not call substitute_and_fold.

That is, if we want to go this way and make the heuristics fancy.
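
A minimal standalone sketch of that idea (this is not GCC's SSA propagator,
and every name in it is made up): seed the lattice with the values known on
the region's entry edge, then visit the statements of the region instead of
starting every name at VARYING.

  #include <stdio.h>

  enum lattice { UNDEFINED, CONSTANT, VARYING };
  struct value { enum lattice state; long cst; };
  /* Toy statement: names[dst] = names[src] OP imm.  */
  struct stmt { int dst, src; char op; long imm; };

  static void
  visit_stmt (const struct stmt *s, struct value *vals)
  {
    if (vals[s->src].state == CONSTANT)
      {
        vals[s->dst].state = CONSTANT;
        vals[s->dst].cst = (s->op == '+' ? vals[s->src].cst + s->imm
                                         : vals[s->src].cst * s->imm);
      }
    else
      vals[s->dst].state = VARYING;
  }

  int
  main (void)
  {
    /* Region body: t1 = i + 1; t2 = t1 * 4; t3 = n + 2.  */
    struct stmt body[] = { { 1, 0, '+', 1 }, { 2, 1, '*', 4 },
                           { 3, 4, '+', 2 } };
    /* Seed: on the entry edge i (name 0) is known to be 0, n (name 4) is not.  */
    struct value vals[5] = { { CONSTANT, 0 }, { UNDEFINED, 0 },
                             { UNDEFINED, 0 }, { UNDEFINED, 0 },
                             { VARYING, 0 } };

    for (unsigned i = 0; i < sizeof body / sizeof body[0]; i++)
      visit_stmt (&body[i], vals);

    for (int i = 1; i <= 3; i++)
      printf ("t%d: %s\n", i,
              vals[i].state == CONSTANT ? "constant" : "varying");
    return 0;
  }

With the seed above t1 and t2 come out constant while t3 stays varying, which
is exactly the per-iteration information the size estimate wants.
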

Richard.

