Make try_unroll_loop_completely to use loop bounds recorded

Fri Oct 12 13:28:00 GMT 2012

> > 	* f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.
> 
> I wonder if other languages need similar adjustment?

I also wondered ;) Only Fortran triggered, I will take a look.
> 
> +  /* Now destroy the loop.  First try to do so by cancelling the
> +     patch from exit conditional if we identified good candidate.
> +
> 
> you mean 'path' here, right?  Similar in other places.

Yeah.
> 
> +      /* Unloop destroys the latch edge.  */
> +      unloop (loop, &irred_invalidated);
> +      if (irred_invalidated)
> +       mark_irreducible_loops ();
> 
> this makes the whole thing quadratic in the number of loops.
> Please pass down a flag and handle this in the place we
> update SSA form (thus, once per function / unroll iteration).

This was in my first version of the patch. Then I noticed that remove_path
also relies on irreducible loops to be computed and does the exactly same
logic as I do.
So we would need to extend the &irred_invalidated to remove_path as well.

> 
> Or provide a more optimial implementation that only considers
> parent loops of 'loop' (even that is excessive though, quadratic
> in the loop depth).
> 
> -  FOR_EACH_LOOP (li, loop, 0)
> +  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
> 
> FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> 
> but I'm sure that IV canonicalization should happen for all loops.
> So, why do you need this change?  Esp. we should get rid of
> single-iteration outer loops here, too.

FOR_EACH_LOOP seems to assume that loops are not cancelled under its hand.  I
was running into case where we cancelled inner loop but did not update SSA and
tried to estimate number of iterations in the outer loop (now also innermost)
that went into infinite loop.

Honza
> 
> -      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> +      for (fel_init (&li, &next, 0); next;)
> 
> see above - FOR_EACH_LOOP (li, loop, 0)
> 
> Otherwise looks ok.  Please update and re-post.
> 
> Thanks,
> Richard.
> 
> 
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > +int a[2];
> > +test(int c)
> > +{ 
> > +  int i;
> > +  for (i=0;i<c;i++)
> > +    a[i]=5;
> > +}
> > +/* Array bounds says the loop will not roll much.  */
> > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times). Last iteration exit edge was proved true." "cunroll"} } */
> > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > +int a[2];
> > +test(int c)
> > +{ 
> > +  int i;
> > +  for (i=0;i<c;i++)
> > +    {
> > +      a[i]=5;
> > +      if (test2())
> > +	return;
> > +    }
> > +}
> > +/* We are not able to get rid of the final conditional because the loop has two exits.  */
> > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 2 times)." "cunroll"} } */
> > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> > +int a[1];
> > +test(int c)
> > +{ 
> > +  int i;
> > +  for (i=0;i<c;i++)
> > +    {
> > +      a[i]=5;
> > +    }
> > +}
> > +/* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */
> > +
> > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times)." "cunrolli"} } */
> > +/* { dg-final { cleanup-tree-dump "cunrolli" } } */
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > +int a[1];
> > +test(int c)
> > +{ 
> > +  int i=0,j;
> > +  for (i=0;i<c;i++)
> > +    {
> > +      for (j=0;j<c;j++)
> > +	{
> > +          a[i]=5;
> > +	  test2();
> > +	}
> > +    }
> > +}
> > +
> > +/* We should do this as part of cunrolli, but our cost model do not take into account early exit
> > +   from the last iteration.  */
> > +/* { dg-final { scan-tree-dump-not "Turned loop 1 to non-loop; it never loops. Last iteration exit edge was proved true." "cunrolli"} } */
> > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > +int *a;
> > +test(int c)
> > +{ 
> > +  int i;
> > +  for (i=0;i<6;i++)
> > +    a[i]=5;
> > +}
> > +/* Basic testcase for complette unrolling.  */
> > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 5 times). Exit condition of peeled iterations was eliminated. Last iteration exit edge was proved true." "cunroll"} } */
> > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > Index: cfgloopmanip.c
> > ===================================================================
> > --- cfgloopmanip.c	(revision 192360)
> > +++ cfgloopmanip.c	(working copy)
> > @@ -37,7 +37,6 @@ static int find_path (edge, basic_block 
> >  static void fix_loop_placements (struct loop *, bool *);
> >  static bool fix_bb_placement (basic_block);
> >  static void fix_bb_placements (basic_block, bool *);
> > -static void unloop (struct loop *, bool *);
> >  
> >  /* Checks whether basic block BB is dominated by DATA.  */
> >  static bool
> > @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed
> >     If this may cause the information about irreducible regions to become
> >     invalid, IRRED_INVALIDATED is set to true.  */
> >  
> > -static void
> > +void
> >  unloop (struct loop *loop, bool *irred_invalidated)
> >  {
> >    basic_block *body;
> > Index: tree-ssa-loop-ivcanon.c
> > ===================================================================
> > --- tree-ssa-loop-ivcanon.c	(revision 192360)
> > +++ tree-ssa-loop-ivcanon.c	(working copy)
> > @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo
> >  	  /* Look for reasons why we might optimize this stmt away. */
> >  
> >  	  /* Exit conditional.  */
> > -	  if (body[i] == exit->src && stmt == last_stmt (exit->src))
> > +	  if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
> >  	    {
> >  	      if (dump_file && (dump_flags & TDF_DETAILS))
> >  	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
> > @@ -326,13 +326,43 @@ try_unroll_loop_completely (struct loop 
> >    unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
> >    gimple cond;
> >    struct loop_size size;
> > +  bool n_unroll_found = false;
> > +  HOST_WIDE_INT maxiter;
> > +  basic_block latch;
> > +  edge latch_edge;
> > +  location_t locus;
> > +  int flags;
> > +  bool irred_invalidated = false;
> > +  gimple stmt;
> > +  gimple_stmt_iterator gsi;
> > +  edge other_edge = NULL, last_exit;
> > +  int num = loop->num;
> > +  bool last_iteration_updated = false;
> > +
> > +  /* See if we proved number of iterations to be low constant.  */
> > +  if (host_integerp (niter, 1))
> > +    {
> > +      n_unroll = tree_low_cst (niter, 1);
> > +      n_unroll_found = true;
> > +    }
> >  
> > -  if (loop->inner)
> > +  /* See if we can improve our estimate by using recorded loop bounds.  */
> > +  maxiter = max_loop_iterations_int (loop);
> > +  if (maxiter >= 0
> > +      && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
> > +    {
> > +      n_unroll = maxiter;
> > +      n_unroll_found = true;
> > +    }
> > +
> > +  if (!n_unroll_found)
> >      return false;
> >  
> > -  if (!host_integerp (niter, 1))
> > +  /* We unroll only inner loops, because we do not consider it profitable
> > +     otheriwse.  We still can cancel loopback edge of not rolling loop;
> > +     this is always a good idea.  */
> > +  if (loop->inner && n_unroll)
> >      return false;
> > -  n_unroll = tree_low_cst (niter, 1);
> >  
> >    max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
> >    if (n_unroll > max_unroll)
> > @@ -340,6 +370,10 @@ try_unroll_loop_completely (struct loop 
> >  
> >    if (n_unroll)
> >      {
> > +      sbitmap wont_exit;
> > +      edge e;
> > +      unsigned i;
> > +      VEC (edge, heap) *to_remove = NULL;
> >        if (ul == UL_SINGLE_ITER)
> >  	return false;
> >  
> > @@ -372,14 +408,6 @@ try_unroll_loop_completely (struct loop 
> >  	    fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
> >  	  return false;
> >  	}
> > -    }
> > -
> > -  if (n_unroll)
> > -    {
> > -      sbitmap wont_exit;
> > -      edge e;
> > -      unsigned i;
> > -      VEC (edge, heap) *to_remove = NULL;
> >  
> >        initialize_original_copy_tables ();
> >        wont_exit = sbitmap_alloc (n_unroll + 1);
> > @@ -408,24 +436,143 @@ try_unroll_loop_completely (struct loop 
> >        free_original_copy_tables ();
> >      }
> >  
> > -  cond = last_stmt (exit->src);
> > -  if (exit->flags & EDGE_TRUE_VALUE)
> > -    gimple_cond_make_true (cond);
> > -  else
> > -    gimple_cond_make_false (cond);
> > -  update_stmt (cond);
> > +  /* After complette unrolling we still may get rid of the conditional
> > +     on exit in the last copy even if we have no idea what it does.
> > +     This is quite common case for loops of form
> > +
> > +     int a[5];
> > +     for (i=0;i<b;i++)
> > +       a[i]=0;
> > +
> > +     Here we prove the loop to iterate 5 times but we do not know
> > +     it from induction variable.
> > +
> > +     We want to make the conditional of exit true, so we can only 
> > +     consider exits that are not part of the inner (irreducible) loops
> > +     so we know that the conditional is tested at most once per iteration.
> > +
> > +     Situation is more complicated withloops with multiple exists:
> > +
> > +     int a[5];
> > +     for (i=0;i<b;i++)
> > +       {
> > +	 if (blah)
> > +	   break;
> > +	 a[i]=0;
> > +       }
> > +
> > +     Here we could cancel the conditional of if "(blah)". And:
> > +
> > +     int a[5];
> > +     for (i=0;i<b;i++)
> > +       {
> > +	 a[i]=0;
> > +	 if (blah)
> > +	   break;
> > +       }
> > + 
> > +     Here we can cancel the last i<b test.
> > +
> > +     To handle these we need to track all statements containing code that
> > +     can not execute on last iteration (in tree-ssa-loop-niter).
> > +     Then we can use control dependnecy (not computed here) to cancel all
> > +     the paths leading to them unless they are part of the inner loops.
> > +     This however seems bit like an overkill so we handle only the
> > +     simple case of single exit until interesting testcases appears.  */
> > +  last_exit = exit;
> > +  if (!last_exit)
> > +    {
> > +      last_exit = single_exit (loop);
> > +      if (!last_exit || last_exit->src->loop_father != loop
> > +	  || !(last_exit->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)))
> > +	last_exit = NULL;
> > +      else 
> > +	{
> > +	  if (last_exit->src->flags & BB_IRREDUCIBLE_LOOP)
> > +	    last_exit = NULL;
> > +	}
> > +    }
> > +
> > +  /* If loop has only one exit edge, we can remove the conditional from
> > +     the last copy of the loop.  
> > +     TODO: We should account this update into cost model.  */
> > +  if (last_exit)
> > +    {
> > +      cond = last_stmt (last_exit->src);
> > +      if (last_exit->flags & EDGE_TRUE_VALUE)
> > +	gimple_cond_make_true (cond);
> > +      else
> > +	gimple_cond_make_false (cond);
> > +      update_stmt (cond);
> > +      other_edge = EDGE_SUCC (last_exit->src, 0);
> > +      if (other_edge == last_exit)
> > +	other_edge = EDGE_SUCC (last_exit->src, 1);
> > +      last_iteration_updated = true;
> > +    }
> > +
> > +  /* Now destroy the loop.  First try to do so by cancelling the
> > +     patch from exit conditional if we identified good candidate.  
> > +
> > +     TODO: We should update the loop profile for the fact that
> > +     the last iteration no longer executes.  */
> > +  if (!other_edge || !remove_path (other_edge))
> > +    {
> > +      /* We did not manage to cancel the loop by removing the patch.
> > +         The loop latch remains reachable even if it will never be reached
> > +         at runtime.  We must redirect it to somewhere, so create basic
> > +         block containg __builtin_unreachable call for this reason.  */
> > +      latch = loop->latch;
> > +      latch_edge = loop_latch_edge (loop);
> > +      flags = latch_edge->flags;
> > +      locus = latch_edge->goto_locus;
> > +
> > +      /* Unloop destroys the latch edge.  */
> > +      unloop (loop, &irred_invalidated);
> > +      if (irred_invalidated)
> > +	mark_irreducible_loops ();
> > +
> > +      /* Create new basic block for the latch edge destination and wire
> > +	 it in.  */
> > +      stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0);
> > +      latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags);
> > +      gsi = gsi_start_bb (latch_edge->dest);
> > +      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> > +      latch_edge->dest->loop_father = current_loops->tree_root;
> > +      set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src);
> > +      latch_edge->probability = 0;
> > +      latch_edge->count = 0;
> > +      latch_edge->flags |= flags;
> > +      latch_edge->goto_locus = locus;
> > +    }
> >  
> >    if (dump_file && (dump_flags & TDF_DETAILS))
> > -    fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
> > +    {
> > +      if (!n_unroll)
> > +        fprintf (dump_file, "Turned loop %d to non-loop; it never loops.%s\n", num,
> > +		 last_iteration_updated
> > +		 ? " Last iteration exit edge was proved true." : "");
> > +      else
> > +	{
> > +          fprintf (dump_file, "Unrolled loop %d completely "
> > +		   "(duplicated %i times).%s%s\n", num, (int)n_unroll,
> > +		   exit
> > +		   ? " Exit condition of peeled iterations was eliminated." : "",
> > +		   last_iteration_updated
> > +		   ? " Last iteration exit edge was proved true." : "");
> > +	}
> > +    }
> >  
> > -  return true;
> > +  return true;
> >  }
> >  
> >  /* Adds a canonical induction variable to LOOP if suitable.
> >     CREATE_IV is true if we may create a new iv.  UL determines
> >     which loops we are allowed to completely unroll.  If TRY_EVAL is true, we try
> >     to determine the number of iterations of a loop by direct evaluation.
> > -   Returns true if cfg is changed.  */
> > +   Returns true if cfg is changed.  
> > +
> > +   IRREDUCIBLE_LOOPS_FOUND is used to bookkeep if we discovered irreducible
> > +   loops.  This is used in a special case of the exit condition analysis.  */
> >  
> >  static bool
> >  canonicalize_loop_induction_variables (struct loop *loop,
> > @@ -455,22 +602,34 @@ canonicalize_loop_induction_variables (s
> >  	      || TREE_CODE (niter) != INTEGER_CST))
> >  	niter = find_loop_niter_by_eval (loop, &exit);
> >  
> > -      if (chrec_contains_undetermined (niter)
> > -	  || TREE_CODE (niter) != INTEGER_CST)
> > -	return false;
> > +      if (TREE_CODE (niter) != INTEGER_CST)
> > +	exit = NULL;
> >      }
> >  
> > -  if (dump_file && (dump_flags & TDF_DETAILS))
> > +  /* We work exceptionally hard here to estimate the bound
> > +     by find_loop_niter_by_eval.  Be sure to keep it for future.  */
> > +  if (niter && TREE_CODE (niter) == INTEGER_CST)
> > +    record_niter_bound (loop, tree_to_double_int (niter), false, true);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS)
> > +      && TREE_CODE (niter) == INTEGER_CST)
> >      {
> >        fprintf (dump_file, "Loop %d iterates ", loop->num);
> >        print_generic_expr (dump_file, niter, TDF_SLIM);
> >        fprintf (dump_file, " times.\n");
> >      }
> > +  if (dump_file && (dump_flags & TDF_DETAILS)
> > +      && max_loop_iterations_int (loop) >= 0)
> > +    {
> > +      fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
> > +	       (int)max_loop_iterations_int (loop));
> > +    }
> >  
> >    if (try_unroll_loop_completely (loop, exit, niter, ul))
> >      return true;
> >  
> > -  if (create_iv)
> > +  if (create_iv
> > +      && niter && !chrec_contains_undetermined (niter))
> >      create_canonical_iv (loop, exit, niter);
> >  
> >    return false;
> > @@ -483,11 +642,14 @@ unsigned int
> >  canonicalize_induction_variables (void)
> >  {
> >    loop_iterator li;
> > -  struct loop *loop;
> > +  struct loop *loop, *next;
> >    bool changed = false;
> >  
> > -  FOR_EACH_LOOP (li, loop, 0)
> > +  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
> >      {
> > +      loop = next;
> > +      fel_next (&li, &next);
> > +      
> >        changed |= canonicalize_loop_induction_variables (loop,
> >  							true, UL_SINGLE_ITER,
> >  							true);
> > @@ -591,14 +753,18 @@ tree_unroll_loops_completely (bool may_i
> >    bool changed;
> >    enum unroll_level ul;
> >    int iteration = 0;
> > +  struct loop *next;
> >  
> >    do
> >      {
> >        changed = false;
> >  
> > -      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> > +      for (fel_init (&li, &next, 0); next;)
> >  	{
> > -	  struct loop *loop_father = loop_outer (loop);
> > +	  struct loop *loop_father = loop_outer (next);
> > +
> > +	  loop = next;
> > +	  fel_next (&li, &next);
> >  
> >  	  if (may_increase_size && optimize_loop_for_speed_p (loop)
> >  	      /* Unroll outermost loops only if asked to do so or they do
> > Index: fortran/f95-lang.c
> > ===================================================================
> > --- fortran/f95-lang.c	(revision 192360)
> > +++ fortran/f95-lang.c	(working copy)
> > @@ -908,6 +908,11 @@ gfc_init_builtin_functions (void)
> >    gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT,
> >  		      "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST);
> >  
> > +  ftype = build_function_type_list (void_type_node, NULL_TREE);
> > +
> > +  gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_EXPECT,
> > +		      "__builtin_unreachable", ATTR_NORETURN_NOTHROW_LEAF_LIST);
> > +
> >    ftype = build_function_type_list (void_type_node,
> >                                      pvoid_type_node, NULL_TREE);
> >    gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE,
> > Index: cfgloop.h
> > ===================================================================
> > --- cfgloop.h	(revision 192360)
> > +++ cfgloop.h	(working copy)
> > @@ -319,7 +319,8 @@ extern struct loop *loopify (edge, edge,
> >  struct loop * loop_version (struct loop *, void *,
> >  			    basic_block *, unsigned, unsigned, unsigned, bool);
> >  extern bool remove_path (edge);
> > -void scale_loop_frequencies (struct loop *, int, int);
> > +extern void scale_loop_frequencies (struct loop *, int, int);
> > +extern void unloop (struct loop *, bool *);
> >  
> >  /* Induction variable analysis.  */
> >  
> > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imend