This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Make try_unroll_loop_completely to use loop bounds recorded
- From: Richard Biener <rguenther at suse dot de>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 12 Oct 2012 10:05:53 +0200 (CEST)
- Subject: Re: Make try_unroll_loop_completely to use loop bounds recorded
- References: <20121011162109.GC32179@kam.mff.cuni.cz>
On Thu, 11 Oct 2012, Jan Hubicka wrote:
> Hi,
> while looking into RTL loop peeling micopmilation I found that we now do a lot of
> RTL loop peeling for loops of the form
>
> int a[2];
> test(int c)
> {
> int i;
> for (i=0;i<c;i++)
> a[i]=5;
> }
> this is because tree-ssa-loop-niter is able to prove that the loop iterates at
> most twice. This is better to be done on gimple level because destroying the
> loop enables more propagation.
>
> This patch started as simple attempt to make try_unroll_loop_completely
> to use max_loop_iterations_int. I had to track several issues
>
> 1) try_unroll_loop_completely handles only inner loops. I am fine with not
> peeling loops with subloops, but if the loop is known to iterate 0 times,
> we should turn it into non-loop.
> 2) try_unroll_loop_completely now handles removal of the loop by making
> exit edge conditional to be true and relying on cleanup_cfg to do
> the job. This does not work for all the cases we can handle
> by max_loop_iterations_int. When loop has multiple possible exits
> we don't really know what exit will be taken (we know it will be one
> of them).
>
> I added logic handling simple case where loop has single exit and
> in generic case I unloop it by adding __builtin_unreachable on the
> loopback path. While this seems bit involved it works well.
> 3) The last iteration has parts that are provably unreachable and should
> be optimized out later. VRP however tends to report array bound warnings
> that breaks -O3 bootstrap.
> I sent separate fix for that. Still I need to add 3 initializations
> to avoid maybe used uninitialized warning to get -O3 bootstrap working
> with this patch.
>
> The patch got quite a lot of snowballing effect, so I decided to not include:
>
> 1) the cost model is skewed for last iteration. It is often just duplicated
> loop header - i.e. all code dominated by the optmized out exit test should
> not be accounted. This would make cunroll-4.c testcase bellow to be unlooped
> during cunrolli rather than during complette unroling.
> 2) profile updating is broken for any loop containing non-trivial control flow.
> We need to teach loop duplication code to update sanely the profile when
> wont_exit is true and we need to update profile
> 3) we probably want to do less unrolling when result is not a single basic
> block. Current limit of 400 isnsns seems bit too large; making one basic
> block is important win. Making sequence of basic blocks with exits is
> less so.
Nice - this was on my TODO list as well, btw ;)
> * gcc.dg/tree-ssa/cunroll-1.c: New testcase.
> * gcc.dg/tree-ssa/cunroll-2.c: New testcase.
> * gcc.dg/tree-ssa/cunroll-3.c: New testcase.
> * gcc.dg/tree-ssa/cunroll-4.c: New testcase.
> * gcc.dg/tree-ssa/cunroll-5.c: New testcase.
>
> * cfgloopmanip.c (unloop): Export.
> * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Estimate
> also with unknown exit conditoinal.
> (try_unroll_loop_completely): Use max_loop_iterations_int to unroll
> also loops with low upper bound; handle unlooping of the last loop
> even when exit conditional is not known; unloop loop that do not loop
> even if they are not innermost.
> (canonicalize_loop_induction_variables): Record niter bounds known;
> try unrolling even if number of iterations is not known;
> (canonicalize_induction_variables): Be ready for loops disappearing.
> (tree_unroll_loops_completely): Likewise.
> * cfgloop.h (unloop): Declare.
>
> * f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.
I wonder if other languages need similar adjustment?
+ /* Now destroy the loop. First try to do so by cancelling the
+ patch from exit conditional if we identified good candidate.
+
you mean 'path' here, right? Similar in other places.
+ /* Unloop destroys the latch edge. */
+ unloop (loop, &irred_invalidated);
+ if (irred_invalidated)
+ mark_irreducible_loops ();
this makes the whole thing quadratic in the number of loops.
Please pass down a flag and handle this in the place we
update SSA form (thus, once per function / unroll iteration).
Or provide a more optimial implementation that only considers
parent loops of 'loop' (even that is excessive though, quadratic
in the loop depth).
- FOR_EACH_LOOP (li, loop, 0)
+ for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
but I'm sure that IV canonicalization should happen for all loops.
So, why do you need this change? Esp. we should get rid of
single-iteration outer loops here, too.
- FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
+ for (fel_init (&li, &next, 0); next;)
see above - FOR_EACH_LOOP (li, loop, 0)
Otherwise looks ok. Please update and re-post.
Thanks,
Richard.
> Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int a[2];
> +test(int c)
> +{
> + int i;
> + for (i=0;i<c;i++)
> + a[i]=5;
> +}
> +/* Array bounds says the loop will not roll much. */
> +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times). Last iteration exit edge was proved true." "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int a[2];
> +test(int c)
> +{
> + int i;
> + for (i=0;i<c;i++)
> + {
> + a[i]=5;
> + if (test2())
> + return;
> + }
> +}
> +/* We are not able to get rid of the final conditional because the loop has two exits. */
> +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 2 times)." "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-3.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c (revision 0)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> +int a[1];
> +test(int c)
> +{
> + int i;
> + for (i=0;i<c;i++)
> + {
> + a[i]=5;
> + }
> +}
> +/* If we start duplicating headers prior curoll, this loop will have 0 iterations. */
> +
> +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times)." "cunrolli"} } */
> +/* { dg-final { cleanup-tree-dump "cunrolli" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-4.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-4.c (revision 0)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int a[1];
> +test(int c)
> +{
> + int i=0,j;
> + for (i=0;i<c;i++)
> + {
> + for (j=0;j<c;j++)
> + {
> + a[i]=5;
> + test2();
> + }
> + }
> +}
> +
> +/* We should do this as part of cunrolli, but our cost model do not take into account early exit
> + from the last iteration. */
> +/* { dg-final { scan-tree-dump-not "Turned loop 1 to non-loop; it never loops. Last iteration exit edge was proved true." "cunrolli"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-5.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c (revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int *a;
> +test(int c)
> +{
> + int i;
> + for (i=0;i<6;i++)
> + a[i]=5;
> +}
> +/* Basic testcase for complette unrolling. */
> +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 5 times). Exit condition of peeled iterations was eliminated. Last iteration exit edge was proved true." "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: cfgloopmanip.c
> ===================================================================
> --- cfgloopmanip.c (revision 192360)
> +++ cfgloopmanip.c (working copy)
> @@ -37,7 +37,6 @@ static int find_path (edge, basic_block
> static void fix_loop_placements (struct loop *, bool *);
> static bool fix_bb_placement (basic_block);
> static void fix_bb_placements (basic_block, bool *);
> -static void unloop (struct loop *, bool *);
>
> /* Checks whether basic block BB is dominated by DATA. */
> static bool
> @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed
> If this may cause the information about irreducible regions to become
> invalid, IRRED_INVALIDATED is set to true. */
>
> -static void
> +void
> unloop (struct loop *loop, bool *irred_invalidated)
> {
> basic_block *body;
> Index: tree-ssa-loop-ivcanon.c
> ===================================================================
> --- tree-ssa-loop-ivcanon.c (revision 192360)
> +++ tree-ssa-loop-ivcanon.c (working copy)
> @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo
> /* Look for reasons why we might optimize this stmt away. */
>
> /* Exit conditional. */
> - if (body[i] == exit->src && stmt == last_stmt (exit->src))
> + if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, " Exit condition will be eliminated.\n");
> @@ -326,13 +326,43 @@ try_unroll_loop_completely (struct loop
> unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
> gimple cond;
> struct loop_size size;
> + bool n_unroll_found = false;
> + HOST_WIDE_INT maxiter;
> + basic_block latch;
> + edge latch_edge;
> + location_t locus;
> + int flags;
> + bool irred_invalidated = false;
> + gimple stmt;
> + gimple_stmt_iterator gsi;
> + edge other_edge = NULL, last_exit;
> + int num = loop->num;
> + bool last_iteration_updated = false;
> +
> + /* See if we proved number of iterations to be low constant. */
> + if (host_integerp (niter, 1))
> + {
> + n_unroll = tree_low_cst (niter, 1);
> + n_unroll_found = true;
> + }
>
> - if (loop->inner)
> + /* See if we can improve our estimate by using recorded loop bounds. */
> + maxiter = max_loop_iterations_int (loop);
> + if (maxiter >= 0
> + && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
> + {
> + n_unroll = maxiter;
> + n_unroll_found = true;
> + }
> +
> + if (!n_unroll_found)
> return false;
>
> - if (!host_integerp (niter, 1))
> + /* We unroll only inner loops, because we do not consider it profitable
> + otheriwse. We still can cancel loopback edge of not rolling loop;
> + this is always a good idea. */
> + if (loop->inner && n_unroll)
> return false;
> - n_unroll = tree_low_cst (niter, 1);
>
> max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
> if (n_unroll > max_unroll)
> @@ -340,6 +370,10 @@ try_unroll_loop_completely (struct loop
>
> if (n_unroll)
> {
> + sbitmap wont_exit;
> + edge e;
> + unsigned i;
> + VEC (edge, heap) *to_remove = NULL;
> if (ul == UL_SINGLE_ITER)
> return false;
>
> @@ -372,14 +408,6 @@ try_unroll_loop_completely (struct loop
> fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
> return false;
> }
> - }
> -
> - if (n_unroll)
> - {
> - sbitmap wont_exit;
> - edge e;
> - unsigned i;
> - VEC (edge, heap) *to_remove = NULL;
>
> initialize_original_copy_tables ();
> wont_exit = sbitmap_alloc (n_unroll + 1);
> @@ -408,24 +436,143 @@ try_unroll_loop_completely (struct loop
> free_original_copy_tables ();
> }
>
> - cond = last_stmt (exit->src);
> - if (exit->flags & EDGE_TRUE_VALUE)
> - gimple_cond_make_true (cond);
> - else
> - gimple_cond_make_false (cond);
> - update_stmt (cond);
> + /* After complette unrolling we still may get rid of the conditional
> + on exit in the last copy even if we have no idea what it does.
> + This is quite common case for loops of form
> +
> + int a[5];
> + for (i=0;i<b;i++)
> + a[i]=0;
> +
> + Here we prove the loop to iterate 5 times but we do not know
> + it from induction variable.
> +
> + We want to make the conditional of exit true, so we can only
> + consider exits that are not part of the inner (irreducible) loops
> + so we know that the conditional is tested at most once per iteration.
> +
> + Situation is more complicated withloops with multiple exists:
> +
> + int a[5];
> + for (i=0;i<b;i++)
> + {
> + if (blah)
> + break;
> + a[i]=0;
> + }
> +
> + Here we could cancel the conditional of if "(blah)". And:
> +
> + int a[5];
> + for (i=0;i<b;i++)
> + {
> + a[i]=0;
> + if (blah)
> + break;
> + }
> +
> + Here we can cancel the last i<b test.
> +
> + To handle these we need to track all statements containing code that
> + can not execute on last iteration (in tree-ssa-loop-niter).
> + Then we can use control dependnecy (not computed here) to cancel all
> + the paths leading to them unless they are part of the inner loops.
> + This however seems bit like an overkill so we handle only the
> + simple case of single exit until interesting testcases appears. */
> + last_exit = exit;
> + if (!last_exit)
> + {
> + last_exit = single_exit (loop);
> + if (!last_exit || last_exit->src->loop_father != loop
> + || !(last_exit->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)))
> + last_exit = NULL;
> + else
> + {
> + if (last_exit->src->flags & BB_IRREDUCIBLE_LOOP)
> + last_exit = NULL;
> + }
> + }
> +
> + /* If loop has only one exit edge, we can remove the conditional from
> + the last copy of the loop.
> + TODO: We should account this update into cost model. */
> + if (last_exit)
> + {
> + cond = last_stmt (last_exit->src);
> + if (last_exit->flags & EDGE_TRUE_VALUE)
> + gimple_cond_make_true (cond);
> + else
> + gimple_cond_make_false (cond);
> + update_stmt (cond);
> + other_edge = EDGE_SUCC (last_exit->src, 0);
> + if (other_edge == last_exit)
> + other_edge = EDGE_SUCC (last_exit->src, 1);
> + last_iteration_updated = true;
> + }
> +
> + /* Now destroy the loop. First try to do so by cancelling the
> + patch from exit conditional if we identified good candidate.
> +
> + TODO: We should update the loop profile for the fact that
> + the last iteration no longer executes. */
> + if (!other_edge || !remove_path (other_edge))
> + {
> + /* We did not manage to cancel the loop by removing the patch.
> + The loop latch remains reachable even if it will never be reached
> + at runtime. We must redirect it to somewhere, so create basic
> + block containg __builtin_unreachable call for this reason. */
> + latch = loop->latch;
> + latch_edge = loop_latch_edge (loop);
> + flags = latch_edge->flags;
> + locus = latch_edge->goto_locus;
> +
> + /* Unloop destroys the latch edge. */
> + unloop (loop, &irred_invalidated);
> + if (irred_invalidated)
> + mark_irreducible_loops ();
> +
> + /* Create new basic block for the latch edge destination and wire
> + it in. */
> + stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0);
> + latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags);
> + gsi = gsi_start_bb (latch_edge->dest);
> + gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> + latch_edge->dest->loop_father = current_loops->tree_root;
> + set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src);
> + latch_edge->probability = 0;
> + latch_edge->count = 0;
> + latch_edge->flags |= flags;
> + latch_edge->goto_locus = locus;
> + }
>
> if (dump_file && (dump_flags & TDF_DETAILS))
> - fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
> + {
> + if (!n_unroll)
> + fprintf (dump_file, "Turned loop %d to non-loop; it never loops.%s\n", num,
> + last_iteration_updated
> + ? " Last iteration exit edge was proved true." : "");
> + else
> + {
> + fprintf (dump_file, "Unrolled loop %d completely "
> + "(duplicated %i times).%s%s\n", num, (int)n_unroll,
> + exit
> + ? " Exit condition of peeled iterations was eliminated." : "",
> + last_iteration_updated
> + ? " Last iteration exit edge was proved true." : "");
> + }
> + }
>
> - return true;
> + return true;
> }
>
> /* Adds a canonical induction variable to LOOP if suitable.
> CREATE_IV is true if we may create a new iv. UL determines
> which loops we are allowed to completely unroll. If TRY_EVAL is true, we try
> to determine the number of iterations of a loop by direct evaluation.
> - Returns true if cfg is changed. */
> + Returns true if cfg is changed.
> +
> + IRREDUCIBLE_LOOPS_FOUND is used to bookkeep if we discovered irreducible
> + loops. This is used in a special case of the exit condition analysis. */
>
> static bool
> canonicalize_loop_induction_variables (struct loop *loop,
> @@ -455,22 +602,34 @@ canonicalize_loop_induction_variables (s
> || TREE_CODE (niter) != INTEGER_CST))
> niter = find_loop_niter_by_eval (loop, &exit);
>
> - if (chrec_contains_undetermined (niter)
> - || TREE_CODE (niter) != INTEGER_CST)
> - return false;
> + if (TREE_CODE (niter) != INTEGER_CST)
> + exit = NULL;
> }
>
> - if (dump_file && (dump_flags & TDF_DETAILS))
> + /* We work exceptionally hard here to estimate the bound
> + by find_loop_niter_by_eval. Be sure to keep it for future. */
> + if (niter && TREE_CODE (niter) == INTEGER_CST)
> + record_niter_bound (loop, tree_to_double_int (niter), false, true);
> +
> + if (dump_file && (dump_flags & TDF_DETAILS)
> + && TREE_CODE (niter) == INTEGER_CST)
> {
> fprintf (dump_file, "Loop %d iterates ", loop->num);
> print_generic_expr (dump_file, niter, TDF_SLIM);
> fprintf (dump_file, " times.\n");
> }
> + if (dump_file && (dump_flags & TDF_DETAILS)
> + && max_loop_iterations_int (loop) >= 0)
> + {
> + fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
> + (int)max_loop_iterations_int (loop));
> + }
>
> if (try_unroll_loop_completely (loop, exit, niter, ul))
> return true;
>
> - if (create_iv)
> + if (create_iv
> + && niter && !chrec_contains_undetermined (niter))
> create_canonical_iv (loop, exit, niter);
>
> return false;
> @@ -483,11 +642,14 @@ unsigned int
> canonicalize_induction_variables (void)
> {
> loop_iterator li;
> - struct loop *loop;
> + struct loop *loop, *next;
> bool changed = false;
>
> - FOR_EACH_LOOP (li, loop, 0)
> + for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
> {
> + loop = next;
> + fel_next (&li, &next);
> +
> changed |= canonicalize_loop_induction_variables (loop,
> true, UL_SINGLE_ITER,
> true);
> @@ -591,14 +753,18 @@ tree_unroll_loops_completely (bool may_i
> bool changed;
> enum unroll_level ul;
> int iteration = 0;
> + struct loop *next;
>
> do
> {
> changed = false;
>
> - FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> + for (fel_init (&li, &next, 0); next;)
> {
> - struct loop *loop_father = loop_outer (loop);
> + struct loop *loop_father = loop_outer (next);
> +
> + loop = next;
> + fel_next (&li, &next);
>
> if (may_increase_size && optimize_loop_for_speed_p (loop)
> /* Unroll outermost loops only if asked to do so or they do
> Index: fortran/f95-lang.c
> ===================================================================
> --- fortran/f95-lang.c (revision 192360)
> +++ fortran/f95-lang.c (working copy)
> @@ -908,6 +908,11 @@ gfc_init_builtin_functions (void)
> gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT,
> "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST);
>
> + ftype = build_function_type_list (void_type_node, NULL_TREE);
> +
> + gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_EXPECT,
> + "__builtin_unreachable", ATTR_NORETURN_NOTHROW_LEAF_LIST);
> +
> ftype = build_function_type_list (void_type_node,
> pvoid_type_node, NULL_TREE);
> gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE,
> Index: cfgloop.h
> ===================================================================
> --- cfgloop.h (revision 192360)
> +++ cfgloop.h (working copy)
> @@ -319,7 +319,8 @@ extern struct loop *loopify (edge, edge,
> struct loop * loop_version (struct loop *, void *,
> basic_block *, unsigned, unsigned, unsigned, bool);
> extern bool remove_path (edge);
> -void scale_loop_frequencies (struct loop *, int, int);
> +extern void scale_loop_frequencies (struct loop *, int, int);
> +extern void unloop (struct loop *, bool *);
>
> /* Induction variable analysis. */
>
>
>
--
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend