Make try_unroll_loop_completely to use loop bounds recorded
Richard Biener
rguenther@suse.de
Wed Oct 17 11:11:00 GMT 2012
On Tue, 16 Oct 2012, Jan Hubicka wrote:
> Hi,
> here is the third revised version of the complete unrolling changes. While working
> on the RTL variant I noticed PR54937 and the fact that I was overly aggressive
> in forcing the single exit of the last iteration to be taken, because the loop may
> terminate otherwise (by EH or by exiting the program). The same thinko is in loop-niter.
>
> This patch adds loop_edge_to_cancel, which is more conservative: it looks for the
> exit conditional whose non-exiting edge leads to the latch and verifies that the
> latch contains no statement with side effects that may terminate the loop.
> This still matches quite a few non-single-exit loops and works well in
> practice.
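
As a source-level illustration of the edge being cancelled, here is a
hand-written sketch (not output of the patch; the names clear_rolled,
clear_unrolled and same_result are made up for this example). It uses the
array-bound case: after the last peeled copy no exit test is needed, because
taking the back edge once more would imply an out-of-bounds store.

```c
#include <assert.h>
#include <string.h>

enum { N = 5 };

/* Rolled form.  The trip count is not known from the induction variable,
   but storing to a[i] with i >= N would be undefined, so the body runs
   at most N times.  */
void
clear_rolled (int a[N], int b)
{
  assert (b >= 0 && b <= N);  /* Precondition assumed by this sketch.  */
  for (int i = 0; i < b; i++)
    a[i] = 0;
}

/* Hand-unrolled form.  Each peeled copy keeps its exit test; after the
   last copy the back edge is known never to be taken, so neither a test
   nor a loop remains.  */
void
clear_unrolled (int a[N], int b)
{
  assert (b >= 0 && b <= N);
  if (b <= 0)
    return;
  a[0] = 0;
  if (b <= 1)
    return;
  a[1] = 0;
  if (b <= 2)
    return;
  a[2] = 0;
  if (b <= 3)
    return;
  a[3] = 0;
  if (b <= 4)
    return;
  a[4] = 0;
  /* No test here: the conditional of the last iteration was cancelled.  */
}

/* Returns 1 when both forms leave the array in the same state.  */
int
same_result (int b)
{
  int x[N] = { 1, 1, 1, 1, 1 };
  int y[N] = { 1, 1, 1, 1, 1 };
  clear_rolled (x, b);
  clear_unrolled (y, b);
  return memcmp (x, y, sizeof x) == 0;
}
```

The two functions agree for every b in 0..N, which is the only range where
the rolled loop has defined behavior.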
>
> Unlike the previous revision, it also enables complete unrolling when code size
> does not grow, even for non-innermost loops (with an update in
> tree_unroll_loops_completely to walk them). This is something we did in RTL
> land but missed in trees. It enables quite a few optimizations when
> things can be propagated into the tiny inner loop body.
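
The size checks this paragraph refers to can be summarized roughly as below.
This is a simplified standalone sketch modelled on the conditions visible in
the diff for the n_unroll > 0 case; the struct, the enum, the function names
and the cap values in demo_unroll are hypothetical and are not GCC's data
structures or defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pass's unroll level.  */
enum unroll_level_sketch { SKETCH_NO_GROWTH, SKETCH_ALL };

struct unroll_query
{
  bool has_inner_loop;        /* Loop contains a nested loop.  */
  unsigned n_unroll;          /* Number of copies to create.  */
  unsigned ninsns;            /* Estimated size of the rolled loop.  */
  unsigned unr_insns;         /* Estimated size after unrolling.  */
  unsigned max_unroll;        /* Cap on n_unroll.  */
  unsigned max_peeled_insns;  /* Cap on unr_insns when code grows.  */
  enum unroll_level_sketch ul;
};

/* Decide whether complete unrolling looks acceptable.  */
bool
should_unroll_sketch (const struct unroll_query *q)
{
  assert (q != NULL);
  bool grows = q->unr_insns > q->ninsns;

  if (q->n_unroll > q->max_unroll)
    return false;
  /* Non-innermost loops are unrolled only when code does not grow.  */
  if (q->has_inner_loop && grows)
    return false;
  if (grows && q->unr_insns > q->max_peeled_insns)
    return false;
  if (q->ul == SKETCH_NO_GROWTH && grows)
    return false;
  return true;
}

/* Convenience wrapper with made-up caps, used only for illustration.  */
bool
demo_unroll (bool has_inner, unsigned ninsns, unsigned unr_insns)
{
  struct unroll_query q = { has_inner, 4, ninsns, unr_insns,
                            16, 200, SKETCH_ALL };
  return should_unroll_sketch (&q);
}
```

With these made-up caps, an outer loop whose estimate shrinks from 10 to 8
insns is accepted, while one growing from 10 to 12 is rejected; the same
growth is still accepted for an innermost loop.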
>
> I also fixed accounting in tree_estimate_loop_size for the cases where the last
> iteration is not going to be updated.
>
> Finally I added code constructing __builtin_unreachable as suggested by
> Ian.
>
> Bootstrapped/regtested x86_64-linux, also bootstrapped with -O3 and -Werror
> disabled and benchmarked. Among the best benefits is an improvement of about 7% on Applu,
> and it gives up to 15% improvement on vectorized loops with small iteration
> counts (by completely peeling the precondition code). There are no real
> performance regressions, but there is some code size bloat.
>
> I plan to follow up by strengthening the heuristic to disable unrolling when
> the benefits are abysmal. An easy step is to limit unrolling of loops with CFG and/or
> calls in them. We already have quite informed analysis in place. I also plan
> to move simple FDO-guided loop peeling from the RTL level to trees to enable more
> propagation into peeled sequences.
>
> The patch also triggers a bug in niter and requires xfailing the do_1.f90 testcase.
> I filed PR 54932 to track this.
>
> There are also confused array bound warnings that I hope to track incrementally, too,
> by recording statements that are known to become unreachable in the last
> iteration and adding __builtin_unreachable in front of them. This is also
> important to avoid duplication leading to dead code: no other optimizers
> force paths leading to out-of-bounds accesses to not happen.
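
For readers unfamiliar with the builtin, here is a minimal user-level sketch
of what __builtin_unreachable expresses. It is a GCC extension (Clang accepts
it as well); the function names and MAX_ITEMS are made up for this example,
and whether a given compiler version actually uses the hint to bound the loop
is not something this sketch demonstrates.

```c
#include <assert.h>

enum { MAX_ITEMS = 4 };

/* Sum the first N elements of V.  The call below is a promise that the
   guarded path is never executed; reaching it at run time is undefined
   behavior, so callers must keep N <= MAX_ITEMS.  */
int
sum_bounded (const int *v, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    {
      if (i >= MAX_ITEMS)
        __builtin_unreachable ();
      s += v[i];
    }
  return s;
}

/* Small driver over a fixed array, used only for illustration.  */
int
demo_sum (int n)
{
  static const int v[MAX_ITEMS] = { 1, 2, 3, 4 };
  assert (n >= 0 && n <= MAX_ITEMS);
  return sum_bounded (v, n);
}
```

The patch inserts the equivalent call on the compiler side, in a new block
reached from the latch of the last unrolled copy, rather than relying on the
programmer to write it.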
Quite a large patch ... it could have been split into bugfixes
and enhancements ;)
Still - ok!
Thanks,
Richard.
> Honza
>
>
> * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Add edge_to_cancel
> parameter and use it to estimate code optimized out in the final iteration.
> (loop_edge_to_cancel): New function.
> (try_unroll_loop_completely): New IRRED_INVALIDATED parameter;
> handle unrolling loops with bounds given via max_loop_iterations;
> handle unrolling non-inner loops when code size shrinks;
> tidy dump output; when the last iteration loop still stays
> as loop in the CFG forcibly redirect the latch to
> __builtin_unreachable.
> (canonicalize_loop_induction_variables): Add irred_invalidated
> parameter; record niter bound derived; dump
> max_loop_iterations bounds; call try_unroll_loop_completely
> even if no niter bound is given.
> (canonicalize_induction_variables): Handle irred_invalidated.
> (tree_unroll_loops_completely): Handle non-innermost loops;
> handle irred_invalidated.
> * cfgloop.h (unloop): Declare.
> * cfgloopmanip.c (unloop): Export.
> * tree.c (build_common_builtin_nodes): Build BUILT_IN_UNREACHABLE.
>
> * gcc.target/i386/l_fma_float_?.c: Update.
> * gcc.target/i386/l_fma_double_?.c: Update.
> * gfortran.dg/do_1.f90: XFAIL
> * gcc.dg/tree-ssa/cunroll-1.c: New testcase.
> * gcc.dg/tree-ssa/cunroll-2.c: New testcase.
> * gcc.dg/tree-ssa/cunroll-3.c: New testcase.
> * gcc.dg/tree-ssa/cunroll-4.c: New testcase.
> * gcc.dg/tree-ssa/cunroll-5.c: New testcase.
> * gcc.dg/tree-ssa/ldist-17.c: Block cunroll to make testcase still
> valid.
> Index: tree-ssa-loop-ivcanon.c
> ===================================================================
> --- tree-ssa-loop-ivcanon.c (revision 192483)
> +++ tree-ssa-loop-ivcanon.c (working copy)
> @@ -192,7 +192,7 @@ constant_after_peeling (tree op, gimple
> Return results in SIZE, estimate benefits for complete unrolling exiting by EXIT. */
>
> static void
> -tree_estimate_loop_size (struct loop *loop, edge exit, struct loop_size *size)
> +tree_estimate_loop_size (struct loop *loop, edge exit, edge edge_to_cancel, struct loop_size *size)
> {
> basic_block *body = get_loop_body (loop);
> gimple_stmt_iterator gsi;
> @@ -208,8 +208,8 @@ tree_estimate_loop_size (struct loop *lo
> fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num);
> for (i = 0; i < loop->num_nodes; i++)
> {
> - if (exit && body[i] != exit->src
> - && dominated_by_p (CDI_DOMINATORS, body[i], exit->src))
> + if (edge_to_cancel && body[i] != edge_to_cancel->src
> + && dominated_by_p (CDI_DOMINATORS, body[i], edge_to_cancel->src))
> after_exit = true;
> else
> after_exit = false;
> @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo
> /* Look for reasons why we might optimize this stmt away. */
>
> /* Exit conditional. */
> - if (body[i] == exit->src && stmt == last_stmt (exit->src))
> + if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, " Exit condition will be eliminated.\n");
> @@ -314,36 +314,161 @@ estimated_unrolled_size (struct loop_siz
> return unr_insns;
> }
>
> +/* Loop LOOP is known to not loop. See if there is an edge in the loop
> + body that can be removed to make the loop always exit and at
> + the same time it does not make any code potentially executed
> + during the last iteration dead.
> +
> + After complete unrolling we still may get rid of the conditional
> + on the exit in the last copy even if we have no idea what it does.
> + This is quite common case for loops of form
> +
> + int a[5];
> + for (i=0;i<b;i++)
> + a[i]=0;
> +
> + Here we prove the loop to iterate 5 times but we do not know
> + it from induction variable.
> +
> + For now we handle only simple case where there is exit condition
> + just before the latch block and the latch block contains no statements
> + with side effect that may otherwise terminate the execution of loop
> + (such as by EH or by terminating the program or longjmp).
> +
> + In the general case we may want to cancel the paths leading to statements
> + loop-niter identified as having undefined effect in the last iteration.
> + The other cases are hopefully rare and will be cleaned up later. */
> +
> +edge
> +loop_edge_to_cancel (struct loop *loop)
> +{
> + VEC (edge, heap) *exits;
> + unsigned i;
> + edge edge_to_cancel;
> + gimple_stmt_iterator gsi;
> +
> + /* We want only one predecessor of the loop latch. */
> + if (EDGE_COUNT (loop->latch->preds) > 1)
> + return NULL;
> +
> + exits = get_loop_exit_edges (loop);
> +
> + FOR_EACH_VEC_ELT (edge, exits, i, edge_to_cancel)
> + {
> + /* Find the other edge than the loop exit
> + leaving the conditional. */
> + if (EDGE_COUNT (edge_to_cancel->src->succs) != 2)
> + continue;
> + if (EDGE_SUCC (edge_to_cancel->src, 0) == edge_to_cancel)
> + edge_to_cancel = EDGE_SUCC (edge_to_cancel->src, 1);
> + else
> + edge_to_cancel = EDGE_SUCC (edge_to_cancel->src, 0);
> +
> + /* We should never have conditionals in the loop latch. */
> + gcc_assert (edge_to_cancel->dest != loop->header);
> +
> + /* Check that it leads to loop latch. */
> + if (edge_to_cancel->dest != loop->latch)
> + continue;
> +
> + VEC_free (edge, heap, exits);
> +
> + /* Verify that the code in loop latch does nothing that may end program
> + execution without really reaching the exit. This may include
> + non-pure/const function calls, EH statements, volatile ASMs etc. */
> + for (gsi = gsi_start_bb (loop->latch); !gsi_end_p (gsi); gsi_next (&gsi))
> + if (gimple_has_side_effects (gsi_stmt (gsi)))
> + return NULL;
> + return edge_to_cancel;
> + }
> + VEC_free (edge, heap, exits);
> + return NULL;
> +}
> +
> /* Tries to unroll LOOP completely, i.e. NITER times.
> UL determines which loops we are allowed to unroll.
> - EXIT is the exit of the loop that should be eliminated. */
> + EXIT is the exit of the loop that should be eliminated.
> + IRRED_INVALIDATED is used to bookkeep if information about
> + irreducible regions may become invalid as a result
> + of the transformation. */
>
> static bool
> try_unroll_loop_completely (struct loop *loop,
> edge exit, tree niter,
> - enum unroll_level ul)
> + enum unroll_level ul,
> + bool *irred_invalidated)
> {
> unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
> gimple cond;
> struct loop_size size;
> + bool n_unroll_found = false;
> + HOST_WIDE_INT maxiter;
> + basic_block latch;
> + edge latch_edge;
> + location_t locus;
> + int flags;
> + gimple stmt;
> + gimple_stmt_iterator gsi;
> + edge edge_to_cancel = NULL;
> + int num = loop->num;
>
> - if (loop->inner)
> - return false;
> + /* See if we proved number of iterations to be low constant.
>
> - if (!host_integerp (niter, 1))
> + EXIT is an edge that will be removed in all but last iteration of
> + the loop.
> +
> + EDGE_TO_CANCEL is an edge that will be removed from the last iteration
> + of the unrolled sequence and is expected to make the final loop not
> + rolling.
> +
> + If the number of execution of loop is determined by standard induction
> + variable test, then EXIT and EDGE_TO_CANCEL are the two edges leaving
> + from the iv test. */
> + if (host_integerp (niter, 1))
> + {
> + n_unroll = tree_low_cst (niter, 1);
> + n_unroll_found = true;
> + edge_to_cancel = EDGE_SUCC (exit->src, 0);
> + if (edge_to_cancel == exit)
> + edge_to_cancel = EDGE_SUCC (exit->src, 1);
> + }
> + /* We do not know the number of iterations and thus we can not eliminate
> + the EXIT edge. */
> + else
> + exit = NULL;
> +
> + /* See if we can improve our estimate by using recorded loop bounds. */
> + maxiter = max_loop_iterations_int (loop);
> + if (maxiter >= 0
> + && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
> + {
> + n_unroll = maxiter;
> + n_unroll_found = true;
> + /* Loop terminates before the IV variable test, so we can not
> + remove it in the last iteration. */
> + edge_to_cancel = NULL;
> + }
> +
> + if (!n_unroll_found)
> return false;
> - n_unroll = tree_low_cst (niter, 1);
>
> max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
> if (n_unroll > max_unroll)
> return false;
>
> + if (!edge_to_cancel)
> + edge_to_cancel = loop_edge_to_cancel (loop);
> +
> if (n_unroll)
> {
> + sbitmap wont_exit;
> + edge e;
> + unsigned i;
> + VEC (edge, heap) *to_remove = NULL;
> if (ul == UL_SINGLE_ITER)
> return false;
>
> - tree_estimate_loop_size (loop, exit, &size);
> + tree_estimate_loop_size (loop, exit, edge_to_cancel, &size);
> ninsns = size.overall;
>
> unr_insns = estimated_unrolled_size (&size, n_unroll);
> @@ -354,6 +479,18 @@ try_unroll_loop_completely (struct loop
> (int) unr_insns);
> }
>
> + /* We unroll only inner loops, because we do not consider it profitable
> + otherwise. We still can cancel loopback edge of not rolling loop;
> + this is always a good idea. */
> + if (loop->inner && unr_insns > ninsns)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "Not unrolling loop %d: "
> + "it is not innermost and code would grow.\n",
> + loop->num);
> + return false;
> + }
> +
> if (unr_insns > ninsns
> && (unr_insns
> > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS)))
> @@ -369,17 +506,10 @@ try_unroll_loop_completely (struct loop
> && unr_insns > ninsns)
> {
> if (dump_file && (dump_flags & TDF_DETAILS))
> - fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
> + fprintf (dump_file, "Not unrolling loop %d: size would grow.\n",
> + loop->num);
> return false;
> }
> - }
> -
> - if (n_unroll)
> - {
> - sbitmap wont_exit;
> - edge e;
> - unsigned i;
> - VEC (edge, heap) *to_remove = NULL;
>
> initialize_original_copy_tables ();
> wont_exit = sbitmap_alloc (n_unroll + 1);
> @@ -408,15 +538,67 @@ try_unroll_loop_completely (struct loop
> free_original_copy_tables ();
> }
>
> - cond = last_stmt (exit->src);
> - if (exit->flags & EDGE_TRUE_VALUE)
> - gimple_cond_make_true (cond);
> - else
> - gimple_cond_make_false (cond);
> - update_stmt (cond);
> + /* Remove the conditional from the last copy of the loop. */
> + if (edge_to_cancel)
> + {
> + cond = last_stmt (edge_to_cancel->src);
> + if (edge_to_cancel->flags & EDGE_TRUE_VALUE)
> + gimple_cond_make_false (cond);
> + else
> + gimple_cond_make_true (cond);
> + update_stmt (cond);
> + /* Do not remove the path. Doing so may remove outer loop
> + and confuse bookkeeping code in tree_unroll_loops_completely. */
> + }
> + /* We did not manage to cancel the loop.
> + The loop latch remains reachable even if it will never be reached
> + at runtime. We must redirect it to somewhere, so create basic
> + block containing __builtin_unreachable call for this reason. */
> + else
> + {
> + latch = loop->latch;
> + latch_edge = loop_latch_edge (loop);
> + flags = latch_edge->flags;
> + locus = latch_edge->goto_locus;
> +
> + /* Unloop destroys the latch edge. */
> + unloop (loop, irred_invalidated);
> +
> + /* Create new basic block for the latch edge destination and wire
> + it in. */
> + stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0);
> + latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags);
> + latch_edge->probability = 0;
> + latch_edge->count = 0;
> + latch_edge->flags |= flags;
> + latch_edge->goto_locus = locus;
> +
> + latch_edge->dest->loop_father = current_loops->tree_root;
> + latch_edge->dest->count = 0;
> + latch_edge->dest->frequency = 0;
> + set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src);
> +
> + gsi = gsi_start_bb (latch_edge->dest);
> + gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> + }
>
> if (dump_file && (dump_flags & TDF_DETAILS))
> - fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
> + {
> + if (!n_unroll)
> + fprintf (dump_file, "Turned loop %d to non-loop; it never loops.\n",
> + num);
> + else
> + fprintf (dump_file, "Unrolled loop %d completely "
> + "(duplicated %i times).\n", num, (int)n_unroll);
> + if (exit)
> + fprintf (dump_file, "Exit condition of peeled iterations was "
> + "eliminated.\n");
> + if (edge_to_cancel)
> + fprintf (dump_file, "Last iteration exit edge was proved true.\n");
> + else
> + fprintf (dump_file, "Latch of last iteration was marked by "
> + "__builtin_unreachable ().\n");
> + }
>
> return true;
> }
> @@ -425,12 +608,15 @@ try_unroll_loop_completely (struct loop
> CREATE_IV is true if we may create a new iv. UL determines
> which loops we are allowed to completely unroll. If TRY_EVAL is true, we try
> to determine the number of iterations of a loop by direct evaluation.
> - Returns true if cfg is changed. */
> + Returns true if cfg is changed.
> +
> + IRRED_INVALIDATED is used to keep track of whether irreducible regions need to be recomputed. */
>
> static bool
> canonicalize_loop_induction_variables (struct loop *loop,
> bool create_iv, enum unroll_level ul,
> - bool try_eval)
> + bool try_eval,
> + bool *irred_invalidated)
> {
> edge exit = NULL;
> tree niter;
> @@ -455,22 +641,34 @@ canonicalize_loop_induction_variables (s
> || TREE_CODE (niter) != INTEGER_CST))
> niter = find_loop_niter_by_eval (loop, &exit);
>
> - if (chrec_contains_undetermined (niter)
> - || TREE_CODE (niter) != INTEGER_CST)
> - return false;
> + if (TREE_CODE (niter) != INTEGER_CST)
> + exit = NULL;
> }
>
> - if (dump_file && (dump_flags & TDF_DETAILS))
> + /* We work exceptionally hard here to estimate the bound
> + by find_loop_niter_by_eval. Be sure to keep it for future. */
> + if (niter && TREE_CODE (niter) == INTEGER_CST)
> + record_niter_bound (loop, tree_to_double_int (niter), false, true);
> +
> + if (dump_file && (dump_flags & TDF_DETAILS)
> + && TREE_CODE (niter) == INTEGER_CST)
> {
> fprintf (dump_file, "Loop %d iterates ", loop->num);
> print_generic_expr (dump_file, niter, TDF_SLIM);
> fprintf (dump_file, " times.\n");
> }
> + if (dump_file && (dump_flags & TDF_DETAILS)
> + && max_loop_iterations_int (loop) >= 0)
> + {
> + fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
> + (int)max_loop_iterations_int (loop));
> + }
>
> - if (try_unroll_loop_completely (loop, exit, niter, ul))
> + if (try_unroll_loop_completely (loop, exit, niter, ul, irred_invalidated))
> return true;
>
> - if (create_iv)
> + if (create_iv
> + && niter && !chrec_contains_undetermined (niter))
> create_canonical_iv (loop, exit, niter);
>
> return false;
> @@ -485,15 +683,21 @@ canonicalize_induction_variables (void)
> loop_iterator li;
> struct loop *loop;
> bool changed = false;
> + bool irred_invalidated = false;
>
> FOR_EACH_LOOP (li, loop, 0)
> {
> changed |= canonicalize_loop_induction_variables (loop,
> true, UL_SINGLE_ITER,
> - true);
> + true,
> + &irred_invalidated);
> }
> gcc_assert (!need_ssa_update_p (cfun));
>
> + if (irred_invalidated
> + && loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
> + mark_irreducible_loops ();
> +
> /* Clean up the information about numbers of iterations, since brute force
> evaluation could reveal new information. */
> scev_reset ();
> @@ -594,9 +798,10 @@ tree_unroll_loops_completely (bool may_i
>
> do
> {
> + bool irred_invalidated = false;
> changed = false;
>
> - FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> + FOR_EACH_LOOP (li, loop, 0)
> {
> struct loop *loop_father = loop_outer (loop);
>
> @@ -609,7 +814,8 @@ tree_unroll_loops_completely (bool may_i
> ul = UL_NO_GROWTH;
>
> if (canonicalize_loop_induction_variables (loop, false, ul,
> - !flag_tree_loop_ivcanon))
> + !flag_tree_loop_ivcanon,
> + &irred_invalidated))
> {
> changed = true;
> /* If we'll continue unrolling, we need to propagate constants
> @@ -629,6 +835,10 @@ tree_unroll_loops_completely (bool may_i
> struct loop **iter;
> unsigned i;
>
> + if (irred_invalidated
> + && loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
> + mark_irreducible_loops ();
> +
> update_ssa (TODO_update_ssa);
>
> /* Propagate the constants within the new basic blocks. */
> Index: tree.c
> ===================================================================
> --- tree.c (revision 192483)
> +++ tree.c (working copy)
> @@ -9524,6 +9524,15 @@ build_common_builtin_nodes (void)
> tree tmp, ftype;
> int ecf_flags;
>
> + if (!builtin_decl_explicit_p (BUILT_IN_UNREACHABLE))
> + {
> + ftype = build_function_type (void_type_node, void_list_node);
> + local_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_UNREACHABLE,
> + "__builtin_unreachable",
> + ECF_NOTHROW | ECF_LEAF | ECF_NORETURN
> + | ECF_CONST | ECF_LEAF);
> + }
> +
> if (!builtin_decl_explicit_p (BUILT_IN_MEMCPY)
> || !builtin_decl_explicit_p (BUILT_IN_MEMMOVE))
> {
> Index: cfgloop.h
> ===================================================================
> --- cfgloop.h (revision 192483)
> +++ cfgloop.h (working copy)
> @@ -320,7 +321,8 @@ extern struct loop *loopify (edge, edge,
> struct loop * loop_version (struct loop *, void *,
> basic_block *, unsigned, unsigned, unsigned, bool);
> extern bool remove_path (edge);
> -void scale_loop_frequencies (struct loop *, int, int);
> +extern void unloop (struct loop *, bool *);
> +extern void scale_loop_frequencies (struct loop *, int, int);
>
> /* Induction variable analysis. */
>
> Index: cfgloopmanip.c
> ===================================================================
> --- cfgloopmanip.c (revision 192483)
> +++ cfgloopmanip.c (working copy)
> @@ -37,7 +37,6 @@ static int find_path (edge, basic_block
> static void fix_loop_placements (struct loop *, bool *);
> static bool fix_bb_placement (basic_block);
> static void fix_bb_placements (basic_block, bool *);
> -static void unloop (struct loop *, bool *);
>
> /* Checks whether basic block BB is dominated by DATA. */
> static bool
> @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed
> If this may cause the information about irreducible regions to become
> invalid, IRRED_INVALIDATED is set to true. */
>
> -static void
> +void
> unloop (struct loop *loop, bool *irred_invalidated)
> {
> basic_block *body;
> Index: testsuite/gcc.target/i386/l_fma_float_5.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_5.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_float_5.c (working copy)
> @@ -12,7 +12,7 @@
> /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_4.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_4.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_double_4.c (working copy)
> @@ -12,7 +12,7 @@
> /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_2.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_2.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_float_2.c (working copy)
> @@ -12,7 +12,7 @@
> /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_6.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_6.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_float_6.c (working copy)
> @@ -12,7 +12,7 @@
> /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_1.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_1.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_double_1.c (working copy)
> @@ -16,11 +16,11 @@
> /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */
> /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd213sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmsub213sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd213sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub213sd" 8 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfmadd213sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfmsub213sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd213sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub213sd" 20 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_5.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_5.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_double_5.c (working copy)
> @@ -12,7 +12,7 @@
> /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_3.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_3.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_float_3.c (working copy)
> @@ -16,11 +16,11 @@
> /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */
> /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd213ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmsub213ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd213ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub213ss" 8 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfmadd213ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfmsub213ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd213ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub213ss" 36 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_2.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_2.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_double_2.c (working copy)
> @@ -12,7 +12,7 @@
> /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_6.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_6.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_double_6.c (working copy)
> @@ -12,7 +12,7 @@
> /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_4.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_4.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_float_4.c (working copy)
> @@ -12,7 +12,7 @@
> /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */
> Index: testsuite/gcc.target/i386/l_fma_double_3.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_double_3.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_double_3.c (working copy)
> @@ -16,11 +16,11 @@
> /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */
> /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd213sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmsub213sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd213sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132sd" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub213sd" 8 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfmadd213sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfmsub213sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd213sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132sd" 20 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub213sd" 20 } } */
> Index: testsuite/gcc.target/i386/l_fma_float_1.c
> ===================================================================
> --- testsuite/gcc.target/i386/l_fma_float_1.c (revision 192483)
> +++ testsuite/gcc.target/i386/l_fma_float_1.c (working copy)
> @@ -16,11 +16,11 @@
> /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */
> /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */
> /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */
> -/* { dg-final { scan-assembler-times "vfmadd132ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmadd213ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfmsub213ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd213ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132ss" 8 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub213ss" 8 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfmadd213ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfmsub132ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfmsub213ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfnmadd213ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132ss" 36 } } */
> +/* { dg-final { scan-assembler-times "vfnmsub213ss" 36 } } */
> Index: testsuite/gfortran.dg/do_1.f90
> ===================================================================
> --- testsuite/gfortran.dg/do_1.f90 (revision 192483)
> +++ testsuite/gfortran.dg/do_1.f90 (working copy)
> @@ -1,4 +1,5 @@
> -! { dg-do run }
> +! { dg-do run { xfail *-*-* } }
> +! XFAIL is tracked in PR 54932
> ! Program to check corner cases for DO statements.
> program do_1
> implicit none
> Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int a[2];
> +test(int c)
> +{
> + int i;
> + for (i=0;i<c;i++)
> + a[i]=5;
> +}
> +/* The array bounds say the loop will not roll much. */
> +/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 times.." "cunroll"} } */
> +/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int a[2];
> +test(int c)
> +{
> + int i;
> + for (i=0;i<c;i++)
> + {
> + a[i]=5;
> + if (test2())
> + return;
> + }
> +}
> +/* We are not able to get rid of the final conditional because the loop has two exits. */
> +/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 2 times.." "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-3.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c (revision 0)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> +int a[1];
> +test(int c)
> +{
> + int i;
> + for (i=0;i<c;i++)
> + {
> + a[i]=5;
> + }
> +}
> +/* If we start duplicating headers prior to cunroll, this loop will have 0 iterations. */
> +
> +/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 times.." "cunrolli"} } */
> +/* { dg-final { cleanup-tree-dump "cunrolli" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-4.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-4.c (revision 0)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int a[1];
> +test(int c)
> +{
> + int i=0,j;
> + for (i=0;i<c;i++)
> + {
> + for (j=0;j<c;j++)
> + {
> + a[i]=5;
> + test2();
> + }
> + }
> +}
> +
> +/* We should do this as part of cunrolli, but our cost model does not take into account early exit
> + from the last iteration. */
> +/* { dg-final { scan-tree-dump "Turned loop 1 to non-loop; it never loops." "cunrolli"} } */
> +/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-5.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c (revision 0)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int *a;
> +test(int c)
> +{
> + int i;
> + for (i=0;i<6;i++)
> + a[i]=5;
> +}
> +/* Basic testcase for complete unrolling. */
> +/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 5 times.." "cunroll"} } */
> +/* { dg-final { scan-tree-dump "Exit condition of peeled iterations was eliminated." "cunroll"} } */
> +/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/ldist-17.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/ldist-17.c (revision 192483)
> +++ testsuite/gcc.dg/tree-ssa/ldist-17.c (working copy)
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
> +/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details -fdisable-tree-cunroll -fdisable-tree-cunrolli" } */
>
> typedef int mad_fixed_t;
> struct mad_pcm
>
>
--
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend