This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Enable loop peeling at -O3
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: gcc-patches at gcc dot gnu dot org, rguenther at suse dot de
- Date: Fri, 27 May 2016 15:19:29 +0200
- Subject: Enable loop peeling at -O3
Hi,
this patch enables -fpeel-loops by default at -O3 and makes it use likely
upper bound estimates. The patch also adds a -fpeel-all-loops flag that is
symmetric to -funroll-all-loops. A long time ago we used to interpret
-fpeel-loops this way and blindly peel every loop, but this behaviour got lost
and now we only peel loops we have some evidence for.
Bootstrapped/regtested on x86_64-linux; I am retesting after a last-minute
change (adding the testcase). OK?
Honza
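For readers unfamiliar with the transformation, here is a minimal hand-written sketch of what peeling means for the loop used in the peel1.c testcase. It is an illustration only, not GCC output: add_peeled is a hypothetical name, and the choice of three peeled copies simply follows the size of the a[3] array, which bounds the number of iterations of a valid program.

```c
struct foo { int b; int a[3]; };

/* Original loop, as in the peel1.c testcase. */
void add(struct foo *a, int l)
{
  int i;
  for (i = 0; i < l; i++)
    a->a[i]++;
}

/* Hand-peeled variant (hypothetical illustration, not compiler output).
   The first three iterations are copied out of the loop, each guarded
   by the original exit test.  The loop itself is kept afterwards so the
   function behaves the same as add() for every valid value of l.  */
void add_peeled(struct foo *a, int l)
{
  int i = 0;

  if (i >= l)
    return;
  a->a[i++]++;          /* peeled iteration 1 */

  if (i >= l)
    return;
  a->a[i++]++;          /* peeled iteration 2 */

  if (i >= l)
    return;
  a->a[i++]++;          /* peeled iteration 3 */

  /* Remaining loop; not entered when l <= 3.  */
  for (; i < l; i++)
    a->a[i]++;
}
```

For l between 0 and 3 both functions update the same elements; the peeled version just reaches its exit through straight-line code instead of a loop back-edge.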
* common.opt (flag_peel_all_loops): New option.
* doc/invoke.texi (-fpeel-loops): Update documentation.
(-fpeel-all-loops): Document.
* opts.c (default_options): Add OPT_fpeel_loops to -O3+.
* toplev.c (process_options): flag_peel_all_loops implies
flag_peel_loops.
* tree-ssa-loop-ivcanon.c (try_peel_loop): Update comment; handle
-fpeel-all-loops, use likely estimates.
* gcc.dg/tree-ssa/peel1.c: New testcase.
* gcc.dg/tree-ssa/peel2.c: New testcase.
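The try_peel_loop change in the tree-ssa-loop-ivcanon.c hunk of this patch can be read as a fallback chain for choosing the peel count. The sketch below restates that order as a standalone function. It is an assumption-laden model rather than the GCC code: choose_npeel and its parameters are hypothetical stand-ins for estimated_loop_iterations_int, likely_max_loop_iterations_int, flag_peel_all_loops and PARAM_VALUE (PARAM_MAX_PEEL_TIMES), with a negative value meaning "no estimate available", matching the npeel < 0 checks in the patch.

```c
/* Hypothetical model of the estimate selection in try_peel_loop.
   estimated       - profile/static estimate, or negative if unknown
   likely_max      - likely upper bound, or negative if unknown
   peel_all_loops  - nonzero when -fpeel-all-loops is given
   max_peel_times  - value of the max-peel-times parameter
   Returns the chosen count, or a negative value when the loop
   has no usable estimate and is therefore not peeled.  */
int choose_npeel(int estimated, int likely_max,
                 int peel_all_loops, int max_peel_times)
{
  int npeel = estimated;

  /* New in the patch: fall back to the likely upper bound.  */
  if (npeel < 0)
    npeel = likely_max;

  /* New in the patch: with -fpeel-all-loops, peel even without
     any evidence, using the parameter limit minus one.  */
  if (npeel < 0 && peel_all_loops)
    npeel = max_peel_times - 1;

  return npeel;
}
```

For example, with no estimate of either kind and an illustrative limit of 16, the function returns 15 when peel_all_loops is set and a negative value otherwise, which corresponds to the "Not peeling: number of iterations is not estimated" path.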
Index: common.opt
===================================================================
--- common.opt (revision 236815)
+++ common.opt (working copy)
@@ -1840,6 +1840,10 @@ fpeel-loops
Common Report Var(flag_peel_loops) Optimization
Perform loop peeling.
+fpeel-all-loops
+Common Report Var(flag_peel_all_loops) Optimization
+Perform loop peeling of all loops.
+
fpeephole
Common Report Var(flag_no_peephole,0) Optimization
Enable machine specific peephole optimizations.
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 236815)
+++ doc/invoke.texi (working copy)
@@ -8661,10 +8661,17 @@ the loop is entered. This usually makes
@item -fpeel-loops
@opindex fpeel-loops
Peels loops for which there is enough information that they do not
-roll much (from profile feedback). It also turns on complete loop peeling
-(i.e.@: complete removal of loops with small constant number of iterations).
+roll much (from profile feedback or static analysis). It also turns on
+complete loop peeling (i.e.@: complete removal of loops with small constant
+number of iterations).
-Enabled with @option{-fprofile-use}.
+Enabled with @option{-O3} and @option{-fprofile-use}.
+
+@item -fpeel-all-loops
+@opindex fpeel-all-loops
+Peel all loops, even if their number of iterations is uncertain when
+the loop is entered. For loops with large number of iterations this leads
+to wasted code size.
@item -fmove-loop-invariants
@opindex fmove-loop-invariants
Index: opts.c
===================================================================
--- opts.c (revision 236815)
+++ opts.c (working copy)
@@ -535,6 +535,7 @@ static const struct default_options defa
{ OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
{ OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
{ OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
+ { OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
/* -Ofast adds optimizations to -O3. */
{ OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
Index: testsuite/gcc.dg/tree-ssa/peel1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel1.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel1.c (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-loop-ivcanon" } */
+struct foo {int b; int a[3];} foo;
+void add(struct foo *a,int l)
+{
+ int i;
+ for (i=0;i<l;i++)
+ a->a[i]++;
+}
+/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 "ivcanon"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */
Index: testsuite/gcc.dg/tree-ssa/peel2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/peel2.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/peel2.c (working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fpeel-all-loops -fdump-tree-loop-ivcanon" } */
+void add(int *a,int l)
+{
+ int i;
+ for (i=0;i<l;i++)
+ a[i]++;
+}
+/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 "ivcanon"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */
Index: toplev.c
===================================================================
--- toplev.c (revision 236815)
+++ toplev.c (working copy)
@@ -1294,6 +1294,9 @@ process_options (void)
if (flag_unroll_all_loops)
flag_unroll_loops = 1;
+ if (flag_peel_all_loops)
+ flag_peel_loops = 1;
+
/* web and rename-registers help when run after loop unrolling. */
if (flag_web == AUTODETECT_VALUE)
flag_web = flag_unroll_loops || flag_peel_loops;
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c (revision 236816)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -951,7 +951,9 @@ try_peel_loop (struct loop *loop,
if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0)
return false;
- /* Peel only innermost loops. */
+ /* Peel only innermost loops.
+ While the code is perfectly capable of peeling non-innermost loops,
+ the heuristics would probably need some improvements. */
if (loop->inner)
{
if (dump_file)
@@ -969,12 +971,16 @@ try_peel_loop (struct loop *loop,
/* Check if there is an estimate on the number of iterations. */
npeel = estimated_loop_iterations_int (loop);
if (npeel < 0)
+ npeel = likely_max_loop_iterations_int (loop);
+ if (npeel < 0 && flag_peel_all_loops)
+ npeel = PARAM_VALUE (PARAM_MAX_PEEL_TIMES) - 1;
+ if (npeel < 0)
{
if (dump_file)
fprintf (dump_file, "Not peeling: number of iterations is not "
"estimated\n");
return false;
}
if (maxiter >= 0 && maxiter <= npeel)
{
if (dump_file)