[PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops
Fang, Changpeng
Changpeng.Fang@amd.com
Tue Dec 14 17:33:00 GMT 2010
No, I didn't see these failures.
The failure list in my bootstrapping is the following. I see nothing relevant:
FAIL: gcc.dg/guality/pr43077-1.c -O2 -flto -flto-partition=none line 42 varb == 2
FAIL: gcc.dg/guality/pr43077-1.c -O2 -flto line 42 varb == 2
FAIL: gcc.dg/guality/sra-1.c -O1 line 21 a.j == 14
FAIL: gcc.dg/guality/sra-1.c -O2 line 21 a.j == 14
FAIL: gcc.dg/guality/sra-1.c -O3 -fomit-frame-pointer line 21 a.j == 14
FAIL: gcc.dg/guality/sra-1.c -O3 -g line 21 a.j == 14
FAIL: gcc.dg/guality/sra-1.c -Os line 21 a.j == 14
FAIL: gcc.dg/guality/vla-1.c -O0 line 17 sizeof (a) == 6
FAIL: gcc.dg/guality/vla-1.c -O0 line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c -O1 line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c -O2 line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c -O3 -fomit-frame-pointer line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c -O3 -g line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c -Os line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-2.c -O0 line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -O0 line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -O1 line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -O1 line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -O2 line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -O2 line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -O3 -fomit-frame-pointer line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -O3 -fomit-frame-pointer line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -O3 -g line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -O3 -g line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -Os line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c -Os line 25 sizeof (a) == 6 * sizeof (int)
FAIL: g++.dg/guality/redeclaration1.C -O0 line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C -O1 line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C -O2 line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C -O3 -fomit-frame-pointer line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C -O3 -g line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C -Os line 17 i == 24
FAIL: libmudflap.c/pass49-frag.c execution test
FAIL: libmudflap.c/pass49-frag.c output pattern test
FAIL: libmudflap.c/pass49-frag.c execution test
FAIL: libmudflap.c/pass49-frag.c output pattern test
FAIL: libmudflap.c/pass49-frag.c (-static) execution test
FAIL: libmudflap.c/pass49-frag.c (-static) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-static) execution test
FAIL: libmudflap.c/pass49-frag.c (-static) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-O2) execution test
FAIL: libmudflap.c/pass49-frag.c (-O2) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-O2) execution test
FAIL: libmudflap.c/pass49-frag.c (-O2) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-O3) execution test
FAIL: libmudflap.c/pass49-frag.c (-O3) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-O3) execution test
FAIL: libmudflap.c/pass49-frag.c (-O3) output pattern test
FAIL: gcc.dg/cproj-fails-with-broken-glibc.c execution test
Thanks,
Changpeng
________________________________________
From: Jack Howarth [howarth@bromo.med.uc.edu]
Sent: Tuesday, December 14, 2010 8:27 AM
To: Fang, Changpeng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops
On Mon, Dec 13, 2010 at 02:35:35PM -0600, Fang, Changpeng wrote:
> Hi,
>
> The attached patch adds the logic to disable certain loop optimizations on pre-/post-loops.
>
> Some loop optimizations (auto-vectorization, loop unrolling, etc.) may peel a few iterations
> of a loop to form pre- and/or post-loops for various purposes (alignment, loop bounds, etc.).
> Currently, the GCC loop optimizer is unable to recognize that such loops will roll only a few
> iterations, and it still performs optimizations on them. While this does not hurt performance in general,
> it may significantly increase the compilation time and code size without any performance benefit.
>
> This patch adds logic for the loop optimizer to recognize pre- and/or post-loops, and to disable
> prefetching, unswitching and loop unrolling on them. On polyhedron with -Ofast -funroll-loops -march=amdfam10,
> the patch reduces the compilation time by 28% on average, and reduces the binary size by 20% on
> average (see the attached data). Note that the small speed improvement (0.5%) could have been noise, although the
> code size reduction could possibly improve the performance in some cases (I-cache improvement?).
>
> The patch passed bootstrap and gcc regression tests on x86_64-unknown-linux-gnu.
>
> Is it OK to commit to trunk?
>
> Thanks,
>
> Changpeng
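
The pre-/post-loop structure described in the quoted message can be illustrated with a hand-written sketch. This is not GCC output: the function name `scale_peeled`, the factor `VF`, and the iteration counters are hypothetical, and the code assumes a 4-byte `float` that is at least 4-byte aligned. It only shows why the peeled loops roll fewer than `VF` iterations each, which is the property the patch relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4  /* hypothetical vectorization factor: 4 floats per 16-byte vector */

/* Multiply N floats in A by S, split by hand into a pre-loop, a main loop
   and a post-loop.  The two counters record how many iterations each
   peeled loop ran; they exist only for illustration.  */
static void
scale_peeled (float *a, size_t n, float s, size_t *pre_iters, size_t *post_iters)
{
  size_t i = 0;

  assert (a != NULL && pre_iters != NULL && post_iters != NULL);
  *pre_iters = 0;
  *post_iters = 0;

  /* Pre-loop: peel scalar iterations until &a[i] is aligned to a full
     vector.  With 4-byte floats this runs at most VF - 1 times.  */
  while (i < n && ((uintptr_t) &a[i] % (VF * sizeof (float))) != 0)
    {
      a[i++] *= s;
      ++*pre_iters;
    }

  /* Main loop: processes VF elements per iteration.  A vectorizer would
     replace the inner loop with one vector operation.  */
  for (; i + VF <= n; i += VF)
    for (size_t j = 0; j < VF; j++)
      a[i + j] *= s;

  /* Post-loop: the remaining n mod VF elements, at most VF - 1 of them.  */
  for (; i < n; i++)
    {
      a[i] *= s;
      ++*post_iters;
    }
}
```

Calling `scale_peeled (buf + 1, 61, 2.0f, &pre, &post)` on a `float buf[64]` leaves both `pre` and `post` below `VF`, so unrolling, unswitching or prefetching those two loops would add code without much to gain.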
Changpeng,
On x86_64-apple-darwin10, this patch produces some regressions in the gcc testsuite.
In particular at both -m32 and -m64...
XPASS: gcc.dg/pr30957-1.c execution test
FAIL: gcc.dg/pr30957-1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"
and
FAIL: gcc.dg/var-expand1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"
Do you see those as well on linux?
Jack
>
Content-Description: polyhedron.txt
> Effect of the pre-/post-loop patch on polyhedron
> option: gfortran -Ofast -funroll-loops -march=amdfam10
>
>             compilation time   code size       speed
>             reduction (%)      reduction (%)   improvement (%)
> -----------------------------------------------------------
> ac            -20.54            -17.15            0
> aermod        -15.93            -10.15            2.51
> air            -5.74             -5.45           -0.09
> capacita      -31.35            -18.27            0.08
> channel       -11.32            -10.24            1.22
> doduc          -4.52             -6.12            0.82
> fatigue       -34.51            -15.94            0
> gas_dyn       -45.56            -28.66            2.31
> induct         -3.1              -1.91            0.05
> linpk         -25.55            -27.5             0.26
> mdbx          -24.06            -19.74            1.27
> nf            -60.85            -48.92           -0.77
> protein       -44.73            -24.02           -0.19
> rnflow        -50.55            -36.69            0.47
> test_fpu      -52.49            -41.35            1.18
> tfft          -24.83            -18.29            0.39
> -----------------------------------------------------------
> average       -28.48            -20.65            0.59
>
Content-Description: 0001-Don-t-perform-certain-loop-optimizations-on-pre-post.patch
> From e8636e80de4d6de8ba2dbc8f08bd2daddd02edc3 Mon Sep 17 00:00:00 2001
> From: Changpeng Fang <chfang@houghton.(none)>
> Date: Mon, 13 Dec 2010 12:01:49 -0800
> Subject: [PATCH] Don't perform certain loop optimizations on pre/post loops
>
> * basic-block.h (bb_flags): Add a new flag BB_PRE_POST_LOOP_HEADER.
> * cfg.c (clear_bb_flags): Keep BB_PRE_POST_LOOP_HEADER marker.
> * cfgloop.h (mark_pre_or_post_loop): New function declaration.
> (pre_or_post_loop_p): New function declaration.
> * loop-unroll.c (decide_unroll_runtime_iterations): Do not unroll a
> pre- or post-loop.
> * loop-unswitch.c (unswitch_single_loop): Do not unswitch a pre- or
> post-loop.
> * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the
> post-loop.
> * tree-ssa-loop-niter.c (mark_pre_or_post_loop): Implement the new
> function. (pre_or_post_loop_p): Implement the new function.
> * tree-ssa-loop-prefetch.c (loop_prefetch_arrays): Don't prefetch
> a pre- or post-loop.
> * tree-ssa-loop-unswitch.c (tree_ssa_unswitch_loops): Do not unswitch
> a pre- or post-loop.
> * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the
> post-loop. (vect_do_peeling_for_alignment): Mark the pre-loop.
> ---
> gcc/basic-block.h | 6 +++++-
> gcc/cfg.c | 7 ++++---
> gcc/cfgloop.h | 2 ++
> gcc/loop-unroll.c | 7 +++++++
> gcc/loop-unswitch.c | 8 ++++++++
> gcc/tree-ssa-loop-manip.c | 3 +++
> gcc/tree-ssa-loop-niter.c | 20 ++++++++++++++++++++
> gcc/tree-ssa-loop-prefetch.c | 7 +++++++
> gcc/tree-ssa-loop-unswitch.c | 8 ++++++++
> gcc/tree-vect-loop-manip.c | 8 ++++++++
> 10 files changed, 72 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/basic-block.h b/gcc/basic-block.h
> index be0a1d1..78552fd 100644
> --- a/gcc/basic-block.h
> +++ b/gcc/basic-block.h
> @@ -245,7 +245,11 @@ enum bb_flags
>
> /* Set on blocks that cannot be threaded through.
> Only used in cfgcleanup.c. */
> - BB_NONTHREADABLE_BLOCK = 1 << 11
> + BB_NONTHREADABLE_BLOCK = 1 << 11,
> +
> + /* Set on blocks that are headers of pre- or post-loops. */
> + BB_PRE_POST_LOOP_HEADER = 1 << 12
> +
> };
>
> /* Dummy flag for convenience in the hot/cold partitioning code. */
> diff --git a/gcc/cfg.c b/gcc/cfg.c
> index c8ef799..e9b394a 100644
> --- a/gcc/cfg.c
> +++ b/gcc/cfg.c
> @@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred)
> connect_src (e);
> }
>
> -/* Clear all basic block flags, with the exception of partitioning and
> - setjmp_target. */
> +/* Clear all basic block flags, with the exception of partitioning,
> + setjmp_target, and the pre/post loop marker. */
> void
> clear_bb_flags (void)
> {
> @@ -434,7 +434,8 @@ clear_bb_flags (void)
>
> FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
> bb->flags = (BB_PARTITION (bb)
> - | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET)));
> + | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET
> + + BB_PRE_POST_LOOP_HEADER)));
> }
>
> /* Check the consistency of profile information. We can't do that
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index bf2614e..ce848cc 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx);
> void estimate_numbers_of_iterations_loop (struct loop *, bool);
> HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
> bool estimated_loop_iterations (struct loop *, bool, double_int *);
> +void mark_pre_or_post_loop (struct loop *);
> +bool pre_or_post_loop_p (struct loop *);
>
> /* Loop manipulation. */
> extern bool can_duplicate_loop_p (const struct loop *loop);
> diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
> index 67d6ea0..6f095f6 100644
> --- a/gcc/loop-unroll.c
> +++ b/gcc/loop-unroll.c
> @@ -857,6 +857,13 @@ decide_unroll_runtime_iterations (struct loop *loop, int flags)
> fprintf (dump_file, ";; Loop iterates constant times\n");
> return;
> }
> +
> + if (pre_or_post_loop_p (loop))
> + {
> + if (dump_file)
> + fprintf (dump_file, ";; Not unrolling, a pre- or post-loop\n");
> + return;
> + }
>
> /* If we have profile feedback, check whether the loop rolls. */
> if (loop->header->count && expected_loop_iterations (loop) < 2 * nunroll)
> diff --git a/gcc/loop-unswitch.c b/gcc/loop-unswitch.c
> index 77524d8..59373bf 100644
> --- a/gcc/loop-unswitch.c
> +++ b/gcc/loop-unswitch.c
> @@ -276,6 +276,14 @@ unswitch_single_loop (struct loop *loop, rtx cond_checked, int num)
> return;
> }
>
> + /* Pre- or post loop usually just roll a few iterations. */
> + if (pre_or_post_loop_p (loop))
> + {
> + if (dump_file)
> + fprintf (dump_file, ";; Not unswitching, a pre- or post loop\n");
> + return;
> + }
> +
> /* We must be able to duplicate loop body. */
> if (!can_duplicate_loop_p (loop))
> {
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index 87b2c0d..f8ddbab 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
> gcc_assert (new_loop != NULL);
> update_ssa (TODO_update_ssa);
>
> + /* NEW_LOOP is a post-loop. */
> + mark_pre_or_post_loop (new_loop);
> +
> /* Determine the probability of the exit edge of the unrolled loop. */
> new_est_niter = est_niter / factor;
>
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index ee85f6f..33e8cc3 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p)
> fold_undefer_and_ignore_overflow_warnings ();
> }
>
> +/* Mark LOOP as a pre- or post loop. */
> +
> +void
> +mark_pre_or_post_loop (struct loop *loop)
> +{
> + gcc_assert (loop && loop->header);
> + loop->header->flags |= BB_PRE_POST_LOOP_HEADER;
> +}
> +
> +/* Return true if LOOP is a pre- or post loop. */
> +
> +bool
> +pre_or_post_loop_p (struct loop *loop)
> +{
> + int masked_flags;
> + gcc_assert (loop && loop->header);
> + masked_flags = (loop->header->flags & BB_PRE_POST_LOOP_HEADER);
> + return (masked_flags != 0);
> +}
> +
> /* Returns true if statement S1 dominates statement S2. */
>
> bool
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 59c65d3..5c9f640 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -1793,6 +1793,13 @@ loop_prefetch_arrays (struct loop *loop)
> return false;
> }
>
> + if (pre_or_post_loop_p (loop))
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, " Not Prefetching -- pre- or post loop\n");
> + return false;
> + }
> +
> /* FIXME: the time should be weighted by the probabilities of the blocks in
> the loop body. */
> time = tree_num_loop_insns (loop, &eni_time_weights);
> diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
> index b6b32dc..f3b8108 100644
> --- a/gcc/tree-ssa-loop-unswitch.c
> +++ b/gcc/tree-ssa-loop-unswitch.c
> @@ -88,6 +88,14 @@ tree_ssa_unswitch_loops (void)
> if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, ";; Considering loop %d\n", loop->num);
>
> + /* Do not unswitch a pre- or post loop. */
> + if (pre_or_post_loop_p (loop))
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, ";; Not unswitching, a pre- or post loop\n");
> + continue;
> + }
> +
> /* Do not unswitch in cold regions. */
> if (optimize_loop_for_size_p (loop))
> {
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 6ecd304..9a63f7e 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
> cond_expr, cond_expr_stmt_list);
> gcc_assert (new_loop);
> gcc_assert (loop_num == loop->num);
> +
> + /* NEW_LOOP is a post loop. */
> + mark_pre_or_post_loop (new_loop);
> +
> #ifdef ENABLE_CHECKING
> slpeel_verify_cfg_after_peeling (loop, new_loop);
> #endif
> @@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo)
> th, true, NULL_TREE, NULL);
>
> gcc_assert (new_loop);
> +
> + /* NEW_LOOP is a pre-loop. */
> + mark_pre_or_post_loop (new_loop);
> +
> #ifdef ENABLE_CHECKING
> slpeel_verify_cfg_after_peeling (new_loop, loop);
> #endif
> --
> 1.6.3.3
>
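
The marking scheme in the patch above can be tried outside GCC with a minimal standalone sketch. The `struct basic_block_def` and `struct loop` below are simplified stand-ins that keep only the fields the patch touches, and `clear_flags_except` and `should_unroll_p` are hypothetical helpers added for illustration; they are not part of GCC or of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two of the flag bits from the patched enum in basic-block.h.  */
enum bb_flags
{
  BB_NONTHREADABLE_BLOCK = 1 << 11,
  BB_PRE_POST_LOOP_HEADER = 1 << 12
};

/* Simplified stand-ins for GCC's types (assumption: only these fields).  */
struct basic_block_def { int flags; };
struct loop { struct basic_block_def *header; };

/* Mark LOOP as a pre- or post-loop by setting a bit on its header block.  */
static void
mark_pre_or_post_loop (struct loop *loop)
{
  assert (loop && loop->header);
  loop->header->flags |= BB_PRE_POST_LOOP_HEADER;
}

/* Return true if LOOP was marked as a pre- or post-loop.  */
static bool
pre_or_post_loop_p (const struct loop *loop)
{
  assert (loop && loop->header);
  return (loop->header->flags & BB_PRE_POST_LOOP_HEADER) != 0;
}

/* Hypothetical helper: clear every flag on BB except those in KEEP_MASK,
   the per-block effect of clear_bb_flags once the marker is preserved.  */
static void
clear_flags_except (struct basic_block_def *bb, int keep_mask)
{
  bb->flags &= keep_mask;
}

/* Hypothetical pass gate: skip the transformation for marked loops.  */
static bool
should_unroll_p (const struct loop *loop)
{
  return !pre_or_post_loop_p (loop);
}
```

Because the marker lives in the header block's flags, it survives only as long as every flag-clearing routine keeps that bit, which is why the patch also changes `clear_bb_flags`.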