[PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops

Fang, Changpeng Changpeng.Fang@amd.com
Tue Dec 14 17:33:00 GMT 2010


No, I didn't see these failures.

The failure list in my bootstrapping is the following. I see nothing relevant:

FAIL: gcc.dg/guality/pr43077-1.c  -O2 -flto -flto-partition=none  line 42 varb == 2
FAIL: gcc.dg/guality/pr43077-1.c  -O2 -flto  line 42 varb == 2
FAIL: gcc.dg/guality/sra-1.c  -O1  line 21 a.j == 14
FAIL: gcc.dg/guality/sra-1.c  -O2  line 21 a.j == 14
FAIL: gcc.dg/guality/sra-1.c  -O3 -fomit-frame-pointer  line 21 a.j == 14
FAIL: gcc.dg/guality/sra-1.c  -O3 -g  line 21 a.j == 14
FAIL: gcc.dg/guality/sra-1.c  -Os  line 21 a.j == 14
FAIL: gcc.dg/guality/vla-1.c  -O0  line 17 sizeof (a) == 6
FAIL: gcc.dg/guality/vla-1.c  -O0  line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c  -O1  line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c  -O2  line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c  -O3 -fomit-frame-pointer  line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c  -O3 -g  line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-1.c  -Os  line 24 sizeof (a) == 17 * sizeof (short)
FAIL: gcc.dg/guality/vla-2.c  -O0  line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -O0  line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -O1  line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -O1  line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -O2  line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -O2  line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -O3 -fomit-frame-pointer  line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -O3 -fomit-frame-pointer  line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -O3 -g  line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -O3 -g  line 25 sizeof (a) == 6 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -Os  line 16 sizeof (a) == 5 * sizeof (int)
FAIL: gcc.dg/guality/vla-2.c  -Os  line 25 sizeof (a) == 6 * sizeof (int)
FAIL: g++.dg/guality/redeclaration1.C  -O0  line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C  -O1  line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C  -O2  line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C  -O3 -fomit-frame-pointer  line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C  -O3 -g  line 17 i == 24
FAIL: g++.dg/guality/redeclaration1.C  -Os  line 17 i == 24
FAIL: libmudflap.c/pass49-frag.c execution test
FAIL: libmudflap.c/pass49-frag.c output pattern test
FAIL: libmudflap.c/pass49-frag.c execution test
FAIL: libmudflap.c/pass49-frag.c output pattern test
FAIL: libmudflap.c/pass49-frag.c (-static) execution test
FAIL: libmudflap.c/pass49-frag.c (-static) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-static) execution test
FAIL: libmudflap.c/pass49-frag.c (-static) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-O2) execution test
FAIL: libmudflap.c/pass49-frag.c (-O2) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-O2) execution test
FAIL: libmudflap.c/pass49-frag.c (-O2) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-O3) execution test
FAIL: libmudflap.c/pass49-frag.c (-O3) output pattern test
FAIL: libmudflap.c/pass49-frag.c (-O3) execution test
FAIL: libmudflap.c/pass49-frag.c (-O3) output pattern test
FAIL: gcc.dg/cproj-fails-with-broken-glibc.c execution test


Thanks,

Changpeng




________________________________________
From: Jack Howarth [howarth@bromo.med.uc.edu]
Sent: Tuesday, December 14, 2010 8:27 AM
To: Fang, Changpeng
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops

On Mon, Dec 13, 2010 at 02:35:35PM -0600, Fang, Changpeng wrote:
> Hi,
>
> The attached patch adds the logic to disable certain loop optimizations on pre-/post-loops.
>
> Some loop optimizations (auto-vectorization, loop unrolling, etc) may peel a few iterations
> of a loop to form pre- and/or post-loops for various purposes (alignment, loop bounds, etc).
> Currently, GCC loop optimizer is unable to recognize that such loops will roll only a few
> iterations and still perform optimizations on them. While this does not hurt the performance in general,
> it may significantly increase the compilation time and code size without performance benefit.
>
> This patch adds such logic for the loop optimizer to recognize pre- and/or post loops, and disable
> prefetch, unswitch and loop unrolling on them. On polyhedron with -Ofast -funroll-loops -march=amdfam10,
> the patch could reduce the compilation time by 28% on average, the reduce the binary size by 20% on
>  average (see the atached data).  Note that the small improvement (0.5%) could have been noise, the
> code size reduction could possibly improve the performance in some cases (I_cache iprovement?).
>
> The patch passed bootstrap and gcc regression tests on x86_64-unknown-linux-gnu.
>
> Is it OK to commit to trunk?
>
> Thanks,
>
> Changpeng

Changpeng,
   On x86_64-apple-darwin10, this patch produces some regressions in the gcc testsuite.
In particular at both -m32 and -m64...

XPASS: gcc.dg/pr30957-1.c execution test
FAIL: gcc.dg/pr30957-1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"

and

FAIL: gcc.dg/var-expand1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"

Do you see those as well on linux?
               Jack

>
Content-Description: polyhedron.txt
> Effact of the pre-/post-loop patch on polyhedron
> option: gfortran -Ofast -funroll-loops -march=amdfam10
>
>                compilation     code size      speed
>               time reduction   reduction      improvement
>                  (%)             (%)            (%)
> -----------------------------------------------------------
>           ac  -20.54          -17.15          0
>       aermod  -15.93          -10.15          2.51
>          air  -5.74           -5.45           -0.09
>     capacita  -31.35          -18.27          0.08
>      channel  -11.32          -10.24          1.22
>        doduc  -4.52           -6.12           0.82
>      fatigue  -34.51          -15.94          0
>      gas_dyn  -45.56          -28.66          2.31
>       induct  -3.1            -1.91           0.05
>        linpk  -25.55          -27.5           0.26
>         mdbx  -24.06          -19.74          1.27
>           nf  -60.85          -48.92          -0.77
>      protein  -44.73          -24.02          -0.19
>       rnflow  -50.55          -36.69          0.47
>     test_fpu  -52.49          -41.35          1.18
>         tfft  -24.83          -18.29          0.39
> -----------------------------------------------------------
>      average  -28.48          -20.65          0.59
>

Content-Description: 0001-Don-t-perform-certain-loop-optimizations-on-pre-post.patch
> From e8636e80de4d6de8ba2dbc8f08bd2daddd02edc3 Mon Sep 17 00:00:00 2001
> From: Changpeng Fang <chfang@houghton.(none)>
> Date: Mon, 13 Dec 2010 12:01:49 -0800
> Subject: [PATCH] Don't perform certain loop optimizations on pre/post loops
>
>       * basic-block.h (bb_flags): Add a new flag BB_PRE_POST_LOOP_HEADER.
>       * cfg.c (clear_bb_flags): Keep BB_PRE_POST_LOOP_HEADER marker.
>       * cfgloop.h (mark_pre_or_post_loop): New function declaration.
>         (pre_or_post_loop_p): New function declaration.
>       * loop-unroll.c (decide_unroll_runtime_iterations): Do not unroll a
>         pre- or post-loop.
>       * loop-unswitch.c (unswitch_single_loop): Do not unswitch a pre- or
>         post-loop.
>       * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the
>         post-loop.
>       * tree-ssa-loop-niter.c (mark_pre_or_post_loop): Implement the new
>         function.  (pre_or_post_loop_p): Implement the new function.
>       * tree-ssa-loop-prefetch.c (loop_prefetch_arrays): Don't prefetch
>         a pre- or post-loop.
>       * tree-ssa-loop-unswitch.c (tree_ssa_unswitch_loops): Do not unswitch
>         a pre- or post-loop.
>       * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the
>         post-loop.  (vect_do_peeling_for_alignment): Mark the pre-loop.
> ---
>  gcc/basic-block.h            |    6 +++++-
>  gcc/cfg.c                    |    7 ++++---
>  gcc/cfgloop.h                |    2 ++
>  gcc/loop-unroll.c            |    7 +++++++
>  gcc/loop-unswitch.c          |    8 ++++++++
>  gcc/tree-ssa-loop-manip.c    |    3 +++
>  gcc/tree-ssa-loop-niter.c    |   20 ++++++++++++++++++++
>  gcc/tree-ssa-loop-prefetch.c |    7 +++++++
>  gcc/tree-ssa-loop-unswitch.c |    8 ++++++++
>  gcc/tree-vect-loop-manip.c   |    8 ++++++++
>  10 files changed, 72 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/basic-block.h b/gcc/basic-block.h
> index be0a1d1..78552fd 100644
> --- a/gcc/basic-block.h
> +++ b/gcc/basic-block.h
> @@ -245,7 +245,11 @@ enum bb_flags
>
>    /* Set on blocks that cannot be threaded through.
>       Only used in cfgcleanup.c.  */
> -  BB_NONTHREADABLE_BLOCK = 1 << 11
> +  BB_NONTHREADABLE_BLOCK = 1 << 11,
> +
> +  /* Set on blocks that are headers of pre- or post-loops.  */
> +  BB_PRE_POST_LOOP_HEADER = 1 << 12
> +
>  };
>
>  /* Dummy flag for convenience in the hot/cold partitioning code.  */
> diff --git a/gcc/cfg.c b/gcc/cfg.c
> index c8ef799..e9b394a 100644
> --- a/gcc/cfg.c
> +++ b/gcc/cfg.c
> @@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred)
>    connect_src (e);
>  }
>
> -/* Clear all basic block flags, with the exception of partitioning and
> -   setjmp_target.  */
> +/* Clear all basic block flags, with the exception of partitioning,
> +   setjmp_target, and the pre/post loop marker.  */
>  void
>  clear_bb_flags (void)
>  {
> @@ -434,7 +434,8 @@ clear_bb_flags (void)
>
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>      bb->flags = (BB_PARTITION (bb)
> -              | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET)));
> +              | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET
> +                                 + BB_PRE_POST_LOOP_HEADER)));
>  }
>
>  /* Check the consistency of profile information.  We can't do that
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index bf2614e..ce848cc 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx);
>  void estimate_numbers_of_iterations_loop (struct loop *, bool);
>  HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
>  bool estimated_loop_iterations (struct loop *, bool, double_int *);
> +void mark_pre_or_post_loop (struct loop *);
> +bool pre_or_post_loop_p (struct loop *);
>
>  /* Loop manipulation.  */
>  extern bool can_duplicate_loop_p (const struct loop *loop);
> diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
> index 67d6ea0..6f095f6 100644
> --- a/gcc/loop-unroll.c
> +++ b/gcc/loop-unroll.c
> @@ -857,6 +857,13 @@ decide_unroll_runtime_iterations (struct loop *loop, int flags)
>       fprintf (dump_file, ";; Loop iterates constant times\n");
>        return;
>      }
> +
> +  if (pre_or_post_loop_p (loop))
> +    {
> +      if (dump_file)
> +        fprintf (dump_file, ";; Not unrolling, a pre- or post-loop\n");
> +      return;
> +    }
>
>    /* If we have profile feedback, check whether the loop rolls.  */
>    if (loop->header->count && expected_loop_iterations (loop) < 2 * nunroll)
> diff --git a/gcc/loop-unswitch.c b/gcc/loop-unswitch.c
> index 77524d8..59373bf 100644
> --- a/gcc/loop-unswitch.c
> +++ b/gcc/loop-unswitch.c
> @@ -276,6 +276,14 @@ unswitch_single_loop (struct loop *loop, rtx cond_checked, int num)
>        return;
>      }
>
> +  /* Pre- or post loop usually just roll a few iterations.  */
> +  if (pre_or_post_loop_p (loop))
> +    {
> +      if (dump_file)
> +     fprintf (dump_file, ";; Not unswitching, a pre- or post loop\n");
> +      return;
> +    }
> +
>    /* We must be able to duplicate loop body.  */
>    if (!can_duplicate_loop_p (loop))
>      {
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index 87b2c0d..f8ddbab 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
>    gcc_assert (new_loop != NULL);
>    update_ssa (TODO_update_ssa);
>
> +  /* NEW_LOOP is a post-loop.  */
> +  mark_pre_or_post_loop (new_loop);
> +
>    /* Determine the probability of the exit edge of the unrolled loop.  */
>    new_est_niter = est_niter / factor;
>
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index ee85f6f..33e8cc3 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p)
>    fold_undefer_and_ignore_overflow_warnings ();
>  }
>
> +/* Mark LOOP as a pre- or post loop.  */
> +
> +void
> +mark_pre_or_post_loop (struct loop *loop)
> +{
> +  gcc_assert (loop && loop->header);
> +  loop->header->flags |= BB_PRE_POST_LOOP_HEADER;
> +}
> +
> +/* Return true if LOOP is a pre- or post loop.  */
> +
> +bool
> +pre_or_post_loop_p (struct loop *loop)
> +{
> +  int masked_flags;
> +  gcc_assert (loop && loop->header);
> +  masked_flags = (loop->header->flags & BB_PRE_POST_LOOP_HEADER);
> +  return (masked_flags != 0);
> +}
> +
>  /* Returns true if statement S1 dominates statement S2.  */
>
>  bool
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 59c65d3..5c9f640 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -1793,6 +1793,13 @@ loop_prefetch_arrays (struct loop *loop)
>        return false;
>      }
>
> +  if (pre_or_post_loop_p (loop))
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +     fprintf (dump_file, "  Not Prefetching -- pre- or post loop\n");
> +      return false;
> +    }
> +
>    /* FIXME: the time should be weighted by the probabilities of the blocks in
>       the loop body.  */
>    time = tree_num_loop_insns (loop, &eni_time_weights);
> diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
> index b6b32dc..f3b8108 100644
> --- a/gcc/tree-ssa-loop-unswitch.c
> +++ b/gcc/tree-ssa-loop-unswitch.c
> @@ -88,6 +88,14 @@ tree_ssa_unswitch_loops (void)
>        if (dump_file && (dump_flags & TDF_DETAILS))
>          fprintf (dump_file, ";; Considering loop %d\n", loop->num);
>
> +     /* Do not unswitch a pre- or post loop.  */
> +     if (pre_or_post_loop_p (loop))
> +       {
> +          if (dump_file && (dump_flags & TDF_DETAILS))
> +            fprintf (dump_file, ";; Not unswitching, a pre- or post loop\n");
> +          continue;
> +        }
> +
>        /* Do not unswitch in cold regions. */
>        if (optimize_loop_for_size_p (loop))
>          {
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 6ecd304..9a63f7e 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
>                                           cond_expr, cond_expr_stmt_list);
>    gcc_assert (new_loop);
>    gcc_assert (loop_num == loop->num);
> +
> +  /* NEW_LOOP is a post loop.  */
> +  mark_pre_or_post_loop (new_loop);
> +
>  #ifdef ENABLE_CHECKING
>    slpeel_verify_cfg_after_peeling (loop, new_loop);
>  #endif
> @@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo)
>                                  th, true, NULL_TREE, NULL);
>
>    gcc_assert (new_loop);
> +
> +  /* NEW_LOOP is a pre-loop.  */
> +  mark_pre_or_post_loop (new_loop);
> +
>  #ifdef ENABLE_CHECKING
>    slpeel_verify_cfg_after_peeling (new_loop, loop);
>  #endif
> --
> 1.6.3.3
>





More information about the Gcc-patches mailing list