This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops


On Mon, Dec 13, 2010 at 02:35:35PM -0600, Fang, Changpeng wrote:
> Hi,
> 
> The attached patch adds the logic to disable certain loop optimizations on pre-/post-loops. 
> 
> Some loop optimizations (auto-vectorization, loop unrolling, etc) may peel a few iterations
> of a loop to form pre- and/or post-loops for various purposes (alignment, loop bounds, etc).
> Currently, GCC loop optimizer is unable to recognize that such loops will roll only a few 
> iterations and still perform optimizations on them. While this does not hurt the performance in general,
> it may significantly increase the compilation time and code size without performance benefit.
> 
> This patch adds such logic for the loop optimizer to recognize pre- and/or post loops, and disable
> prefetch, unswitch and loop unrolling on them. On polyhedron with -Ofast -funroll-loops -march=amdfam10,
> the patch could reduce the compilation time by 28% on average, the reduce the binary size by 20% on
>  average (see the atached data).  Note that the small improvement (0.5%) could have been noise, the
> code size reduction could possibly improve the performance in some cases (I_cache iprovement?).
> 
> The patch passed bootstrap and gcc regression tests on x86_64-unknown-linux-gnu.
> 
> Is it OK to commit to trunk?
> 
> Thanks,
> 
> Changpeng

Changpeng,
   On x86_64-apple-darwin10, this patch produces some regressions in the gcc testsuite.
In particular at both -m32 and -m64...

XPASS: gcc.dg/pr30957-1.c execution test
FAIL: gcc.dg/pr30957-1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"

and

FAIL: gcc.dg/var-expand1.c scan-rtl-dump loop2_unroll "Expanding Accumulator"

Do you see those as well on linux?
               Jack

>    
Content-Description: polyhedron.txt
> Effact of the pre-/post-loop patch on polyhedron 
> option: gfortran -Ofast -funroll-loops -march=amdfam10
> 
>                compilation     code size      speed
>               time reduction   reduction      improvement
>                  (%)             (%)            (%)
> -----------------------------------------------------------
>           ac	-20.54		-17.15		0
>       aermod	-15.93		-10.15		2.51
>          air	-5.74		-5.45		-0.09
>     capacita	-31.35		-18.27		0.08
>      channel	-11.32		-10.24		1.22
>        doduc	-4.52		-6.12		0.82
>      fatigue	-34.51		-15.94		0
>      gas_dyn	-45.56		-28.66		2.31
>       induct	-3.1		-1.91		0.05
>        linpk	-25.55		-27.5		0.26
>         mdbx	-24.06		-19.74		1.27
>           nf	-60.85		-48.92		-0.77
>      protein	-44.73		-24.02		-0.19
>       rnflow	-50.55		-36.69		0.47
>     test_fpu	-52.49		-41.35		1.18
>         tfft	-24.83		-18.29		0.39
> -----------------------------------------------------------
>      average	-28.48		-20.65		0.59
> 

Content-Description: 0001-Don-t-perform-certain-loop-optimizations-on-pre-post.patch
> From e8636e80de4d6de8ba2dbc8f08bd2daddd02edc3 Mon Sep 17 00:00:00 2001
> From: Changpeng Fang <chfang@houghton.(none)>
> Date: Mon, 13 Dec 2010 12:01:49 -0800
> Subject: [PATCH] Don't perform certain loop optimizations on pre/post loops
> 
> 	* basic-block.h (bb_flags): Add a new flag BB_PRE_POST_LOOP_HEADER.
> 	* cfg.c (clear_bb_flags): Keep BB_PRE_POST_LOOP_HEADER marker.
> 	* cfgloop.h (mark_pre_or_post_loop): New function declaration.
> 	  (pre_or_post_loop_p): New function declaration.
> 	* loop-unroll.c (decide_unroll_runtime_iterations): Do not unroll a
> 	  pre- or post-loop.
> 	* loop-unswitch.c (unswitch_single_loop): Do not unswitch a pre- or
> 	  post-loop.
> 	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the
> 	  post-loop.
> 	* tree-ssa-loop-niter.c (mark_pre_or_post_loop): Implement the new
> 	  function.  (pre_or_post_loop_p): Implement the new function.
> 	* tree-ssa-loop-prefetch.c (loop_prefetch_arrays): Don't prefetch
> 	  a pre- or post-loop.
> 	* tree-ssa-loop-unswitch.c (tree_ssa_unswitch_loops): Do not unswitch
> 	  a pre- or post-loop.
> 	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the
> 	  post-loop.  (vect_do_peeling_for_alignment): Mark the pre-loop.
> ---
>  gcc/basic-block.h            |    6 +++++-
>  gcc/cfg.c                    |    7 ++++---
>  gcc/cfgloop.h                |    2 ++
>  gcc/loop-unroll.c            |    7 +++++++
>  gcc/loop-unswitch.c          |    8 ++++++++
>  gcc/tree-ssa-loop-manip.c    |    3 +++
>  gcc/tree-ssa-loop-niter.c    |   20 ++++++++++++++++++++
>  gcc/tree-ssa-loop-prefetch.c |    7 +++++++
>  gcc/tree-ssa-loop-unswitch.c |    8 ++++++++
>  gcc/tree-vect-loop-manip.c   |    8 ++++++++
>  10 files changed, 72 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/basic-block.h b/gcc/basic-block.h
> index be0a1d1..78552fd 100644
> --- a/gcc/basic-block.h
> +++ b/gcc/basic-block.h
> @@ -245,7 +245,11 @@ enum bb_flags
>  
>    /* Set on blocks that cannot be threaded through.
>       Only used in cfgcleanup.c.  */
> -  BB_NONTHREADABLE_BLOCK = 1 << 11
> +  BB_NONTHREADABLE_BLOCK = 1 << 11,
> +
> +  /* Set on blocks that are headers of pre- or post-loops.  */
> +  BB_PRE_POST_LOOP_HEADER = 1 << 12
> +
>  };
>  
>  /* Dummy flag for convenience in the hot/cold partitioning code.  */
> diff --git a/gcc/cfg.c b/gcc/cfg.c
> index c8ef799..e9b394a 100644
> --- a/gcc/cfg.c
> +++ b/gcc/cfg.c
> @@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred)
>    connect_src (e);
>  }
>  
> -/* Clear all basic block flags, with the exception of partitioning and
> -   setjmp_target.  */
> +/* Clear all basic block flags, with the exception of partitioning,
> +   setjmp_target, and the pre/post loop marker.  */
>  void
>  clear_bb_flags (void)
>  {
> @@ -434,7 +434,8 @@ clear_bb_flags (void)
>  
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>      bb->flags = (BB_PARTITION (bb)
> -		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET)));
> +		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET
> +                                 + BB_PRE_POST_LOOP_HEADER)));
>  }
>  
>  /* Check the consistency of profile information.  We can't do that
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index bf2614e..ce848cc 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx);
>  void estimate_numbers_of_iterations_loop (struct loop *, bool);
>  HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
>  bool estimated_loop_iterations (struct loop *, bool, double_int *);
> +void mark_pre_or_post_loop (struct loop *);
> +bool pre_or_post_loop_p (struct loop *);
>  
>  /* Loop manipulation.  */
>  extern bool can_duplicate_loop_p (const struct loop *loop);
> diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
> index 67d6ea0..6f095f6 100644
> --- a/gcc/loop-unroll.c
> +++ b/gcc/loop-unroll.c
> @@ -857,6 +857,13 @@ decide_unroll_runtime_iterations (struct loop *loop, int flags)
>  	fprintf (dump_file, ";; Loop iterates constant times\n");
>        return;
>      }
> + 
> +  if (pre_or_post_loop_p (loop))
> +    {
> +      if (dump_file)
> +        fprintf (dump_file, ";; Not unrolling, a pre- or post-loop\n");
> +      return;
> +    }
>  
>    /* If we have profile feedback, check whether the loop rolls.  */
>    if (loop->header->count && expected_loop_iterations (loop) < 2 * nunroll)
> diff --git a/gcc/loop-unswitch.c b/gcc/loop-unswitch.c
> index 77524d8..59373bf 100644
> --- a/gcc/loop-unswitch.c
> +++ b/gcc/loop-unswitch.c
> @@ -276,6 +276,14 @@ unswitch_single_loop (struct loop *loop, rtx cond_checked, int num)
>        return;
>      }
>  
> +  /* Pre- or post loop usually just roll a few iterations.  */
> +  if (pre_or_post_loop_p (loop))
> +    {
> +      if (dump_file)
> +	fprintf (dump_file, ";; Not unswitching, a pre- or post loop\n");
> +      return;
> +    }
> +
>    /* We must be able to duplicate loop body.  */
>    if (!can_duplicate_loop_p (loop))
>      {
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index 87b2c0d..f8ddbab 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
>    gcc_assert (new_loop != NULL);
>    update_ssa (TODO_update_ssa);
>  
> +  /* NEW_LOOP is a post-loop.  */
> +  mark_pre_or_post_loop (new_loop);
> +
>    /* Determine the probability of the exit edge of the unrolled loop.  */
>    new_est_niter = est_niter / factor;
>  
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index ee85f6f..33e8cc3 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p)
>    fold_undefer_and_ignore_overflow_warnings ();
>  }
>  
> +/* Mark LOOP as a pre- or post loop.  */
> +
> +void
> +mark_pre_or_post_loop (struct loop *loop)
> +{
> +  gcc_assert (loop && loop->header);
> +  loop->header->flags |= BB_PRE_POST_LOOP_HEADER;
> +}
> +
> +/* Return true if LOOP is a pre- or post loop.  */
> +
> +bool
> +pre_or_post_loop_p (struct loop *loop)
> +{
> +  int masked_flags;
> +  gcc_assert (loop && loop->header);
> +  masked_flags = (loop->header->flags & BB_PRE_POST_LOOP_HEADER);
> +  return (masked_flags != 0);
> +}
> +
>  /* Returns true if statement S1 dominates statement S2.  */
>  
>  bool
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 59c65d3..5c9f640 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -1793,6 +1793,13 @@ loop_prefetch_arrays (struct loop *loop)
>        return false;
>      }
>  
> +  if (pre_or_post_loop_p (loop))
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file, "  Not Prefetching -- pre- or post loop\n");
> +      return false;
> +    }
> +
>    /* FIXME: the time should be weighted by the probabilities of the blocks in
>       the loop body.  */
>    time = tree_num_loop_insns (loop, &eni_time_weights);
> diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
> index b6b32dc..f3b8108 100644
> --- a/gcc/tree-ssa-loop-unswitch.c
> +++ b/gcc/tree-ssa-loop-unswitch.c
> @@ -88,6 +88,14 @@ tree_ssa_unswitch_loops (void)
>        if (dump_file && (dump_flags & TDF_DETAILS))
>          fprintf (dump_file, ";; Considering loop %d\n", loop->num);
>  
> +     /* Do not unswitch a pre- or post loop.  */
> +     if (pre_or_post_loop_p (loop))
> +       {
> +          if (dump_file && (dump_flags & TDF_DETAILS))
> +            fprintf (dump_file, ";; Not unswitching, a pre- or post loop\n");
> +          continue;
> +        }
> +
>        /* Do not unswitch in cold regions. */
>        if (optimize_loop_for_size_p (loop))
>          {
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 6ecd304..9a63f7e 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
>  					    cond_expr, cond_expr_stmt_list);
>    gcc_assert (new_loop);
>    gcc_assert (loop_num == loop->num);
> +  
> +  /* NEW_LOOP is a post loop.  */
> +  mark_pre_or_post_loop (new_loop);
> +
>  #ifdef ENABLE_CHECKING
>    slpeel_verify_cfg_after_peeling (loop, new_loop);
>  #endif
> @@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo)
>  				   th, true, NULL_TREE, NULL);
>  
>    gcc_assert (new_loop);
> +
> +  /* NEW_LOOP is a pre-loop.  */
> +  mark_pre_or_post_loop (new_loop);
> +  
>  #ifdef ENABLE_CHECKING
>    slpeel_verify_cfg_after_peeling (new_loop, loop);
>  #endif
> -- 
> 1.6.3.3
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]