This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops


Hi,

Thanks, Jack. Hopefully the 6% protein gain (for -m64) is not just noise. 

I updated the patch based on the current trunk (REV 168477).

Is it OK to commit now?

Thanks,

Changpeng



________________________________________
From: Jack Howarth [howarth@bromo.med.uc.edu]
Sent: Monday, January 03, 2011 9:04 PM
To: Fang, Changpeng
Cc: Zdenek Dvorak; Richard Guenther; Xinliang David Li; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops

On Fri, Dec 17, 2010 at 01:14:49AM -0600, Fang, Changpeng wrote:
> Hi, Jack:
>
> Thanks for the testing.
>
> This patch is not supposed to slow down a program by 10% (rnflow and test_fpu).
> It would be helpful if you can provide analysis why they are slowed down.

Changpeng,
   The corrected merge against gcc trunk of...

Index: gcc/basic-block.h
===================================================================
--- gcc/basic-block.h   (revision 168437)
+++ gcc/basic-block.h   (working copy)
@@ -247,11 +247,14 @@
      Only used in cfgcleanup.c.  */
   BB_NONTHREADABLE_BLOCK = 1 << 11,

+  /* Set on blocks that are headers of non-rolling loops.  */
+  BB_HEADER_OF_NONROLLING_LOOP = 1 << 12,
+
   /* Set on blocks that were modified in some way.  This bit is set in
      df_set_bb_dirty, but not cleared by df_analyze, so it can be used
      to test whether a block has been modified prior to a df_analyze
      call.  */
-  BB_MODIFIED = 1 << 12
+  BB_MODIFIED = 1 << 13
 };

 /* Dummy flag for convenience in the hot/cold partitioning code.  */

for the proposed patch from http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01344.html
eliminated the performance regressions on x86_64-apple-darwin10. I now get...

Compile Command : gfortran -ffast-math -funroll-loops -O3 %n.f90 -o %n

                   Execution Time
-m32
                 stock       patched  %increase
ac               10.59       10.59      0.0
aermod           19.49       19.13     -1.8
air               6.07        6.07      0.0
capacita         44.60       44.61      0.0
channel           1.98        1.98      0.0
doduc            31.19       31.31      0.4
fatigue           9.90       10.29      3.9
gas_dyn           4.72        4.71     -0.2
induct           13.93       13.93      0.0
linpk            15.50       15.49     -0.1
mdbx             11.28       11.26     -0.2
nf               27.62       27.58     -0.1
protein          38.70       38.60     -0.3
rnflow           24.68       24.68      0.0
test_fpu         10.13       10.13      0.0
tfft              1.92        1.92      0.0

Geometric Mean   12.06       12.08      0.2
Execution Time

-m64
                 stock       patched   %increase
ac                8.80        8.80      0.0
aermod           17.34       17.17     -1.0
air               5.48        5.52      0.7
capacita         32.38       32.50      0.4
channel           1.84        1.84      0.0
doduc            26.50       26.52      0.1
fatigue           8.35        8.33     -0.2
gas_dyn           4.30        4.29     -0.2
induct           12.83       12.83      0.0
linpk            15.49       15.49      0.0
mdbx             11.23       11.22     -0.1
nf               30.21       30.16     -0.2
protein          34.13       32.07     -6.0
rnflow           23.18       23.19      0.0
test_fpu          8.04        8.02     -0.2
tfft              1.87        1.86     -0.5

Geometric Mean   10.87       10.82     -0.5
Execution Time
From 7d10fcfe379e3fe4387cf192fbbdbea7daff0637 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@houghton.(none)>
Date: Tue, 4 Jan 2011 14:36:50 -0800
Subject: [PATCH] Consider a loop not hot if it rolls only a few times (non-rolling)

	* basic-block.h (bb_flags): Add a new flag BB_HEADER_OF_NONROLLING_LOOP.
	* cfg.c (clear_bb_flags): Keep BB_HEADER_OF_NONROLLING marker.
	* cfgloop.h (mark_non_rolling_loop): New function declaration.
	  (non_rolling_loop_p): New function declaration.
	* predict.c (optimize_loop_for_size_p): Return true if the loop was marked
	  NON-ROLLING.  (optimize_loop_for_speed_p): Return false if the loop was
	  marked NON-ROLLING.
	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the
	  non-rolling loop.
	* tree-ssa-loop-niter.c (mark_non_rolling_loop): Implement the new
	  function.  (non_rolling_loop_p): Implement the new function.
	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the
	  non-rolling loop.  (vect_do_peeling_for_alignment): Mark the non-rolling
	  loop.
---
 gcc/basic-block.h          |    5 ++++-
 gcc/cfg.c                  |    7 ++++---
 gcc/cfgloop.h              |    2 ++
 gcc/predict.c              |    6 ++++++
 gcc/tree-ssa-loop-manip.c  |    3 +++
 gcc/tree-ssa-loop-niter.c  |   20 ++++++++++++++++++++
 gcc/tree-vect-loop-manip.c |    8 ++++++++
 7 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index 3594eea..081c175 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -251,7 +251,10 @@ enum bb_flags
      df_set_bb_dirty, but not cleared by df_analyze, so it can be used
      to test whether a block has been modified prior to a df_analyze
      call.  */
-  BB_MODIFIED = 1 << 12
+  BB_MODIFIED = 1 << 12,
+
+ /* Set on blocks that are headers of non-rolling loops.  */
+  BB_HEADER_OF_NONROLLING_LOOP = 1 << 13
 };
 
 /* Dummy flag for convenience in the hot/cold partitioning code.  */
diff --git a/gcc/cfg.c b/gcc/cfg.c
index c8ef799..e59a637 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred)
   connect_src (e);
 }
 
-/* Clear all basic block flags, with the exception of partitioning and
-   setjmp_target.  */
+/* Clear all basic block flags, with the exception of partitioning,
+   setjmp_target, and the non-rolling loop marker.  */
 void
 clear_bb_flags (void)
 {
@@ -434,7 +434,8 @@ clear_bb_flags (void)
 
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     bb->flags = (BB_PARTITION (bb)
-		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET)));
+		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET
+				 + BB_HEADER_OF_NONROLLING_LOOP)));
 }
 
 /* Check the consistency of profile information.  We can't do that
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index f7bb134..0f48115 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx);
 void estimate_numbers_of_iterations_loop (struct loop *, bool);
 HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
 bool estimated_loop_iterations (struct loop *, bool, double_int *);
+void mark_non_rolling_loop (struct loop *);
+bool non_rolling_loop_p (struct loop *);
 
 /* Loop manipulation.  */
 extern bool can_duplicate_loop_p (const struct loop *loop);
diff --git a/gcc/predict.c b/gcc/predict.c
index a86708a..34d7dff 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -280,6 +280,9 @@ optimize_insn_for_speed_p (void)
 bool
 optimize_loop_for_size_p (struct loop *loop)
 {
+  /* Loops marked NON-ROLLING are not likely to be hot.  */
+  if (non_rolling_loop_p (loop))
+    return true;
   return optimize_bb_for_size_p (loop->header);
 }
 
@@ -288,6 +291,9 @@ optimize_loop_for_size_p (struct loop *loop)
 bool
 optimize_loop_for_speed_p (struct loop *loop)
 {
+  /* Loops marked NON-ROLLING are not likely to be hot.  */
+  if (non_rolling_loop_p (loop))
+    return false;
   return optimize_bb_for_speed_p (loop->header);
 }
 
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 8176ed8..3ff0790 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
   gcc_assert (new_loop != NULL);
   update_ssa (TODO_update_ssa);
 
+  /* NEW_LOOP is a non-rolling loop.  */
+  mark_non_rolling_loop (new_loop);
+
   /* Determine the probability of the exit edge of the unrolled loop.  */
   new_est_niter = est_niter / factor;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index ee85f6f..ec108c2 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p)
   fold_undefer_and_ignore_overflow_warnings ();
 }
 
+/* Mark LOOP as a non-rolling loop.  */
+
+void
+mark_non_rolling_loop (struct loop *loop)
+{
+  gcc_assert (loop && loop->header);
+  loop->header->flags |= BB_HEADER_OF_NONROLLING_LOOP;
+}
+
+/* Return true if LOOP is a non-rolling loop.  */
+
+bool
+non_rolling_loop_p (struct loop *loop)
+{
+  int masked_flags;
+  gcc_assert (loop && loop->header);
+  masked_flags = (loop->header->flags & BB_HEADER_OF_NONROLLING_LOOP);
+ return (masked_flags != 0);
+}
+
 /* Returns true if statement S1 dominates statement S2.  */
 
 bool
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 28b75f1..9bbe68b 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
 					    cond_expr, cond_expr_stmt_list);
   gcc_assert (new_loop);
   gcc_assert (loop_num == loop->num);
+
+  /* NEW_LOOP is a non-rolling loop.  */
+  mark_non_rolling_loop (new_loop);
+
 #ifdef ENABLE_CHECKING
   slpeel_verify_cfg_after_peeling (loop, new_loop);
 #endif
@@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo)
 				   th, true, NULL_TREE, NULL);
 
   gcc_assert (new_loop);
+
+  /* NEW_LOOP is a non-rolling loop.  */
+  mark_non_rolling_loop (new_loop);
+
 #ifdef ENABLE_CHECKING
   slpeel_verify_cfg_after_peeling (new_loop, loop);
 #endif
-- 
1.6.3.3


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]