This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Add code-hoisting to GIMPLE


The following patch is Stevens code-hoisting based on PRE forward-ported
and fixed for bootstrap plus the case of hoisting code across loops
which we generally do not want (expressions in the loop exit target block
are antic-in throughout the whole loop unless they are killed and thus
get inserted into the exit block and then PREd before the loop).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I'm going to try making the bitmap_set ops in do_hoist_insert a bit
faster - Steven, do you remember any issues with the approach from the
time you worked on it?

Thanks,
Richard.

2016-07-04  Steven Bosscher  <steven@gcc.gnu.org
	Richard Biener  <rguenther@suse.de>

	PR tree-optimization/23286
	PR tree-optimization/70159
	* doc/invoke.texi: Document -ftree-hoist.
	* common.opt (ftree-hoist): New flag.
	* opts.c (default_options_table): Enable -ftree-hoist at -O2+.
	* tree-ssa-pre.c (pre_stats): Add hoist_insert.
	(do_regular_insertion): Rename to ...
	(do_pre_regular_insertion): ... this and amend general comments
	on insertion strathegy.
	(do_partial_partial_insertion): Rename to ...
	(do_pre_partial_partial_insertion): ... this.
	(do_hoist_insertion): New function.
	(insert_aux): Take flags on whether to do PRE and/or hoist insertion
	and call do_hoist_insertion properly.
	(insert): Adjust.
	(pass_pre::gate): Enable also if -ftree-hoist is enabled.
	(pass_pre::execute): Register hoist_insert stats.

	* gcc.dg/tree-ssa/ssa-pre-11.c: Disable code hosting.
	* gcc.dg/tree-ssa/ssa-pre-27.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-28.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-2.c: Likewise.
	* gcc.dg/tree-ssa/pr35286.c: Likewise.
	* gcc.dg/tree-ssa/pr35287.c: Likewise.
	* gcc.dg/tree-ssa/loadpre3.c: Adjust so hosting doesn't apply.
	* gcc.dg/tree-ssa/pr43491.c: Scan optimized dump for desired result.
	* gcc.dg/tree-ssa/ssa-hoist-1.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-2.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-3.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-4.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-5.c: New testcase.
	* gcc.dg/tree-ssa/ssa-hoist-6.c: New testcase.

Index: gcc/doc/invoke.texi
===================================================================
*** gcc/doc/invoke.texi.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/doc/invoke.texi	2016-07-04 11:31:33.391904573 +0200
*************** Objective-C and Objective-C++ Dialects}.
*** 404,410 ****
  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
! -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
  -ftree-loop-if-convert-stores -ftree-loop-im @gol
  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
--- 404,410 ----
  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
! -ftree-dse -ftree-forwprop -ftree-fre -ftree-hoist -ftree-loop-if-convert @gol
  -ftree-loop-if-convert-stores -ftree-loop-im @gol
  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
*************** also turns on the following optimization
*** 6372,6377 ****
--- 6372,6378 ----
  -fstrict-aliasing -fstrict-overflow @gol
  -ftree-builtin-call-dce @gol
  -ftree-switch-conversion -ftree-tail-merge @gol
+ -ftree-hoist @gol
  -ftree-pre @gol
  -ftree-vrp @gol
  -fipa-ra}
*************** and the @option{large-stack-frame-growth
*** 7265,7270 ****
--- 7266,7279 ----
  Perform reassociation on trees.  This flag is enabled by default
  at @option{-O} and higher.
  
+ @item -ftree-hoist
+ @opindex ftree-hoist
+ Perform code hoisting on trees.  Code hoisting tries to move the
+ evaluation of expressions executed on all paths to the function exit
+ as early as possible.  This is especially useful as a code size
+ optimization, but it often helps for code speed as well.
+ This flag is enabled by defailt at @option{-O2} and higher.
+ 
  @item -ftree-pre
  @opindex ftree-pre
  Perform partial redundancy elimination (PRE) on trees.  This flag is
*************** Dump each function after STORE-CCP@.  Th
*** 12222,12229 ****
  
  @item pre
  @opindex fdump-tree-pre
! Dump trees after partial redundancy elimination.  The file name is made
! by appending @file{.pre} to the source file name.
  
  @item fre
  @opindex fdump-tree-fre
--- 12231,12238 ----
  
  @item pre
  @opindex fdump-tree-pre
! Dump trees after partial redundancy elimination and/or code hoisting.
! The file name is made by appending @file{.pre} to the source file name.
  
  @item fre
  @opindex fdump-tree-fre
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  double cos (double);
  double f(double a)
  {
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-tree-hoist -fdump-tree-pre-stats" } */
  double cos (double);
  double f(double a)
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  int motion_test1(int data, int data_0, int data_3, int v)
  {
  	int i;
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-tree-hoist -fdump-tree-pre-stats" } */
  int motion_test1(int data, int data_0, int data_3, int v)
  {
  	int i;
Index: gcc/testsuite/gcc.dg/tree-ssa/pr35286.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr35286.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr35286.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  int g2;
  struct A {
      int a; int b;
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-tree-hoist -fdump-tree-pre-stats" } */
  int g2;
  struct A {
      int a; int b;
Index: gcc/testsuite/gcc.dg/tree-ssa/pr35287.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr35287.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr35287.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  int *gp;
  int foo(int p)
  {
--- 1,5 ----
  /* { dg-do compile } */ 
! /* { dg-options "-O2 -fno-tree-hoist -fdump-tree-pre-stats" } */
  int *gp;
  int foo(int p)
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c	2016-07-04 11:31:33.391904573 +0200
***************
*** 1,5 ****
--- 1,8 ----
  /* { dg-do compile } */
  /* { dg-options "-O2 -fdump-tree-pre-stats" } */
+ 
+ extern void spoil (void);
+ 
  int foo(int **a,int argc)
  {
    int b;
*************** int foo(int **a,int argc)
*** 11,17 ****
      }
    else
      {
! 
      }
    /* Should be able to eliminate one of the *(*a)'s along the if path
       by pushing it into the else path. We will also eliminate
--- 14,21 ----
      }
    else
      {
!       /* Spoil *a and *(*a) to avoid hoisting it before the "if (...)".  */
!       spoil ();
      }
    /* Should be able to eliminate one of the *(*a)'s along the if path
       by pushing it into the else path. We will also eliminate
Index: gcc/opts.c
===================================================================
*** gcc/opts.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/opts.c	2016-07-04 11:31:33.391904573 +0200
*************** static const struct default_options defa
*** 500,505 ****
--- 500,506 ----
        REORDER_BLOCKS_ALGORITHM_STC },
      { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
+     { OPT_LEVELS_2_PLUS, OPT_ftree_hoist, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
      { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/tree-ssa-pre.c	2016-07-04 12:07:32.957077755 +0200
***************
*** 1,4 ****
! /* SSA-PRE for trees.
     Copyright (C) 2001-2016 Free Software Foundation, Inc.
     Contributed by Daniel Berlin <dan@dberlin.org> and Steven Bosscher
     <stevenb@suse.de>
--- 1,4 ----
! /* Full and partial redundancy elimination and code hoisting on SSA GIMPLE.
     Copyright (C) 2001-2016 Free Software Foundation, Inc.
     Contributed by Daniel Berlin <dan@dberlin.org> and Steven Bosscher
     <stevenb@suse.de>
*************** along with GCC; see the file COPYING3.
*** 54,64 ****
--- 54,85 ----
  #include "tree-cfgcleanup.h"
  #include "langhooks.h"
  
+ /* Even though this file is called tree-ssa-pre.c, we actually
+    implement a bit more than just PRE here.  All of them piggy-back
+    on GVN which is implemented in tree-ssa-sccvn.c.
+ 
+      1. Full Redundancy Elimination (FRE)
+ 	This is the elimination phase of GVN.
+ 
+      2. Partial Redundancy Elimination (PRE)
+ 	This is adds computation of AVAIL_OUT and ANTIC_IN and
+ 	doing expression insertion to form GVN-PRE.
+ 
+      3. Code hoisting
+ 	This optimization uses the ANTIC_IN sets computed for PRE
+ 	to move expressions further up than PRE would do, to make
+ 	multiple computations of the same value fully redundant.
+ 	This pass is explained below (after the explanation of the
+ 	basic algorithm for PRE).
+ */
+ 
  /* TODO:
  
     1. Avail sets can be shared by making an avail_find_leader that
        walks up the dominator tree and looks in those avail sets.
        This might affect code optimality, it's unclear right now.
+       Currently the AVAIL_OUT sets are the remaining quadraticness in
+       memory of GVN-PRE.
     2. Strength reduction can be performed by anticipating expressions
        we can repair later on.
     3. We can do back-substitution or smarter value numbering to catch
*************** along with GCC; see the file COPYING3.
*** 70,76 ****
     represent the actual statement containing the expressions we care about,
     and we cache the value number by putting it in the expression.  */
  
! /* Basic algorithm
  
     First we walk the statements to generate the AVAIL sets, the
     EXP_GEN sets, and the tmp_gen sets.  EXP_GEN sets represent the
--- 91,97 ----
     represent the actual statement containing the expressions we care about,
     and we cache the value number by putting it in the expression.  */
  
! /* Basic algorithm for Partial Redundancy Elimination:
  
     First we walk the statements to generate the AVAIL sets, the
     EXP_GEN sets, and the tmp_gen sets.  EXP_GEN sets represent the
*************** along with GCC; see the file COPYING3.
*** 110,126 ****
     In order to make it fully redundant, we insert the expression into
     the predecessors where it is not available, but is ANTIC.
  
     For the partial anticipation case, we only perform insertion if it
     is partially anticipated in some block, and fully available in all
     of the predecessors.
  
!    insert/insert_aux/do_regular_insertion/do_partial_partial_insertion
!    performs these steps.
  
     Fourth, we eliminate fully redundant expressions.
     This is a simple statement walk that replaces redundant
     calculations with the now available values.  */
  
  /* Representations of value numbers:
  
     Value numbers are represented by a representative SSA_NAME.  We
--- 131,205 ----
     In order to make it fully redundant, we insert the expression into
     the predecessors where it is not available, but is ANTIC.
  
+    When optimizing for size, we only eliminate the partial redundancy
+    if we need to insert in only one predecessor.  This avoids almost
+    completely the code size increase that PRE usually causes.
+ 
     For the partial anticipation case, we only perform insertion if it
     is partially anticipated in some block, and fully available in all
     of the predecessors.
  
!    do_pre_regular_insertion/do_pre_partial_partial_insertion
!    performs these steps, driven by insert/insert_aux.
  
     Fourth, we eliminate fully redundant expressions.
     This is a simple statement walk that replaces redundant
     calculations with the now available values.  */
  
+ /* Basic algorithm for Code Hoisting:
+ 
+    Code hoisting is: Moving value computations up in the control flow
+    graph to make multiple copies redundant.  Typically this is a size
+    optimization, but there are cases where it also is helpful for speed.
+ 
+    A simple code hoisting algorithm is implemented that piggy-backs on
+    the PRE infrastructure.  For code hoisting, we have to know ANTIC_OUT
+    which is effectively ANTIC_IN - AVAIL_OUT.  The latter two have to be
+    computed for PRE, and we can use them to perform a limited version of
+    code hoisting, too.
+ 
+    For the purpose of this implementation, a value is hoistable to a basic
+    block B if the following properties are met:
+ 
+    1. The value is in ANTIC_IN(B) -- the value will be computed on all
+       paths from B to function exit and it can be computed in B);
+ 
+    2. The value is not in AVAIL_OUT(B) -- there would be no need to
+       compute the value again and make it available twice;
+ 
+    3. All successors of B are dominated by B -- makes sure that inserting
+       a computation of the value in B will make the remaining
+       computations fully redundant;
+ 
+    4. At least one successor has the value in AVAIL_OUT -- to avoid
+       hoisting values up too far;
+ 
+    5. There are at least two successors of B -- hoisting in straight
+       line code is pointless.
+ 
+    The third condition is not strictly necessary, but it would complicate
+    the hoisting pass a lot.  In fact, I don't know of any code hoisting
+    algorithm that does not have this requirement.  Fortunately, experiments
+    have show that most candidate hoistable values are in regions that meet
+    this condition (e.g. diamond-shape regions).
+ 
+    The forth condition is necessary to avoid hoisting things up too far
+    away from the uses of the value.  Nothing else limits the algorithm
+    from hoisting everything up as far as ANTIC_IN allows.  Experiments
+    with SPEC and CSiBE have shown that hoisting up too far results in more
+    spilling, less benefits for code size, and worse benchmark scores.
+    Fortunately, in practice most of the interesting hoisting opportunities
+    are caught despite this limitation.
+ 
+    For hoistable values that meet all conditions, expressions are inserted
+    to make the calculation of the hoistable value fully redundant.  We
+    perform code hoisting insertions after each round of PRE insertions,
+    because code hoisting never exposes new PRE opportunities, but PRE can
+    create new code hoisting opportunities.
+ 
+    The code hoisting algorithm is implemented in do_hoist_insert, driven
+    by insert/insert_aux.  */
+ 
  /* Representations of value numbers:
  
     Value numbers are represented by a representative SSA_NAME.  We
*************** static struct
*** 445,450 ****
--- 524,532 ----
    /* The number of inserts found due to partial anticipation  */
    int pa_insert;
  
+   /* The number of inserts made for code hoisting.  */
+   int hoist_insert;
+ 
    /* The number of new PHI nodes added by PRE.  */
    int phis;
  } pre_stats;
*************** static pre_expr bitmap_find_leader (bitm
*** 454,459 ****
--- 536,542 ----
  static void bitmap_value_insert_into_set (bitmap_set_t, pre_expr);
  static void bitmap_value_replace_in_set (bitmap_set_t, pre_expr);
  static void bitmap_set_copy (bitmap_set_t, bitmap_set_t);
+ static void bitmap_set_and (bitmap_set_t, bitmap_set_t);
  static bool bitmap_set_contains_value (bitmap_set_t, unsigned int);
  static void bitmap_insert_into_set (bitmap_set_t, pre_expr);
  static void bitmap_insert_into_set_1 (bitmap_set_t, pre_expr,
*************** insert_into_preds_of_block (basic_block
*** 3103,3109 ****
  
  
  
! /* Perform insertion of partially redundant values.
     For BLOCK, do the following:
     1.  Propagate the NEW_SETS of the dominator into the current block.
     If the block has multiple predecessors,
--- 3186,3192 ----
  
  
  
! /* Perform insertion of partially redundant or hoistable values.
     For BLOCK, do the following:
     1.  Propagate the NEW_SETS of the dominator into the current block.
     If the block has multiple predecessors,
*************** insert_into_preds_of_block (basic_block
*** 3114,3128 ****
         2c. Insert a new PHI merging the values of the predecessors.
         2d. Insert the new PHI, and the new expressions, into the
  	   NEW_SETS set.
!    3. Recursively call ourselves on the dominator children of BLOCK.
! 
!    Steps 1, 2a, and 3 are done by insert_aux. 2b, 2c and 2d are done by
!    do_regular_insertion and do_partial_insertion.
! 
  */
  
  static bool
! do_regular_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
--- 3197,3216 ----
         2c. Insert a new PHI merging the values of the predecessors.
         2d. Insert the new PHI, and the new expressions, into the
  	   NEW_SETS set.
!    If the block has multiple successors,
!        3a. Iterate over the ANTIC values for the block to see if
! 	   any of them are good candidates for hoisting.
!        3b. If so, insert expressions computing the values in BLOCK,
! 	   and add the new expressions into the NEW_SETS set.
!    4. Recursively call ourselves on the dominator children of BLOCK.
! 
!    Steps 1, 2a, and 4 are done by insert_aux. 2b, 2c and 2d are done by
!    do_pre_regular_insertion and do_partial_insertion.  3a and 3b are
!    done in do_hoist_insertion.
  */
  
  static bool
! do_pre_regular_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
*************** do_regular_insertion (basic_block block,
*** 3291,3299 ****
     In this case, we know that putting it earlier will enable us to
     remove the later computation.  */
  
- 
  static bool
! do_partial_partial_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
--- 3379,3386 ----
     In this case, we know that putting it earlier will enable us to
     remove the later computation.  */
  
  static bool
! do_pre_partial_partial_insertion (basic_block block, basic_block dom)
  {
    bool new_stuff = false;
    vec<pre_expr> exprs;
*************** do_partial_partial_insertion (basic_bloc
*** 3422,3429 ****
    return new_stuff;
  }
  
  static bool
! insert_aux (basic_block block)
  {
    basic_block son;
    bool new_stuff = false;
--- 3509,3626 ----
    return new_stuff;
  }
  
+ /* Insert expressions in BLOCK to compute hoistable values up.
+    Return TRUE if something was inserted, otherwise return FALSE.
+    The caller has to make sure that BLOCK has at least two successors.  */
+ 
  static bool
! do_hoist_insertion (basic_block block)
! {
!   edge e;
!   edge_iterator ei;
!   bool new_stuff = false;
!   unsigned i;
!   bitmap_iterator bi;
!   bitmap_set_t hoistable_set;
!   bitmap availout_in_some;
!   gimple_stmt_iterator last;
! 
!   /* At least two successors, or else...  */
!   gcc_assert (EDGE_COUNT (block->succs) >= 2);
! 
!   /* Check that all successors of BLOCK are dominated by block.
!      We could use dominated_by_p() for this, but actually there is a much
!      quicker check: any successor that is dominated by BLOCK can't have
!      more than one predecessor edge.  */
!   FOR_EACH_EDGE (e, ei, block->succs)
!     if (! single_pred_p (e->dest))
!       return false;
! 
!   /* Determine the insertion point.  If we cannot safely insert before
!      the last stmt if we'd have to, bail out.  */
!   last = gsi_last_bb (block);
!   if (!gsi_end_p (last)
!       && !is_ctrl_stmt (gsi_stmt (last))
!       && stmt_ends_bb_p (gsi_stmt (last)))
!     return false;
! 
!   hoistable_set = bitmap_set_new ();
!   availout_in_some = BITMAP_ALLOC (&grand_bitmap_obstack);
! 
!   /* A hoistable value must be in ANTIC_IN(block)
!      but not in AVAIL_OUT(BLOCK).  */
!   bitmap_set_copy (hoistable_set, ANTIC_IN (block));
!   bitmap_set_subtract_values (hoistable_set, AVAIL_OUT (block));
! 
!   /* Short-cut for a common case: hoistable_set is empty.  */
!   if (bitmap_empty_p (&hoistable_set->values))
!     goto done;
! 
!   /* Compute which of the hoistable values is in AVAIL_OUT of
!      at least one of the successors of BLOCK.  */
!   FOR_EACH_EDGE (e, ei, block->succs)
!     /* Do not consider expressions solely because their availability
!        on loop exits.  They'd be ANTIC-IN throughout the whole loop
!        and thus effectively hoisted across loops by combination of
!        PRE and hoisting.  */
!     if (! loop_exit_edge_p (block->loop_father, e))
!       FOR_EACH_VALUE_ID_IN_SET (hoistable_set, i, bi)
! 	if (bitmap_set_contains_value (AVAIL_OUT (e->dest), i))
! 	  bitmap_set_bit (availout_in_some, i);
! 
!   /* Short-cut for a common case: (hoistable_set & availout_in_some) empty.  */
!   if (! bitmap_intersect_p (&hoistable_set->values, availout_in_some))
!     goto done;
! 
!   /* If there are candidate values for hoisting, insert expressions
!      strategically to make the hoistable expressions fully redundant.  */
!   EXECUTE_IF_SET_IN_BITMAP (&hoistable_set->expressions, 0, i, bi)
!     {
!       pre_expr expr = expression_for_id (i);
!       unsigned int value_id = get_expr_value_id (expr);
!       gimple_seq stmts = NULL;
! 
!       /* If the value of this expression is not available in at least one
! 	 successor, do not hoist the value.  */
!       if (! bitmap_bit_p (availout_in_some, value_id))
! 	continue;
! 
!       /* OK, we should hoist this value.  Perform the transformation.  */
!       tree res = create_expression_by_pieces (block, expr, &stmts,
! 					      get_expr_type (expr));
!       if (! res)
! 	continue;
! 
!       pre_stats.hoist_insert++;
!       if (dump_file && (dump_flags & TDF_DETAILS))
! 	{
! 	  fprintf (dump_file,
! 		   "Inserting expression in block %d for code hoisting: ",
! 		   block->index);
! 	  print_pre_expr (dump_file, expr);
! 	  fprintf (dump_file, "\n");
! 	}
! 
!       if (gsi_end_p (last) || is_ctrl_stmt (gsi_stmt (last)))
! 	gsi_insert_seq_before (&last, stmts, GSI_SAME_STMT);
!       else
! 	gsi_insert_seq_after (&last, stmts, GSI_SAME_STMT);
! 
!       new_stuff = true;
!     }
! 
! done:
!   BITMAP_FREE (availout_in_some);
!   bitmap_set_free (hoistable_set);
! 
!   return new_stuff;
! }
! 
! /* Do a dominator walk on the control flow graph, and insert computations
!    of values as necessary for PRE and hoisting.  */
! 
! static bool
! insert_aux (basic_block block, bool do_pre, bool do_hoist)
  {
    basic_block son;
    bool new_stuff = false;
*************** insert_aux (basic_block block)
*** 3436,3442 ****
  	{
  	  unsigned i;
  	  bitmap_iterator bi;
! 	  bitmap_set_t newset = NEW_SETS (dom);
  	  if (newset)
  	    {
  	      /* Note that we need to value_replace both NEW_SETS, and
--- 3633,3643 ----
  	{
  	  unsigned i;
  	  bitmap_iterator bi;
! 	  bitmap_set_t newset;
! 
! 	  /* First, update the AVAIL_OUT set with anything we may have
! 	     inserted higher up in the dominator tree.  */
! 	  newset = NEW_SETS (dom);
  	  if (newset)
  	    {
  	      /* Note that we need to value_replace both NEW_SETS, and
*************** insert_aux (basic_block block)
*** 3450,3474 ****
  		  bitmap_value_replace_in_set (AVAIL_OUT (block), expr);
  		}
  	    }
! 	  if (!single_pred_p (block))
  	    {
! 	      new_stuff |= do_regular_insertion (block, dom);
  	      if (do_partial_partial)
! 		new_stuff |= do_partial_partial_insertion (block, dom);
  	    }
  	}
      }
    for (son = first_dom_son (CDI_DOMINATORS, block);
         son;
         son = next_dom_son (CDI_DOMINATORS, son))
      {
!       new_stuff |= insert_aux (son);
      }
  
    return new_stuff;
  }
  
! /* Perform insertion of partially redundant values.  */
  
  static void
  insert (void)
--- 3651,3681 ----
  		  bitmap_value_replace_in_set (AVAIL_OUT (block), expr);
  		}
  	    }
! 
! 	  /* Insert expressions for partial redundancies.  */
! 	  if (do_pre && !single_pred_p (block))
  	    {
! 	      new_stuff |= do_pre_regular_insertion (block, dom);
  	      if (do_partial_partial)
! 		new_stuff |= do_pre_partial_partial_insertion (block, dom);
  	    }
+ 
+ 	  /* Insert expressions for hoisting.  */
+ 	  if (do_hoist && EDGE_COUNT (block->succs) >= 2)
+ 	    new_stuff |= do_hoist_insertion (block);
  	}
      }
    for (son = first_dom_son (CDI_DOMINATORS, block);
         son;
         son = next_dom_son (CDI_DOMINATORS, son))
      {
!       new_stuff |= insert_aux (son, do_pre, do_hoist);
      }
  
    return new_stuff;
  }
  
! /* Perform insertion of partially redundant and hoistable values.  */
  
  static void
  insert (void)
*************** insert (void)
*** 3485,3491 ****
        num_iterations++;
        if (dump_file && dump_flags & TDF_DETAILS)
  	fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
!       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun));
  
        /* Clear the NEW sets before the next iteration.  We have already
           fully propagated its contents.  */
--- 3692,3699 ----
        num_iterations++;
        if (dump_file && dump_flags & TDF_DETAILS)
  	fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
!       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun), flag_tree_pre,
! 			      flag_tree_hoist);
  
        /* Clear the NEW sets before the next iteration.  We have already
           fully propagated its contents.  */
*************** public:
*** 4764,4770 ****
    {}
  
    /* opt_pass methods: */
!   virtual bool gate (function *) { return flag_tree_pre != 0; }
    virtual unsigned int execute (function *);
  
  }; // class pass_pre
--- 4972,4979 ----
    {}
  
    /* opt_pass methods: */
!   virtual bool gate (function *)
!     { return flag_tree_pre != 0 || flag_tree_hoist != 0; }
    virtual unsigned int execute (function *);
  
  }; // class pass_pre
*************** pass_pre::execute (function *fun)
*** 4819,4824 ****
--- 5028,5034 ----
  
    statistics_counter_event (fun, "Insertions", pre_stats.insertions);
    statistics_counter_event (fun, "PA inserted", pre_stats.pa_insert);
+   statistics_counter_event (fun, "HOIST inserted", pre_stats.hoist_insert);
    statistics_counter_event (fun, "New PHIs", pre_stats.phis);
    statistics_counter_event (fun, "Eliminated", pre_stats.eliminations);
  
Index: gcc/common.opt
===================================================================
*** gcc/common.opt.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/common.opt	2016-07-04 11:31:33.395904620 +0200
*************** ftree-fre
*** 2382,2387 ****
--- 2382,2391 ----
  Common Report Var(flag_tree_fre) Optimization
  Enable Full Redundancy Elimination (FRE) on trees.
  
+ ftree-hoist
+ Common Report Var(flag_tree_hoist) Optimization
+ Enable code hoisting on trees
+ 
  foptimize-strlen
  Common Report Var(flag_optimize_strlen) Optimization
  Enable string length optimizations on trees.
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-1.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-1.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 0 ****
--- 1,16 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre" } */
+ 
+ unsigned short f(unsigned short a)
+ {
+   if (a & 0x8000)
+     a <<= 1, a = a ^ 0x1021;
+   else
+     a <<= 1;
+ 
+   return a;
+ }
+ 
+ /* We should hoist and CSE the shift.  */
+ 
+ /* { dg-final { scan-tree-dump-times " << 1;" 1 "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-2.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-2.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 0 ****
--- 1,15 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre" } */
+ 
+ int f(int i)
+ {
+   if (i < 0)
+     return i/10+ i;
+   return i/10 + i;
+ }
+ 
+ /* Hoisting of i/10 + i should make the code straight-line
+    with one division.  */
+ 
+ /* { dg-final { scan-tree-dump-times "goto" 0 "pre" } } */
+ /* { dg-final { scan-tree-dump-times " / 10;" 1 "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr43491.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr43491.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr43491.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre-stats" } */
  
  #define REGISTER register
  
--- 1,5 ----
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-optimized" } */
  
  #define REGISTER register
  
*************** long foo(long data, long v)
*** 35,41 ****
  	u = i;
  	return v * t * u;
  }
  /* We should not eliminate global register variable when it is the RHS of
!    a single assignment.  */
! /* { dg-final { scan-tree-dump-times "Eliminated: 2" 1 "pre" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
! /* { dg-final { scan-tree-dump-times "Eliminated: 3" 1 "pre" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
--- 35,45 ----
  	u = i;
  	return v * t * u;
  }
+ 
  /* We should not eliminate global register variable when it is the RHS of
!    a single assignment.  So the number of loads from data_0 has to match
!    that of the number of adds (we hoist data_0 + data_3 above the
!    if (data) and eliminate the useless copy).  */
! 
! /* { dg-final { scan-tree-dump-times "= data_0;" 1 "optimized" { target { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } */
! /* { dg-final { scan-tree-dump-times " \\+ " 1 "optimized" { target { ! { arm*-*-* i?86-*-* mips*-*-* x86_64-*-* } } } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre-stats" } */
+ 
+ int test (int a, int b, int c, int g)
+ {
+   int d, e;
+   if (a)
+     d = b * c;
+   else
+     d = b - c;
+   e = b * c + g;
+   return d + e;
+ }
+ 
+ /* We should hoist and CSE only the multiplication.  */
+ 
+ /* { dg-final { scan-tree-dump-times " \\* " 1 "pre" } } */
+ /* { dg-final { scan-tree-dump "Insertions: 1" "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-4.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-4.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 0 ****
--- 1,24 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-optimized" } */
+ 
+ /* From PR21485.  */
+ 
+ long
+ NumSift (long *array, int b, unsigned long k)
+ {
+   if (b)
+     if (array[k] < array[k + 1L])
+       ++k;
+   return array[k];
+ }
+ 
+ /* There should be only two loads left.  And the final value in the
+    if (b) arm should be if-converted:
+      tem1 = array[k];
+      if (b)
+        tem1 = MAX (array[k+1], tem1)
+      return tem1;  */
+ 
+ /* { dg-final { scan-tree-dump-times "= \\*" 2 "optimized" } } */
+ /* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "optimized" } } */
+ /* { dg-final { scan-tree-dump-times "= PHI" 1 "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 1,5 ****
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre" } */
  
  int foo (int i, int j, int b)
  {
--- 1,5 ----
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre -fno-tree-hoist" } */
  
  int foo (int i, int j, int b)
  {
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c.orig	2016-07-04 11:31:18.171727125 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c	2016-07-04 11:31:33.395904620 +0200
***************
*** 1,6 ****
  /* PR37997 */
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre-details" } */
  
  int foo (int i, int b, int result)
  {
--- 1,6 ----
  /* PR37997 */
  /* { dg-do compile } */
! /* { dg-options "-O2 -fdump-tree-pre-details -fno-tree-hoist" } */
  
  int foo (int i, int b, int result)
  {
*************** int foo (int i, int b, int result)
*** 16,20 ****
--- 16,22 ----
  
  /* We should insert i + 1 into the if (b) path as well as the simplified
     i + 1 & -2 expression.  And do replacement with two PHI temps.  */
+ /* With hoisting enabled we'd hoist i + 1 to before the if, retaining
+    only one PHI node.  */
  
  /* { dg-final { scan-tree-dump-times "with prephitmp" 2 "pre" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-6.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-6.c	2016-07-04 11:30:57.423485226 +0200
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre-stats" } */
+ 
+ int a[1024];
+ int b[1024], c[1024];
+ void foo ()
+ {
+   for (int j = 0; j < 1024; ++j)
+     {
+       for (int i = 0; i < 1024; ++i)
+ 	a[i] = j;
+       b[j] = c[j];
+     }
+ }
+ 
+ /* We should not hoist/PRE the outer loop IV increment or the load
+    from c across the inner loop.  */
+ 
+ /* { dg-final { scan-tree-dump-not "HOIST inserted" "pre" } } */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]