This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH]: Code sinking patch


I've been asked by a number of people over the past few weeks to submit, for 4.0, the code sinking patch i was working on for 4.1, because it provides significantly better scalar code than what we generate now in some cases for real world code compared to other compilers (such as MSVC).

Code where this helps is code where we can move a computation or store from an always-executed place to a not-always-executed place that still reaches all of it's uses.

IE we can transform
int
foo (int a, int b, int c)
{
  int x = a * b;
  return c ? x : a;
}

into
foo (int a, int b, int c)
{
  int x;
  if (c)
    {
      x = a *b;
      return x;
    }
  return a;
}
(and the same with stores and other friends.).

At the assembly level, the code used to look like:
foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        imull   12(%ebp), %eax
        movl    16(%ebp), %edx
        testl   %edx, %edx
        jne     .L2
        movl    8(%ebp), %eax
.L2:
        leave
        ret

and now looks like:

foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        movl    16(%ebp), %edx
        testl   %edx, %edx
        jne     .L7
        leave
        ret
.L7:
        imull   12(%ebp), %eax
        leave
        ret

This is a quite common idiom (it pops up many thousand times during a gcc bootstrap), and we have nothing at the RTL level that catches it.

I've removed the "experimental" or possibly non-profitable portions of this pass in order to try to make it as palatable as possible for 4.0.

This means it does no "real" partial dead code elimination (none by inserting new phis or temporaries), or store removal through insertion of temporaries, etc. Just simple regular code sinking and phi arg edge sinking when we can prove that we are moving it to somewhere that doesn't execute as often as the original block.

I've listed them both as todo at the top of the file, even though i already have code to do them that is planned for 4.1 submission,

I've timed this code on cc1-i-files, and the compile time change is in the noise (506 seconds without this patch, 504 seconds with it)

I've also timed it on the large switch statement PR's, a full bootstrap, and some other miscellanous large PR's, there is no time regression.

Almost all the time in this optimization ends up being spent in the loop optimizer init (which we use to make sure we don't move things into loops)

I have bootstrapped and regtested this optimization on i686-pc-linux-gnu, ppc64-linux, and powerpc-darwin.

--Dan

2004-12-03 Daniel Berlin <dberlin@dberlin.org>

	* Makefile.in (tree-ssa-sink.o): New.
	(OBJS-common): Add tree-ssa-sink.o.
	* common.opt: Add -ftree-sink
	* opts.c (decode_options): flag_tree_sink is set at O1 or higher.
	* timevar.def (TV_TREE_SINK): new timevar.
	* tree-flow.h (is_hidden_global_store): Prototype.
	* tree-optimize.c (init_tree_optimization_passes): Add
	pass_sink_code.
	* tree-pass.h (pass_sink_code): New.
	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Move checking
	for non-obvious global store to is_hidden_global_store, and
	call that new function.
	* tree-ssa-sink.c: New file.
	* doc/invoke.texi: Document -fdump-tree-sink and -ftree-sink.
	* doc/passes.texi: Document forward store motion.
	* testsuite/gcc.dg/tree-ssa/ssa-sink-1.c: New test
	* testsuite/gcc.dg/tree-ssa/ssa-sink-2.c: New test
	* testsuite/gcc.dg/tree-ssa/ssa-sink-3.c: New test
	* testsuite/gcc.dg/tree-ssa/ssa-sink-4.c: New test

Index: gcc/Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1431
diff -u -p -r1.1431 Makefile.in
--- gcc/Makefile.in	3 Dec 2004 22:04:27 -0000	1.1431
+++ gcc/Makefile.in	5 Dec 2004 18:16:09 -0000
@@ -929,7 +929,7 @@ OBJS-common = \
  varasm.o varray.o vec.o version.o vmsdbgout.o xcoffout.o alloc-pool.o	   \
  et-forest.o cfghooks.o bt-load.o pretty-print.o $(GGC) web.o passes.o	   \
  rtl-profile.o tree-profile.o rtlhooks.o cfgexpand.o lambda-mat.o          \
- lambda-trans.o	lambda-code.o tree-loop-linear.o
+ lambda-trans.o	lambda-code.o tree-loop-linear.o tree-ssa-sink.o

 OBJS-md = $(out_object_file)
 OBJS-archive = $(EXTRA_OBJS) $(host_hook_obj) tree-inline.o		   \
@@ -1666,6 +1666,10 @@ tree-ssa-pre.o : tree-ssa-pre.c $(TREE_F
    $(SYSTEM_H) $(TREE_H) $(TM_P_H) $(EXPR_H) \
    $(GGC_H) output.h diagnostic.h errors.h toplev.h $(TIMEVAR_H) \
    $(TM_H) coretypes.h $(TREE_DUMP_H) tree-pass.h $(FLAGS_H)
+tree-ssa-sink.o : tree-ssa-sink.c $(TREE_FLOW_H) $(CONFIG_H) \
+   $(SYSTEM_H) $(TREE_H) $(TM_P_H) $(EXPR_H) \
+   $(GGC_H) output.h diagnostic.h errors.h toplev.h $(TIMEVAR_H) \
+   $(TM_H) coretypes.h $(TREE_DUMP_H) tree-pass.h $(FLAGS_H)
 tree-vn.o : tree-vn.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(GGC_H) \
    $(TREE_H) $(TREE_FLOW_H) $(HASHTAB_H) langhooks.h tree-pass.h \
    $(TREE_DUMP_H) diagnostic.h
Index: gcc/common.opt
===================================================================
RCS file: /cvs/gcc/gcc/gcc/common.opt,v
retrieving revision 1.59
diff -u -p -r1.59 common.opt
--- gcc/common.opt	28 Oct 2004 03:42:22 -0000	1.59
+++ gcc/common.opt	5 Dec 2004 18:16:09 -0000
@@ -872,6 +872,10 @@ ftree-pre
 Common Report Var(flag_tree_pre)
 Enable SSA-PRE optimization on trees

+ftree-sink
+Common Report Var(flag_tree_sink)
+Enable SSA code sinking on trees
+
 ftree-sra
 Common Report Var(flag_tree_sra)
 Perform scalar replacement of aggregates
Index: gcc/opts.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/opts.c,v
retrieving revision 1.90
diff -u -p -r1.90 opts.c
--- gcc/opts.c	29 Nov 2004 09:33:08 -0000	1.90
+++ gcc/opts.c	5 Dec 2004 18:16:09 -0000
@@ -502,6 +502,7 @@ decode_options (unsigned int argc, const
       flag_tree_sra = 1;
       flag_tree_copyrename = 1;
       flag_tree_fre = 1;
+      flag_tree_sink = 1;

       if (!optimize_size)
 	{
Index: gcc/timevar.def
===================================================================
RCS file: /cvs/gcc/gcc/gcc/timevar.def,v
retrieving revision 1.41
diff -u -p -r1.41 timevar.def
--- gcc/timevar.def	29 Nov 2004 00:03:40 -0000	1.41
+++ gcc/timevar.def	5 Dec 2004 18:16:10 -0000
@@ -78,6 +78,7 @@ DEFTIMEVAR (TV_TREE_SPLIT_EDGES      , "
 DEFTIMEVAR (TV_TREE_PRE		     , "tree PRE")
 DEFTIMEVAR (TV_TREE_REDPHI	     , "tree remove redundant PHIs")
 DEFTIMEVAR (TV_TREE_FRE		     , "tree FRE")
+DEFTIMEVAR (TV_TREE_SINK             , "tree code sinking")
 DEFTIMEVAR (TV_TREE_PHIOPT	     , "tree linearize phis")
 DEFTIMEVAR (TV_TREE_FORWPROP	     , "tree forward propagate")
 DEFTIMEVAR (TV_TREE_DCE		     , "tree conservative DCE")
Index: gcc/tree-flow.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-flow.h,v
retrieving revision 2.73
diff -u -p -r2.73 tree-flow.h
--- gcc/tree-flow.h	25 Nov 2004 22:31:08 -0000	2.73
+++ gcc/tree-flow.h	5 Dec 2004 18:16:10 -0000
@@ -734,6 +734,8 @@ tree vn_lookup (tree, vuse_optype);
 void vn_init (void);
 void vn_delete (void);

+/* In tree-ssa-sink.c  */
+bool is_hidden_global_store (tree);

 /* In tree-sra.c  */
 void insert_edge_copies (tree, basic_block);
Index: gcc/tree-optimize.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-optimize.c,v
retrieving revision 2.65
diff -u -p -r2.65 tree-optimize.c
--- gcc/tree-optimize.c	30 Nov 2004 15:38:33 -0000	2.65
+++ gcc/tree-optimize.c	5 Dec 2004 18:16:10 -0000
@@ -378,6 +378,7 @@ init_tree_optimization_passes (void)
   NEXT_PASS (pass_may_alias);
   NEXT_PASS (pass_split_crit_edges);
   NEXT_PASS (pass_pre);
+  NEXT_PASS (pass_sink_code);
   NEXT_PASS (pass_loop);
   NEXT_PASS (pass_dominator);
   NEXT_PASS (pass_redundant_phi);
Index: gcc/tree-pass.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-pass.h,v
retrieving revision 2.22
diff -u -p -r2.22 tree-pass.h
--- gcc/tree-pass.h	28 Nov 2004 21:02:31 -0000	2.22
+++ gcc/tree-pass.h	5 Dec 2004 18:16:10 -0000
@@ -164,5 +164,6 @@ extern struct tree_opt_pass pass_rest_of
 extern struct tree_opt_pass pass_fre;
 extern struct tree_opt_pass pass_linear_transform;
 extern struct tree_opt_pass pass_maybe_create_global_var;
+extern struct tree_opt_pass pass_sink_code;

 #endif /* GCC_TREE_PASS_H */
Index: gcc/tree-ssa-dce.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-dce.c,v
retrieving revision 2.28
diff -u -p -r2.28 tree-ssa-dce.c
--- gcc/tree-ssa-dce.c	22 Nov 2004 12:23:58 -0000	2.28
+++ gcc/tree-ssa-dce.c	5 Dec 2004 18:16:11 -0000
@@ -273,8 +273,6 @@ mark_operand_necessary (tree op, bool ph
 static void
 mark_stmt_if_obviously_necessary (tree stmt, bool aggressive)
 {
-  v_may_def_optype v_may_defs;
-  v_must_def_optype v_must_defs;
   stmt_ann_t ann;
   tree op, def;
   ssa_op_iter iter;
@@ -364,78 +362,10 @@ mark_stmt_if_obviously_necessary (tree s
 	  return;
         }
     }
-
-  /* Check virtual definitions.  If we get here, the only virtual
-     definitions we should see are those generated by assignment
-     statements.  */
-  v_may_defs = V_MAY_DEF_OPS (ann);
-  v_must_defs = V_MUST_DEF_OPS (ann);
-  if (NUM_V_MAY_DEFS (v_may_defs) > 0 || NUM_V_MUST_DEFS (v_must_defs) > 0)
-    {
-      tree lhs;
-
-      gcc_assert (TREE_CODE (stmt) == MODIFY_EXPR);
-
-      /* Note that we must not check the individual virtual operands
-	 here.  In particular, if this is an aliased store, we could
-	 end up with something like the following (SSA notation
-	 redacted for brevity):
-
-	 	foo (int *p, int i)
-		{
-		  int x;
-		  p_1 = (i_2 > 3) ? &x : p_1;
-
-		  # x_4 = V_MAY_DEF <x_3>
-		  *p_1 = 5;
-
-		  return 2;
-		}
-
-	 Notice that the store to '*p_1' should be preserved, if we
-	 were to check the virtual definitions in that store, we would
-	 not mark it needed.  This is because 'x' is not a global
-	 variable.
-
-	 Therefore, we check the base address of the LHS.  If the
-	 address is a pointer, we check if its name tag or type tag is
-	 a global variable.  Otherwise, we check if the base variable
-	 is a global.  */
-      lhs = TREE_OPERAND (stmt, 0);
-      if (REFERENCE_CLASS_P (lhs))
-	lhs = get_base_address (lhs);
-
-      if (lhs == NULL_TREE)
-	{
-	  /* If LHS is NULL, it means that we couldn't get the base
-	     address of the reference.  In which case, we should not
-	     remove this store.  */
-	  mark_stmt_necessary (stmt, true);
-	}
-      else if (DECL_P (lhs))
-	{
-	  /* If the store is to a global symbol, we need to keep it.  */
-	  if (is_global_var (lhs))
-	    mark_stmt_necessary (stmt, true);
-	}
-      else if (INDIRECT_REF_P (lhs))
-	{
-	  tree ptr = TREE_OPERAND (lhs, 0);
-	  struct ptr_info_def *pi = SSA_NAME_PTR_INFO (ptr);
-	  tree nmt = (pi) ? pi->name_mem_tag : NULL_TREE;
-	  tree tmt = var_ann (SSA_NAME_VAR (ptr))->type_mem_tag;
-
-	  /* If either the name tag or the type tag for PTR is a
-	     global variable, then the store is necessary.  */
-	  if ((nmt && is_global_var (nmt))
-	      || (tmt && is_global_var (tmt)))
-	    {
-	      mark_stmt_necessary (stmt, true);
-	      return;
-	    }
-	}
-      else
-	gcc_unreachable ();
+  if (is_hidden_global_store (stmt))
+    {
+      mark_stmt_necessary (stmt, true);
+      return;
     }

return;
Index: gcc/tree-ssa-sink.c
===================================================================
RCS file: gcc/tree-ssa-sink.c
diff -N gcc/tree-ssa-sink.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ gcc/tree-ssa-sink.c 5 Dec 2004 18:16:11 -0000
@@ -0,0 +1,570 @@
+/* Code sinking for trees
+ Copyright (C) 2001, 2002, 2003, 2004 Free Software Foundation, Inc.
+ Contributed by Daniel Berlin <dan@dberlin.org>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING. If not, write to
+the Free Software Foundation, 59 Temple Place - Suite 330,
+Boston, MA 02111-1307, USA. */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "errors.h"
+#include "ggc.h"
+#include "tree.h"
+#include "basic-block.h"
+#include "diagnostic.h"
+#include "tree-inline.h"
+#include "tree-flow.h"
+#include "tree-gimple.h"
+#include "tree-dump.h"
+#include "timevar.h"
+#include "fibheap.h"
+#include "hashtab.h"
+#include "tree-iterator.h"
+#include "real.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "flags.h"
+#include "bitmap.h"
+#include "langhooks.h"
+#include "cfgloop.h"
+
+/* TODO:
+ 1. Sinking store only using scalar promotion (IE without moving the RHS):
+
+ *q = p;
+ p = p + 1;
+ if (something)
+ *q = <not p>;
+ else
+ y = *q;
+
+ + should become
+ sinktemp = p;
+ p = p + 1;
+ if (something)
+ *q = <not p>;
+ else
+ {
+ *q = sinktemp;
+ y = *q
+ }
+ Store copy propagation will take care of the store elimination above.
+ +
+ 2. Sinking using Partial Dead Code Elimination. */
+
+
+static struct
+{ + /* The number of statements sunk down the flowgraph by code sinking. */
+ int sunk;
+ +} sink_stats;
+
+
+/* Given a PHI, and one of it's arguments (DEF), find the edge for
+ that argument and return it. If the argument occurs twice in the PHI node,
+ we return NULL. */
+
+static basic_block
+find_bb_for_arg (tree phi, tree def)
+{
+ int i;
+ bool foundone = false;
+ basic_block result = NULL;
+ for (i = 0; i < PHI_NUM_ARGS (phi); i++)
+ if (PHI_ARG_DEF (phi, i) == def)
+ {
+ if (foundone)
+ return NULL;
+ foundone = true;
+ result = PHI_ARG_EDGE (phi, i)->src;
+ }
+ return result;
+}
+
+/* When the first immediate use is in a statement, then return true if all
+ immediate uses in IMM are in the same statement.
+ We could also do the case where the first immediate use is in a phi node,
+ and all the other uses are in phis in the same basic block, but this
+ requires some expensive checking later (you have to make sure no def/vdef
+ in the statement occurs for multiple edges in the various phi nodes it's
+ used in, so that you only have one place you can sink it to. */
+
+static bool
+all_immediate_uses_same_place (dataflow_t imm)
+{
+ int i;
+ tree firstuse;
+
+ if (imm == NULL || num_immediate_uses (imm) < 2)
+ return true;
+ firstuse = immediate_use (imm, 0);
+
+ for (i = 1; i < num_immediate_uses (imm); i++)
+ {
+ tree immuse = immediate_use (imm, i);
+ if (immuse != firstuse)
+ return false;
+ }
+ return true;
+}
+
+/* Some global stores don't necessarily have V_MAY_DEF's of global variables,
+ but we still must avoid moving them around. */
+
+bool
+is_hidden_global_store (tree stmt)
+{
+ stmt_ann_t ann = stmt_ann (stmt);
+ v_may_def_optype v_may_defs;
+ v_must_def_optype v_must_defs;
+ + /* Check virtual definitions. If we get here, the only virtual
+ definitions we should see are those generated by assignment
+ statements. */
+ v_may_defs = V_MAY_DEF_OPS (ann);
+ v_must_defs = V_MUST_DEF_OPS (ann);
+ if (NUM_V_MAY_DEFS (v_may_defs) > 0 || NUM_V_MUST_DEFS (v_must_defs) > 0)
+ {
+ tree lhs;
+
+ gcc_assert (TREE_CODE (stmt) == MODIFY_EXPR);
+
+ /* Note that we must not check the individual virtual operands
+ here. In particular, if this is an aliased store, we could
+ end up with something like the following (SSA notation
+ redacted for brevity):
+
+ foo (int *p, int i)
+ {
+ int x;
+ p_1 = (i_2 > 3) ? &x : p;
+
+ # x_4 = V_MAY_DEF <x_3>
+ *p_1 = 5;
+
+ return 2;
+ }
+
+ Notice that the store to '*p_1' should be preserved, if we
+ were to check the virtual definitions in that store, we would
+ not mark it needed. This is because 'x' is not a global
+ variable.
+
+ Therefore, we check the base address of the LHS. If the
+ address is a pointer, we check if its name tag or type tag is
+ a global variable. Otherwise, we check if the base variable
+ is a global. */
+ lhs = TREE_OPERAND (stmt, 0);
+ if (REFERENCE_CLASS_P (lhs))
+ lhs = get_base_address (lhs);
+
+ if (lhs == NULL_TREE)
+ {
+ /* If LHS is NULL, it means that we couldn't get the base
+ address of the reference. In which case, we should not
+ move this store. */
+ return true;
+ }
+ else if (DECL_P (lhs))
+ {
+ /* If the store is to a global symbol, we need to keep it. */
+ if (is_global_var (lhs))
+ return true;
+
+ }
+ else if (INDIRECT_REF_P (lhs))
+ {
+ tree ptr = TREE_OPERAND (lhs, 0);
+ struct ptr_info_def *pi = SSA_NAME_PTR_INFO (ptr);
+ tree nmt = (pi) ? pi->name_mem_tag : NULL_TREE;
+ tree tmt = var_ann (SSA_NAME_VAR (ptr))->type_mem_tag;
+
+ /* If either the name tag or the type tag for PTR is a
+ global variable, then the store is necessary. */
+ if ((nmt && is_global_var (nmt))
+ || (tmt && is_global_var (tmt)))
+ {
+ return true;
+ }
+ }
+ else
+ gcc_unreachable ();
+ }
+ return false;
+}
+
+/* Find the nearest common dominator of all of the immediate uses in IMM. */
+
+static basic_block
+nearest_common_dominator_of_uses (dataflow_t imm)
+{ + bitmap blocks = BITMAP_XMALLOC ();
+ basic_block commondom;
+ int i;
+ unsigned int j;
+ bitmap_iterator bi;
+ bitmap_clear (blocks);
+ for (i = 0; i < num_immediate_uses (imm); i++)
+ {
+ tree usestmt = immediate_use (imm, i);
+ basic_block useblock;
+ if (TREE_CODE (usestmt) == PHI_NODE)
+ {
+ int j;
+ for (j = 0; j < PHI_NUM_ARGS (usestmt); j++)
+ {
+ useblock = PHI_ARG_EDGE (usestmt, j)->src;
+ /* Short circuit. Nothing dominates the entry block. */
+ if (useblock == ENTRY_BLOCK_PTR)
+ {
+ BITMAP_XFREE (blocks);
+ return NULL;
+ }
+ bitmap_set_bit (blocks, useblock->index);
+ }
+ }
+ else
+ {
+ useblock = bb_for_stmt (usestmt);
+
+ /* Short circuit. Nothing dominates the entry block. */
+ if (useblock == ENTRY_BLOCK_PTR)
+ {
+ BITMAP_XFREE (blocks);
+ return NULL;
+ }
+ bitmap_set_bit (blocks, useblock->index);
+ }
+ }
+ commondom = BASIC_BLOCK (bitmap_first_set_bit (blocks));
+ EXECUTE_IF_SET_IN_BITMAP (blocks, 0, j, bi)
+ commondom = nearest_common_dominator (CDI_DOMINATORS, commondom, + BASIC_BLOCK (j));
+ BITMAP_XFREE (blocks);
+ return commondom;
+}
+
+/* Given a statement (STMT) and the basic block it is currently in (FROMBB), + determine the location to sink the statement to, if any.
+ Return the basic block to sink it to, or NULL if we should not sink
+ it. */
+
+static tree
+statement_sink_location (tree stmt, basic_block frombb)
+{
+ dataflow_t imm = get_immediate_uses (stmt);
+ tree use, def;
+ basic_block sinkbb;
+ use_operand_p use_p;
+ def_operand_p def_p;
+ ssa_op_iter iter;
+ stmt_ann_t ann;
+ tree rhs;
+
+ if (imm == NULL)
+ return NULL;
+
+ if (TREE_CODE (stmt) != MODIFY_EXPR)
+ return NULL;
+ rhs = TREE_OPERAND (stmt, 1);
+
+ /* There are a few classes of things we can't or don't move, some because we
+ don't have code to handle it, some because it's not profitable and some
+ because it's not legal. + + We can't sink things that may be global stores, at least not without
+ calculating a lot more information, because we may cause it to no longer
+ be seen by an external routine that needs it depending on where it gets
+ moved to. + + We don't want to sink loads from memory.
+
+ We can't sink statements that end basic blocks without splitting the
+ incoming edge for the sink location to place it there.
+
+ We can't sink statements that have volatile operands. +
+ We don't want to sink dead code, so anything with 0 immediate uses is not
+ sunk. +
+ */
+ ann = stmt_ann (stmt);
+ if (NUM_VUSES (STMT_VUSE_OPS (stmt)) != 0
+ || stmt_ends_bb_p (stmt)
+ || TREE_SIDE_EFFECTS (rhs)
+ || TREE_CODE (rhs) == EXC_PTR_EXPR
+ || TREE_CODE (rhs) == FILTER_EXPR
+ || is_hidden_global_store (stmt)
+ || ann->has_volatile_ops
+ || num_immediate_uses (imm) == 0)
+ return NULL;
+ + FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_ALL_DEFS)
+ {
+ tree def = DEF_FROM_PTR (def_p);
+ if (is_global_var (SSA_NAME_VAR (def))
+ || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (def))
+ return NULL;
+ }
+ + FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
+ {
+ tree use = USE_FROM_PTR (use_p);
+ if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (use))
+ return NULL;
+ }
+ + /* If all the immediate uses are not in the same place, find the nearest
+ common dominator of all the immediate uses. For PHI nodes, we have to
+ find the nearest common dominator of all of the predecessor blocks, since
+ that is where insertion would have to take place. */
+ if (!all_immediate_uses_same_place (imm))
+ {
+ basic_block commondom = nearest_common_dominator_of_uses (imm);
+ + if (commondom == frombb)
+ return NULL;
+
+ /* Our common dominator has to be dominated by frombb in order to be a
+ trivially safe place to put this statement, since it has multiple
+ uses. */ + if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
+ return NULL;
+ + /* It doesn't make sense to move to a dominator that post-dominates
+ frombb, because it means we've just moved it into a path that always
+ executes if frombb executes, instead of reducing the number of
+ executions . */
+ if (dominated_by_p (CDI_POST_DOMINATORS, frombb, commondom))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Not moving store, common dominator post-dominates from block.\n");
+ return NULL;
+ }
+
+ if (commondom == frombb || commondom->loop_depth > frombb->loop_depth)
+ return NULL;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Common dominator of all uses is %d\n",
+ commondom->index);
+ }
+ return first_stmt (commondom);
+ }
+
+ use = immediate_use (imm, 0);
+ if (TREE_CODE (use) != PHI_NODE)
+ {
+ sinkbb = bb_for_stmt (use);
+ if (sinkbb == frombb || sinkbb->loop_depth > frombb->loop_depth
+ || sinkbb->loop_father != frombb->loop_father)
+ return NULL; + return use;
+ }
+
+ /* Note that at this point, all uses must be in the same statement, so it
+ doesn't matter which def op we choose. */
+ if (STMT_DEF_OPS (stmt) == NULL)
+ {
+ if (STMT_V_MAY_DEF_OPS (stmt) != NULL)
+ def = V_MAY_DEF_RESULT (STMT_V_MAY_DEF_OPS (stmt), 0);
+ else if (STMT_V_MUST_DEF_OPS (stmt) != NULL)
+ def = V_MUST_DEF_RESULT (STMT_V_MUST_DEF_OPS (stmt), 0);
+ else
+ gcc_unreachable ();
+ }
+ else
+ def = DEF_OP (STMT_DEF_OPS (stmt), 0);
+ + sinkbb = find_bb_for_arg (use, def);
+ if (!sinkbb)
+ return NULL;
+
+ /* This will happen when you have
+ a_3 = PHI <a_13, a_26>
+ + a_26 = V_MAY_DEF <a_3> +
+ If the use is a phi, and is in the same bb as the def, + we can't sink it. */
+
+ if (bb_for_stmt (use) == frombb)
+ return NULL;
+ if (sinkbb == frombb || sinkbb->loop_depth > frombb->loop_depth
+ || sinkbb->loop_father != frombb->loop_father)
+ return NULL;
+
+ return first_stmt (sinkbb);
+}
+
+/* Perform code sinking on BB */
+
+static void
+sink_code_in_bb (basic_block bb)
+{
+ basic_block son;
+ block_stmt_iterator bsi;
+ edge_iterator ei;
+ edge e;
+ + /* If this block doesn't dominate anything, there can't be any place to sink
+ the statements to. */
+ if (first_dom_son (CDI_DOMINATORS, bb) == NULL)
+ goto earlyout;
+
+ /* We can't move things across abnormal edges, so don't try. */
+ FOR_EACH_EDGE (e, ei, bb->succs)
+ if (e->flags & EDGE_ABNORMAL)
+ goto earlyout;
+
+ for (bsi = bsi_last (bb); !bsi_end_p (bsi);)
+ {
+ tree stmt = bsi_stmt (bsi); + block_stmt_iterator tobsi;
+ tree sinkstmt;
+ get_stmt_operands (stmt);
+ + sinkstmt = statement_sink_location (stmt, bb);
+ if (!sinkstmt)
+ {
+ if (!bsi_end_p (bsi))
+ bsi_prev (&bsi);
+ continue;
+ } + if (dump_file)
+ {
+ fprintf (dump_file, "Sinking ");
+ print_generic_expr (dump_file, stmt, TDF_VOPS);
+ fprintf (dump_file, " from bb %d to bb %d\n",
+ bb->index, bb_for_stmt (sinkstmt)->index);
+ }
+ tobsi = bsi_for_stmt (sinkstmt);
+ /* Find the first non-label. */
+ while (!bsi_end_p (tobsi)
+ && TREE_CODE (bsi_stmt (tobsi)) == LABEL_EXPR)
+ bsi_next (&tobsi);
+ + /* If this is the end of the basic block, we need to insert at the end
+ of the basic block. */
+ if (bsi_end_p (tobsi))
+ bsi_move_to_bb_end (&bsi, bb_for_stmt (sinkstmt));
+ else
+ bsi_move_before (&bsi, &tobsi);
+
+ sink_stats.sunk++;
+ if (!bsi_end_p (bsi))
+ bsi_prev (&bsi);
+ + }
+ earlyout:
+ for (son = first_dom_son (CDI_POST_DOMINATORS, bb);
+ son;
+ son = next_dom_son (CDI_POST_DOMINATORS, son))
+ {
+ sink_code_in_bb (son);
+ }
+} +
+/* Perform code sinking.
+ This moves code down the flowgraph when we know it would be
+ profitable to do so, or it wouldn't increase the number of
+ executions of the statement.
+
+ IE given
+ + a_1 = b + c;
+ if (<something>)
+ {
+ }
+ else
+ {
+ foo (&b, &c);
+ a_5 = b + c;
+ }
+ a_6 = PHI (a_5, a_1);
+ USE a_6.
+
+ we'll transform this into:
+
+ if (<something>)
+ {
+ a_1 = b + c;
+ }
+ else
+ {
+ foo (&b, &c);
+ a_5 = b + c;
+ }
+ a_6 = PHI (a_5, a_1);
+ USE a_6.
+
+ Note that this reduces the number of computations of a = b + c to 1
+ when we take the else edge, instead of 2.
+*/
+static void
+execute_sink_code (void)
+{
+ struct loops *loops = loop_optimizer_init (dump_file);
+ connect_infinite_loops_to_exit ();
+ memset (&sink_stats, 0, sizeof (sink_stats));
+ calculate_dominance_info (CDI_DOMINATORS | CDI_POST_DOMINATORS);
+ compute_immediate_uses (TDFA_USE_OPS | TDFA_USE_VOPS, NULL);
+ sink_code_in_bb (EXIT_BLOCK_PTR); + if (dump_file && (dump_flags & TDF_STATS))
+ fprintf (dump_file, "Sunk statements:%d\n", sink_stats.sunk);
+ free_dominance_info (CDI_POST_DOMINATORS);
+ remove_fake_exit_edges ();
+ free_df ();
+ loop_optimizer_finalize (loops, dump_file );
+}
+
+/* Gate and execute functions for code sinking. */
+
+static void
+do_sink (void)
+{
+ execute_sink_code ();
+}
+
+static bool
+gate_sink (void)
+{
+ return flag_tree_sink != 0;
+}
+
+struct tree_opt_pass pass_sink_code =
+{
+ "sink", /* name */
+ gate_sink, /* gate */
+ do_sink, /* execute */
+ NULL, /* sub */
+ NULL, /* next */
+ 0, /* static_pass_number */
+ TV_TREE_SINK, /* tv_id */
+ PROP_no_crit_edges | PROP_cfg
+ | PROP_ssa | PROP_alias, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_rename_vars | TODO_dump_func | TODO_ggc_collect | TODO_verify_ssa, /* todo_flags_finish */
+ 0 /* letter */
+};
Index: gcc/doc/invoke.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/invoke.texi,v
retrieving revision 1.559
diff -u -p -r1.559 invoke.texi
--- gcc/doc/invoke.texi 29 Nov 2004 09:33:10 -0000 1.559
+++ gcc/doc/invoke.texi 5 Dec 2004 18:16:18 -0000
@@ -265,6 +265,7 @@ Objective-C and Objective-C++ Dialects}.
-fdump-tree-forwprop@r{[}-@var{n}@r{]} @gol
-fdump-tree-copyrename@r{[}-@var{n}@r{]} @gol
-fdump-tree-nrv -fdump-tree-vect @gol
+-fdump-tree-sink @gol
-fdump-tree-sra@r{[}-@var{n}@r{]} @gol
-fdump-tree-fre@r{[}-@var{n}@r{]} @gol
-feliminate-dwarf2-dups -feliminate-unused-debug-types @gol
@@ -319,7 +320,7 @@ Objective-C and Objective-C++ Dialects}.
-fvariable-expansion-in-unroller @gol
-ftree-pre -ftree-ccp -ftree-dce -ftree-loop-optimize @gol
-ftree-loop-linear -ftree-loop-im -ftree-loop-ivcanon -fivopts @gol
--ftree-dominator-opts -ftree-dse -ftree-copyrename @gol
+-ftree-dominator-opts -ftree-dse -ftree-copyrename -ftree-sink @gol
-ftree-ch -ftree-sra -ftree-ter -ftree-lrs -ftree-fre -ftree-vectorize @gol
--param @var{name}=@var{value}
-O -O0 -O1 -O2 -O3 -Os}
@@ -3810,6 +3811,11 @@ made by appending @file{.mudflap} to the
Dump each function after performing scalar replacement of aggregates. The
file name is made by appending @file{.sra} to the source file name.


+@item sink
+@opindex fdump-tree-sink
+Dump each function after performing code sinking. The file name is made
+by appending @file{.sink} to the source file name. +
@item dom
@opindex fdump-tree-dom
Dump each function after applying dominator tree optimizations. The file
@@ -4628,6 +4634,10 @@ that are computed on all paths leading t
This analysis faster than PRE, though it exposes fewer redundancies.
This flag is enabled by default at @option{-O} and higher.


+@item -ftree-sink
+Perform forward store motion  on trees.  This flag is
+enabled by default at @option{-O} and higher.
+
 @item -ftree-ccp
 Perform sparse conditional constant propagation (CCP) on trees.  This flag
 is enabled by default at @option{-O} and higher.
Index: gcc/doc/passes.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/passes.texi,v
retrieving revision 1.47
diff -u -p -r1.47 passes.texi
--- gcc/doc/passes.texi	5 Nov 2004 01:36:56 -0000	1.47
+++ gcc/doc/passes.texi	5 Dec 2004 18:16:18 -0000
@@ -349,6 +349,12 @@ in @file{tree-ssa-dse.c} and is describe
 This pass transforms tail recursion into a loop.  It is located in
 @file{tree-tailcall.c} and is described by @code{pass_tail_recursion}.

+@item Forward store motion
+
+This pass sinks stores and assignments down the flowgraph closer to it's
+use point.  The pass is located in @file{tree-ssa-sink.c} and is
+described by @code{pass_sink_code}
+
 @item Partial redundancy elimination

This pass eliminates partially redundant computations, as well as
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-1.c
===================================================================
RCS file: gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-1.c
diff -N gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-1.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-1.c 5 Dec 2004 18:16:25 -0000
@@ -0,0 +1,10 @@
+/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+int
+foo (int a, int b, int c)
+{
+ int x = a * b;
+ return c ? x : a;
+}
+/* We should sink the x = a * b calculation into the branch that returns x. */
+/* { dg-final { scan-tree-dump-times "Sunk statements:1" 1 "sink"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-2.c
===================================================================
RCS file: gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-2.c
diff -N gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-2.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-2.c 5 Dec 2004 18:16:25 -0000
@@ -0,0 +1,12 @@
+/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+int
+bar (int a, int b, int c)
+{
+ int y = a * b;
+ if (c)
+ y = 12;
+ return y;
+}
+/* We should sink the x = a * b calculation into the else branch */
+/* { dg-final { scan-tree-dump-times "Sunk statements:1" 1 "sink"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c
===================================================================
RCS file: gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c
diff -N gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-3.c 5 Dec 2004 18:16:25 -0000
@@ -0,0 +1,15 @@
+/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+extern void foo(int a);
+int
+main (int argc)
+{
+ int a;
+ a = argc + 1;
+ if (argc + 3)
+ {
+ foo (a);
+ }
+}
+/* We should sink the a = argc + 1 calculation into the if branch */
+/* { dg-final { scan-tree-dump-times "Sunk statements:1" 1 "sink"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-4.c
===================================================================
RCS file: gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-4.c
diff -N gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-4.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-4.c 5 Dec 2004 18:16:25 -0000
@@ -0,0 +1,20 @@
+/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+extern int foo (int *, int *);
+extern int foo2 (int);
+int
+main (int argc)
+{
+ int a, b, c;
+ b = argc + 1;
+ c = argc + 2;
+ a = b + c;
+ if (argc)
+ {
+ foo (&b, &c);
+ a = b + c;
+ }
+ foo2 (a);
+}
+/* We should sink the first a = b + c calculation into the else branch */
+/* { dg-final { scan-tree-dump-times "Sunk statements:1" 1 "sink"} } */



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]