[tree-ssa] Further CCP improvements.

Andrew MacLeod amacleod@redhat.com
Mon Oct 6 19:40:00 GMT 2003


Here's more speedups for CCP.

We were still spending a lot of time doing wasteful things. Now we build
much better default value information for variables, and we build it
right in initialize().

The new default value initialization goes something like this:

- Any stmt which isnt a PHI or MODIFY stmt always has its results marked
varying

- All VDEF results are marked VARYING, and entered into a virtual_var
bitmap.

- Calculate all stmt default values first, then process all the PHIs. If
any phi argument is in the virtual_var bitmap, change the PHIs default
value to VARYING, and add it to the virtual_var bitmap too. We don't
need to process virtual variables, nor build uses info for them.

- Initialize DONT_SIMULATE_AGAIN to 1 if the stmt is VARYING

Now that we have much better default values, the interface to
compute_immediate_uses has been modified too. There is an optional
callback function which is used to determine which variables will have
immediate usage information built for them. We don't build this
information for any variable whose default value is VARYING. (It'll
never change, so it'll never have to update any uses) I experimented
with a bitmap, but the results were a wash, so I left it this way for
now.

So, whats the end result of all this?  Again, I'll use
libjava/interpret.cc as the measuring stick.  

Compile time spent in CCP drops from 5.27 seconds to a much more
sensible 0.78 seconds. We were clearly wasting a colossal amount of
time. As a bonus, time spent in garbage Collection drops from 0.63 to
0.32 seconds, so virtually in half again. (Remember back-in-the-day when
CCP was spending 50 seconds on this file?? yeesh. :-)

A lot less memory is consumed along the way as well. Before this change,
this file was performing a garbage collection which reduced used memory
from 170MB to 110MB. That same collection now reduces used memory from
111MB to 63MB. I presume the highwater mark will be significantly less
on large programs.. This is due mostly to the reduced memory
requirements of not building the immediate uses information for virtual
PHI nodes.

Bootstrap time in CCP drops from 2.8 seconds to 2.6 or so, so it doesn't
affect smaller programs much, but it is a big win on large functions.
Compile time of libstdc++-v3 is reduced modestly.

We miss no CCP opportunities that I can find during a bootstrap of GCC,
it bootstraps, and causes no new regressions, all on x86.

Just checked in.

Andrew

PS. FYI, the time-report now looks something like:

Execution times (seconds)
 garbage collection    :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 cfg cleanup           :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 trivially dead code   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 life analysis         :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall
 life info update      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 alias analysis        :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 register scan         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 preprocessing         :   0.04 ( 0%) usr   0.03 ( 5%) sys   0.06 ( 0%) wall
 parser                :   0.54 ( 2%) usr   0.08 (13%) sys   1.86 ( 7%) wall
 name lookup           :   0.35 ( 1%) usr   0.24 (39%) sys   0.62 ( 2%) wall
 integration           :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 tree gimplify         :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall
 tree CFG construction :   0.13 ( 1%) usr   0.01 ( 2%) sys   0.16 ( 1%) wall
 tree PHI insertion    :   4.04 (17%) usr   0.08 (13%) sys   4.18 (15%) wall
 tree SSA rewrite      :   3.20 (13%) usr   0.00 ( 0%) sys   3.28 (12%) wall
 tree SSA other        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 dominator optimization:   3.71 (15%) usr   0.01 ( 2%) sys   3.76 (14%) wall
 tree CCP              :   0.78 ( 3%) usr   0.01 ( 2%) sys   0.80 ( 3%) wall
 tree PRE              :   6.56 (27%) usr   0.12 (19%) sys   6.73 (24%) wall
 tree COPYPROP         :   0.75 ( 3%) usr   0.00 ( 0%) sys   0.75 ( 3%) wall
 tree DCE              :   1.09 ( 4%) usr   0.00 ( 0%) sys   1.10 ( 4%) wall
 tree SSA to normal    :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall
 dominance frontiers   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 expand                :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 jump                  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 CSE                   :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 global CSE            :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 bypass jumps          :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 CSE 2                 :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 branch prediction     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 combiner              :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 regmove               :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 mode switching        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 local alloc           :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 global alloc          :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall
 reload CSE regs       :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 flow 2                :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 if-conversion 2       :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 peephole 2            :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 rename registers      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 scheduling 2          :   0.06 ( 0%) usr   0.01 ( 2%) sys   0.10 ( 0%) wall
 reorder blocks        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 shorten branches      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 reg stack             :   1.00 ( 4%) usr   0.00 ( 0%) sys   1.00 ( 4%) wall
 final                 :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 rest of compilation   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 TOTAL                 :  24.40             0.62            27.57



2003-10-06  Andrew MacLeod  <amacleod@redhat.com>

	* tree-dfa.c (compute_immediate_uses): Add optional callback.
	(compute_immediate_uses_for_phi): Remove unused parameter. Add optional
	callback to determine if usage info should be calculated for variable.
	(compute_immediate_uses_for_stmt): Add optional callback to determine 
	if usage info should be calculated for variable.
	* tree-flow.h (compute_immediate_uses): Update prototype.
	* tree-ssa-ccp.c (need_imm_uses_for): New. Callback function passed to
	compute_immediate_uses.
	(initialize): Calculate defaults initially, then build reduced
	immediate use information.
	(get_default_value): Non empty stmt's which are not a PHI_NODE or
	a MODIFY_EXPR default to VARYING.


Index: tree-dfa.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/tree-dfa.c,v
retrieving revision 1.1.4.164
diff -c -p -r1.1.4.164 tree-dfa.c
*** tree-dfa.c	25 Sep 2003 19:34:50 -0000	1.1.4.164
--- tree-dfa.c	6 Oct 2003 17:27:51 -0000
*************** static void add_immediate_use (tree, tre
*** 140,147 ****
  static tree find_vars_r (tree *, int *, void *);
  static void add_referenced_var (tree, struct walk_state *);
  static tree get_memory_tag_for (tree);
! static void compute_immediate_uses_for_phi (tree, int);
! static void compute_immediate_uses_for_stmt (tree, int);
  static void add_may_alias (tree, tree);
  static int get_call_flags (tree);
  static void find_hidden_use_vars (tree);
--- 140,147 ----
  static tree find_vars_r (tree *, int *, void *);
  static void add_referenced_var (tree, struct walk_state *);
  static tree get_memory_tag_for (tree);
! static void compute_immediate_uses_for_phi (tree, bool (*)(tree));
! static void compute_immediate_uses_for_stmt (tree, int, bool (*)(tree));
  static void add_may_alias (tree, tree);
  static int get_call_flags (tree);
  static void find_hidden_use_vars (tree);
*************** remove_all_phi_nodes_for (sbitmap vars)
*** 1118,1127 ****
  /*---------------------------------------------------------------------------
  			Dataflow analysis (DFA) routines
  ---------------------------------------------------------------------------*/
! /* Compute immediate uses.  */
  
  void
! compute_immediate_uses (int flags)
  {
    basic_block bb;
    block_stmt_iterator si;
--- 1118,1130 ----
  /*---------------------------------------------------------------------------
  			Dataflow analysis (DFA) routines
  ---------------------------------------------------------------------------*/
! /* Compute immediate uses.  The parameter calc_for is an option function 
!    pointer whichi indicates whether immediate uses information should be
!    calculated for a given SSA variable. If NULL, then information is computed
!    for all variables.  */
  
  void
! compute_immediate_uses (int flags, bool (*calc_for)(tree))
  {
    basic_block bb;
    block_stmt_iterator si;
*************** compute_immediate_uses (int flags)
*** 1131,1144 ****
        tree phi;
  
        for (phi = phi_nodes (bb); phi; phi = TREE_CHAIN (phi))
! 	compute_immediate_uses_for_phi (phi, flags);
  
        for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
! 	compute_immediate_uses_for_stmt (bsi_stmt (si), flags);
      }
  }
  
- 
  /* Helper for compute_immediate_uses.  Check all the USE and/or VUSE
     operands in phi node PHI and add a def-use edge between their
     defining statement and PHI.
--- 1134,1150 ----
        tree phi;
  
        for (phi = phi_nodes (bb); phi; phi = TREE_CHAIN (phi))
! 	compute_immediate_uses_for_phi (phi, calc_for);
  
        for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
!         {
! 	  tree stmt = bsi_stmt (si);
! 	  get_stmt_operands (stmt);
! 	  compute_immediate_uses_for_stmt (stmt, flags, calc_for);
! 	}
      }
  }
  
  /* Helper for compute_immediate_uses.  Check all the USE and/or VUSE
     operands in phi node PHI and add a def-use edge between their
     defining statement and PHI.
*************** compute_immediate_uses (int flags)
*** 1146,1152 ****
     PHI nodes are easy, we only need to look at its arguments.  */
  
  static void
! compute_immediate_uses_for_phi (tree phi, int flags ATTRIBUTE_UNUSED)
  {
    int i;
  
--- 1152,1158 ----
     PHI nodes are easy, we only need to look at its arguments.  */
  
  static void
! compute_immediate_uses_for_phi (tree phi, bool (*calc_for)(tree))
  {
    int i;
  
*************** compute_immediate_uses_for_phi (tree phi
*** 1159,1165 ****
      {
        tree arg = PHI_ARG_DEF (phi, i);
  
!       if (TREE_CODE (arg) == SSA_NAME)
  	{ 
  	  tree imm_rdef_stmt = SSA_NAME_DEF_STMT (PHI_ARG_DEF (phi, i));
  	  if (!IS_EMPTY_STMT (imm_rdef_stmt))
--- 1165,1171 ----
      {
        tree arg = PHI_ARG_DEF (phi, i);
  
!       if (TREE_CODE (arg) == SSA_NAME && (!calc_for || calc_for (arg)))
  	{ 
  	  tree imm_rdef_stmt = SSA_NAME_DEF_STMT (PHI_ARG_DEF (phi, i));
  	  if (!IS_EMPTY_STMT (imm_rdef_stmt))
*************** compute_immediate_uses_for_phi (tree phi
*** 1174,1180 ****
     and STMT.  */
  
  static void
! compute_immediate_uses_for_stmt (tree stmt, int flags)
  {
    size_t i;
    varray_type ops;
--- 1180,1186 ----
     and STMT.  */
  
  static void
! compute_immediate_uses_for_stmt (tree stmt, int flags, bool (*calc_for)(tree))
  {
    size_t i;
    varray_type ops;
*************** compute_immediate_uses_for_stmt (tree st
*** 1186,1202 ****
  #endif
  
    /* Look at USE_OPS or VUSE_OPS according to FLAGS.  */
-   get_stmt_operands (stmt);
- 
    if ((flags & TDFA_USE_OPS) && use_ops (stmt))
      {
        ops = use_ops (stmt);
        for (i = 0; i < VARRAY_ACTIVE_SIZE (ops); i++)
  	{
  	  tree *use_p = VARRAY_TREE_PTR (ops, i);
! 	  tree imm_rdef_stmt = SSA_NAME_DEF_STMT (*use_p);
! 	  if (!IS_EMPTY_STMT (imm_rdef_stmt))
! 	    add_immediate_use (imm_rdef_stmt, stmt);
  	}
      }
  
--- 1192,1206 ----
  #endif
  
    /* Look at USE_OPS or VUSE_OPS according to FLAGS.  */
    if ((flags & TDFA_USE_OPS) && use_ops (stmt))
      {
        ops = use_ops (stmt);
        for (i = 0; i < VARRAY_ACTIVE_SIZE (ops); i++)
  	{
  	  tree *use_p = VARRAY_TREE_PTR (ops, i);
! 	  tree imm_stmt = SSA_NAME_DEF_STMT (*use_p);
! 	  if (!IS_EMPTY_STMT (imm_stmt) && (!calc_for || calc_for (*use_p)))
! 	    add_immediate_use (imm_stmt, stmt);
  	}
      }
  
*************** compute_immediate_uses_for_stmt (tree st
*** 1207,1213 ****
  	{
  	  tree vuse = VARRAY_TREE (ops, i);
  	  tree imm_rdef_stmt = SSA_NAME_DEF_STMT (vuse);
! 	  if (!IS_EMPTY_STMT (imm_rdef_stmt))
  	    add_immediate_use (imm_rdef_stmt, stmt);
  	}
      }
--- 1211,1217 ----
  	{
  	  tree vuse = VARRAY_TREE (ops, i);
  	  tree imm_rdef_stmt = SSA_NAME_DEF_STMT (vuse);
! 	  if (!IS_EMPTY_STMT (imm_rdef_stmt) && (!calc_for || calc_for (vuse)))
  	    add_immediate_use (imm_rdef_stmt, stmt);
  	}
      }
Index: tree-flow.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/tree-flow.h,v
retrieving revision 1.1.4.123
diff -c -p -r1.1.4.123 tree-flow.h
*** tree-flow.h	25 Sep 2003 11:12:50 -0000	1.1.4.123
--- tree-flow.h	6 Oct 2003 17:27:52 -0000
*************** extern void remove_decl (tree, tree);
*** 462,468 ****
  extern tree *find_decl_location (tree, tree);
  extern void compute_may_aliases (tree);
  extern void compute_reached_uses (int);
! extern void compute_immediate_uses (int);
  extern void compute_reaching_defs (int);
  extern void dump_alias_info (FILE *);
  extern void debug_alias_info (void);
--- 462,468 ----
  extern tree *find_decl_location (tree, tree);
  extern void compute_may_aliases (tree);
  extern void compute_reached_uses (int);
! extern void compute_immediate_uses (int, bool (*)(tree));
  extern void compute_reaching_defs (int);
  extern void dump_alias_info (FILE *);
  extern void debug_alias_info (void);
Index: tree-ssa-ccp.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/tree-ssa-ccp.c,v
retrieving revision 1.1.2.97
diff -c -p -r1.1.2.97 tree-ssa-ccp.c
*** tree-ssa-ccp.c	25 Sep 2003 11:12:50 -0000	1.1.2.97
--- tree-ssa-ccp.c	6 Oct 2003 17:27:52 -0000
*************** static tree get_strlen (tree);
*** 126,131 ****
--- 126,132 ----
  static inline bool cfg_blocks_empty_p (void);
  static void cfg_blocks_add (basic_block);
  static basic_block cfg_blocks_get (void);
+ static bool need_imm_uses_for (tree var);
  
  /* Debugging dumps.  */
  static FILE *dump_file;
*************** widen_bitfield (tree val, tree field, tr
*** 1035,1040 ****
--- 1036,1051 ----
  }
  
  
+ /* Function indicating whether we ought to include information for 'var'
+    when calculating immediate uses.  */
+ 
+ static bool
+ need_imm_uses_for (tree var)
+ {
+   return get_value (var)->lattice_val != VARYING;
+ }
+ 
+ 
  /* Initialize local data structures and worklists for CCP.  */
  
  static void
*************** initialize (void)
*** 1042,1053 ****
  {
    edge e;
    basic_block bb;
! 
!   /* Compute immediate uses.  */
!   compute_immediate_uses (TDFA_USE_OPS);
! 
!   if (dump_file && dump_flags & TDF_DETAILS)
!     dump_immediate_uses (dump_file);
  
    /* Worklist of SSA edges.  */
    VARRAY_TREE_INIT (ssa_edges, 20, "ssa_edges");
--- 1053,1059 ----
  {
    edge e;
    basic_block bb;
!   sbitmap virtual_var;
  
    /* Worklist of SSA edges.  */
    VARRAY_TREE_INIT (ssa_edges, 20, "ssa_edges");
*************** initialize (void)
*** 1061,1082 ****
    value_vector = (value *) xmalloc (next_ssa_version * sizeof (value));
    memset (value_vector, 0, next_ssa_version * sizeof (value));
  
!   /* Initialize simulation flags for PHI nodes, statements and edges.  */
    FOR_EACH_BB (bb)
      {
        block_stmt_iterator i;
!       tree phi;
! 
!       for (phi = phi_nodes (bb); phi; phi = TREE_CHAIN (phi))
! 	DONT_SIMULATE_AGAIN (phi) = 0;
  
        for (i = bsi_start (bb); !bsi_end_p (i); bsi_next (&i))
! 	DONT_SIMULATE_AGAIN (bsi_stmt (i)) = 0;
  
        for (e = bb->succ; e; e = e->succ_next)
  	e->flags &= ~EDGE_EXECUTABLE;
      }
  
  
    VARRAY_BB_INIT (cfg_blocks, 20, "cfg_blocks");
  
--- 1067,1157 ----
    value_vector = (value *) xmalloc (next_ssa_version * sizeof (value));
    memset (value_vector, 0, next_ssa_version * sizeof (value));
  
!   /* 1 if ssa variable is used in a virtual variable context.  */
!   virtual_var = sbitmap_alloc (next_ssa_version);
!   sbitmap_zero (virtual_var);
! 
!   /* Initialize default values and simulation flags for PHI nodes, statements 
!      and edges.  */
    FOR_EACH_BB (bb)
      {
        block_stmt_iterator i;
!       tree stmt;
!       varray_type ops;
!       size_t x;
!       int vary;
  
+       /* Get the default value for each definition.  */
        for (i = bsi_start (bb); !bsi_end_p (i); bsi_next (&i))
!         {
! 	  vary = 0;
! 	  stmt = bsi_stmt (i);
! 	  get_stmt_operands (stmt);
! 	  ops = def_ops (stmt);
! 	  if (ops)
! 	    for (x = 0; x < VARRAY_ACTIVE_SIZE (ops); x++)
! 	      {
! 		tree *def_p = VARRAY_TREE_PTR (ops, x);
! 		if (get_value (*def_p)->lattice_val == VARYING)
! 		  vary = 1;
! 	      }
! 	  DONT_SIMULATE_AGAIN (stmt) = vary;
! 
! 	  /* Mark all VDEF operands VARYING.  */
! 	  ops = vdef_ops (stmt);
! 	  if (ops)
! 	    {
! 	      for (x = 0; x < VARRAY_ACTIVE_SIZE (ops); x++)
! 	        {
! 		  tree res = VDEF_RESULT (VARRAY_TREE (ops, x));
! 		  get_value (res)->lattice_val = VARYING;
! 		  SET_BIT (virtual_var, SSA_NAME_VERSION (res));
! 		}
! 	    }
! 	}
  
        for (e = bb->succ; e; e = e->succ_next)
  	e->flags &= ~EDGE_EXECUTABLE;
      }
  
+   /* Now process PHI nodes.  */
+   FOR_EACH_BB (bb)
+     {
+       tree phi, var;
+       int x;
+       for (phi = phi_nodes (bb); phi; phi = TREE_CHAIN (phi))
+         {
+ 	  value *val;
+ 	  val = get_value (PHI_RESULT (phi));
+ 	  if (val->lattice_val != VARYING)
+ 	    {
+ 	      for (x = 0; x < PHI_NUM_ARGS (phi); x++)
+ 	        {
+ 		  var = PHI_ARG_DEF (phi, x);
+ 		  /* If one argument is virtual, the result is virtual, and
+ 		     therefore varying.  */
+ 		  if (TREE_CODE (var) == SSA_NAME)
+ 		    {
+ 		      if (TEST_BIT (virtual_var, SSA_NAME_VERSION (var)))
+ 		        {
+ 			  val->lattice_val = VARYING;
+ 			  SET_BIT (virtual_var, 
+ 				   SSA_NAME_VERSION (PHI_RESULT (phi)));
+ 			  break;
+ 			}
+ 		    }
+ 		}
+ 	    }
+ 	  DONT_SIMULATE_AGAIN (phi) = ((val->lattice_val == VARYING) ? 1 : 0);
+ 	}
+     }
+ 
+   sbitmap_free (virtual_var);
+   /* Compute immediate uses for variables we care about.  */
+   compute_immediate_uses (TDFA_USE_OPS, need_imm_uses_for);
+ 
+   if (dump_file && dump_flags & TDF_DETAILS)
+     dump_immediate_uses (dump_file);
  
    VARRAY_BB_INIT (cfg_blocks, 20, "cfg_blocks");
  
*************** get_default_value (tree var)
*** 1551,1556 ****
--- 1626,1643 ----
  	{
  	  val.lattice_val = CONSTANT;
  	  val.const_val = DECL_INITIAL (sym);
+ 	}
+     }
+   else
+     {
+       enum tree_code code;
+       tree stmt = SSA_NAME_DEF_STMT (var);
+ 
+       if (!IS_EMPTY_STMT (stmt))
+         {
+ 	  code = TREE_CODE (stmt);
+ 	  if (code != MODIFY_EXPR && code != PHI_NODE)
+ 	    val.lattice_val = VARYING;
  	}
      }
  





More information about the Gcc-patches mailing list