This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[tree-ssa] fully vs semi-pruned SSA PHI insertions


As has been touched upon here a few times recently, there are times
when using fully-pruned instead of semi-pruned PHI insertions is
critical for the stability of the compiler.  A great example of this
is compile/20001226-1.c if you fix the aliasing code to disambiguate
all the memory references (which will be my next batch of patches).

The biggest "problem" with fully-pruned form is the need to compute
global life information.  FWIW, switching to fully-pruned exclusively
costs about 1% in compilation time.  It's not terrible, but why slow
things down when it really isn't necessary.  Note that sometimes the
cost of computing global life information is sometimes recovered by
faster optimizers (fewer PHI nodes to examine).

Attached is a patch which allows the compiler to dynamically switch
between semi-pruned and fully-pruned on a per-variable basis.  The
decision to switch is based on a relatively simple heuristic -- count
the number of phi alternatives that a semi-pruned scheme would create.
If the number of phi alternatives for a given variable rises above the
threshhold value, then switch to fully pruned.

This basically allows us to use semi-pruned form or fully pruned form,
whichever is appropriate for a given variable.  In fact, this capability
allows us to produce a more robust compiler and improve overall compile
time over using fully-pruned or semi-pruned exclusively.

I did a variety of tests to determine what a reasonable threshhold for
switching to fully-pruned would be.  The highlights are outlined below.
The times are for building the components of cc1 at -O2 on my linux box.
The threshhold represents the number of phi alternatives for a given
variable at which we switch from semi-pruned to fully-pruned form.

Threshhold             User Time     System Time
    2                     708.75        113.69
    4                     706.12        114.01
    8                     703.61        112.58
   16                     701.38        112.34
   32                     695.92        112.60
   64                     703.26        112.31
  128                     702.32        112.64


Note that at a threshhold of 2, we are effectively using fully-pruned form
for all PHI insertions and it represents the worst overall compile-time
performance.

I don't have numbers handy which show the performance of a strictly
semi-pruned (I'd have to re-do all the numbers, which takes a day or so
to deal with the addition of DCE and the merge).  However, based on 
previous tests, I'd expect semi-pruned to come in at around 706 seconds.

Similarly I don't have baseline numbers for the compiler without any of
these changes.  However, based on earlier runs before the enablement of
DCE and the merge, I would expect the baseline compiler to clock in
at around 704 seconds.

So what does this tell us?  First, there's a small cost for computing
the heuristic (roughly 2 seconds for this set of tests).   Second,
there's a range of thresholds (8-128) where we can recover that cost
as well as the cost of computing global life information.  Third, there
some values for the threshold (16-32) where we do better than the
compiler without any of these changes -- ie, the compiler gets faster
and more robust at the same time.

Anyway, here's the changes.  Bootstrapped, regression tested, performance
tested, folded, mutilated, etc. 

	* ssa.c (compute_dominance_frontiers_1): Use a sparse bitmap
	for the frontiers.
	(compute_dominance_frontiers): Corresponding changes.
	(convert_to_ssa): Similarly.  Convert the sparse bitmap to
	a simple bitmap to avoid lots of collateral damage.
	* ssa.h (compute_dominance_frontiers): Update prototype.
	* tree-ssa.c (added, in_work): Kill, no longer needed.
	(struct def_blocks_d): Add new bitmap (livein_blocks).
	(rewrite_into_ssa): Make dominance frontiers be a sparse
	bitmap instead of a simple bitmap.  Rename the "nonlocals"
	simple bitmap to "globals".  Pass it into mark_def_sites.
	(compute_global_livein): New function.
	(mark_def_sites): Also keep track of variables which are
	used before they are set.  Allow caller to allocate and
	pass in a simple bitmap for global variables.  Process
	items defined in the statement last.
	(set_def_block): Also allocate bitmap for globals.
	(set_livein_block): New function.
	(def_blocks_free): Free def_blocks correctly.  Also free
	livein_blocks.
	(debug_def_blocks_r): Also dump the livein_blocks.
	(insert_phi_nodes): Simplify now that we don't need the
	added and in_work varrays.  Accept DFS as a sparse bitmap
	instead of a simple bitmap.
	(insert_phi_nodes_for): Rework significantly.  Pre-compute all
	the insertion points for semi-pruned form.  While computing those
	insertion points keep track of how many phi vector entries
	would be needed at those insertion points.  When the number of
	entries gets large (32), compute global life information and
	use that to further pruned the number of PHI insertion points
	necessary.
	
Index: ssa.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/ssa.c,v
retrieving revision 1.52.2.7
diff -c -3 -p -r1.52.2.7 ssa.c
*** ssa.c	29 Jan 2003 17:26:58 -0000	1.52.2.7
--- ssa.c	10 Feb 2003 18:14:23 -0000
*************** struct rename_context;
*** 168,174 ****
  static inline rtx * phi_alternative
    PARAMS ((rtx, int));
  static void compute_dominance_frontiers_1
!   PARAMS ((sbitmap *frontiers, dominance_info idom, int bb,
  	   sbitmap done, basic_block * cached_idoms));
  static void find_evaluations_1
    PARAMS ((rtx dest, rtx set, void *data));
--- 168,174 ----
  static inline rtx * phi_alternative
    PARAMS ((rtx, int));
  static void compute_dominance_frontiers_1
!   PARAMS ((bitmap *frontiers, dominance_info idom, int bb,
  	   sbitmap done, basic_block * cached_idoms));
  static void find_evaluations_1
    PARAMS ((rtx dest, rtx set, void *data));
*************** find_evaluations (evals, nregs)
*** 519,525 ****
  
  static void
  compute_dominance_frontiers_1 (frontiers, idom, bb, done, cached_idoms)
!      sbitmap *frontiers;
       dominance_info idom;
       int bb;
       sbitmap done;
--- 519,525 ----
  
  static void
  compute_dominance_frontiers_1 (frontiers, idom, bb, done, cached_idoms)
!      bitmap *frontiers;
       dominance_info idom;
       int bb;
       sbitmap done;
*************** compute_dominance_frontiers_1 (frontiers
*** 555,561 ****
        if (e->dest == EXIT_BLOCK_PTR)
  	continue;
        if (cached_idoms[e->dest->index]->index != bb)
! 	SET_BIT (frontiers[bb], e->dest->index);
      }
  
    /* Find blocks conforming to rule (2).  */
--- 555,561 ----
        if (e->dest == EXIT_BLOCK_PTR)
  	continue;
        if (cached_idoms[e->dest->index]->index != bb)
!         bitmap_set_bit (frontiers[bb], e->dest->index);
      }
  
    /* Find blocks conforming to rule (2).  */
*************** compute_dominance_frontiers_1 (frontiers
*** 564,573 ****
        c = dominated[i];
        {
  	int x;
! 	EXECUTE_IF_SET_IN_SBITMAP (frontiers[c->index], 0, x,
  	  {
  	    if (cached_idoms[BASIC_BLOCK (x)->index]->index != bb)
! 	      SET_BIT (frontiers[bb], x);
  	  });
        }
      }
--- 564,573 ----
        c = dominated[i];
        {
  	int x;
! 	EXECUTE_IF_SET_IN_BITMAP (frontiers[c->index], 0, x,
  	  {
  	    if (cached_idoms[BASIC_BLOCK (x)->index]->index != bb)
! 	      bitmap_set_bit (frontiers[bb], x);
  	  });
        }
      }
*************** compute_dominance_frontiers_1 (frontiers
*** 578,584 ****
  
  void
  compute_dominance_frontiers (frontiers, idom)
!      sbitmap *frontiers;
       dominance_info idom;
  {
    basic_block bb, *cached_idoms;
--- 578,584 ----
  
  void
  compute_dominance_frontiers (frontiers, idom)
!      bitmap *frontiers;
       dominance_info idom;
  {
    basic_block bb, *cached_idoms;
*************** compute_dominance_frontiers (frontiers, 
*** 586,592 ****
  
    timevar_push (TV_DOM_FRONTIERS);
  
-   sbitmap_vector_zero (frontiers, last_basic_block);
    sbitmap_zero (done);
  
    cached_idoms = xmalloc (n_basic_blocks * sizeof (bb));
--- 586,591 ----
*************** convert_to_ssa ()
*** 1195,1206 ****
  
    /* Dominator bitmaps.  */
    sbitmap *dfs;
    sbitmap *idfs;
  
    /* Element I is the immediate dominator of block I.  */
    dominance_info idom;
  
!   int nregs;
  
    basic_block bb;
  
--- 1194,1206 ----
  
    /* Dominator bitmaps.  */
    sbitmap *dfs;
+   bitmap *sparse_dfs;
    sbitmap *idfs;
  
    /* Element I is the immediate dominator of block I.  */
    dominance_info idom;
  
!   int i, nregs;
  
    basic_block bb;
  
*************** convert_to_ssa ()
*** 1225,1232 ****
  
    /* Compute dominance frontiers.  */
  
    dfs = sbitmap_vector_alloc (last_basic_block, last_basic_block);
!   compute_dominance_frontiers (dfs, idom);
  
    if (rtl_dump_file)
      {
--- 1225,1255 ----
  
    /* Compute dominance frontiers.  */
  
+   sparse_dfs = (bitmap *) xmalloc (sizeof (bitmap) * n_basic_blocks);
+   for (i = 0; i < n_basic_blocks; i++)
+     sparse_dfs[i] = BITMAP_XMALLOC ();
+   compute_dominance_frontiers (sparse_dfs, idom);
+ 
+   /* Now convert the sparse bitmap into an sbitmap.  Long term we
+      need to convert the rest of this code to use sparse bitmaps
+      when appropriate.  */
    dfs = sbitmap_vector_alloc (last_basic_block, last_basic_block);
!   sbitmap_vector_zero (dfs, last_basic_block);
! 
!   for (i = 0; i < n_basic_blocks; i++)
!     {
!       unsigned int bb_index;
! 
!       EXECUTE_IF_SET_IN_BITMAP (sparse_dfs[i], 0, bb_index,
! 	{
! 	  SET_BIT (dfs[i], bb_index);
! 	});
!     }
! 
!   /* Release the SPARSE_DFS.  */
!   for (i = 0; i < n_basic_blocks; i++)
!     BITMAP_XFREE (sparse_dfs[i]);
!   free (sparse_dfs);
  
    if (rtl_dump_file)
      {
Index: ssa.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/ssa.h,v
retrieving revision 1.7.2.1
diff -c -3 -p -r1.7.2.1 ssa.h
*** ssa.h	24 Jun 2002 21:38:00 -0000	1.7.2.1
--- ssa.h	10 Feb 2003 18:14:23 -0000
*************** typedef int (*successor_phi_fn)         
*** 27,33 ****
  extern int for_each_successor_phi       PARAMS ((basic_block bb,
  						 successor_phi_fn,
  						 void *));
! void compute_dominance_frontiers	PARAMS ((sbitmap *frontiers,
  						 dominance_info idom));
  extern int remove_phi_alternative	PARAMS ((rtx, basic_block));
  
--- 27,33 ----
  extern int for_each_successor_phi       PARAMS ((basic_block bb,
  						 successor_phi_fn,
  						 void *));
! void compute_dominance_frontiers	PARAMS ((bitmap *frontiers,
  						 dominance_info idom));
  extern int remove_phi_alternative	PARAMS ((rtx, basic_block));
  
Index: tree-ssa.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/tree-ssa.c,v
retrieving revision 1.1.4.51
diff -c -3 -p -r1.1.4.51 tree-ssa.c
*** tree-ssa.c	10 Feb 2003 12:27:03 -0000	1.1.4.51
--- tree-ssa.c	10 Feb 2003 18:14:24 -0000
*************** unsigned int next_ssa_version;
*** 62,71 ****
  FILE *tree_ssa_dump_file;
  int tree_ssa_dump_flags;
  
! /* Arrays used to keep track of where to insert PHI nodes for variables
!     definitions (see insert_phi_nodes).  */
! static GTY (()) varray_type added = NULL;
! static GTY (()) varray_type in_work = NULL;
  static GTY (()) varray_type work_stack = NULL;
  
  /* Each entry in DEF_BLOCKS is a bitmap B for a variable VAR.  If B(J) is
--- 62,68 ----
  FILE *tree_ssa_dump_file;
  int tree_ssa_dump_flags;
  
! /* Workstack for computing PHI node insertion points.  */
  static GTY (()) varray_type work_stack = NULL;
  
  /* Each entry in DEF_BLOCKS is a bitmap B for a variable VAR.  If B(J) is
*************** struct def_blocks_d
*** 81,86 ****
--- 78,84 ----
  {
    tree var;
    bitmap def_blocks;
+   bitmap livein_blocks;
  };
  
  /* Hash table to store the current reaching definition for every variable in
*************** struct currdef_d
*** 99,114 ****
  /* Local functions.  */
  static void init_tree_ssa		PARAMS ((void));
  static void delete_tree_ssa		PARAMS ((tree));
! static sbitmap mark_def_sites		PARAMS ((dominance_info));
  static void set_def_block		PARAMS ((tree, basic_block));
! static void insert_phi_nodes		PARAMS ((sbitmap *, sbitmap));
  static void rewrite_block		PARAMS ((basic_block));
  static void rewrite_stmts		PARAMS ((basic_block, varray_type *));
  static void rewrite_stmt		PARAMS ((tree, varray_type *));
  static inline void rewrite_operand	PARAMS ((tree *));
  static void register_new_def		PARAMS ((tree, varray_type *));
! static void insert_phi_nodes_for	PARAMS ((tree, sbitmap *));
! static void add_phi_node 		PARAMS ((basic_block, tree));
  static tree remove_annotations_r	PARAMS ((tree *, int *, void *));
  static inline tree currdef_for		PARAMS ((tree, int));
  static inline void set_currdef_for	PARAMS ((tree, tree));
--- 97,113 ----
  /* Local functions.  */
  static void init_tree_ssa		PARAMS ((void));
  static void delete_tree_ssa		PARAMS ((tree));
! static void mark_def_sites		PARAMS ((dominance_info, sbitmap));
! static void compute_global_livein	PARAMS ((bitmap, bitmap));
  static void set_def_block		PARAMS ((tree, basic_block));
! static void set_livein_block		PARAMS ((tree, basic_block));
! static void insert_phi_nodes		PARAMS ((bitmap *, sbitmap));
  static void rewrite_block		PARAMS ((basic_block));
  static void rewrite_stmts		PARAMS ((basic_block, varray_type *));
  static void rewrite_stmt		PARAMS ((tree, varray_type *));
  static inline void rewrite_operand	PARAMS ((tree *));
  static void register_new_def		PARAMS ((tree, varray_type *));
! static void insert_phi_nodes_for	PARAMS ((tree, bitmap *));
  static tree remove_annotations_r	PARAMS ((tree *, int *, void *));
  static inline tree currdef_for		PARAMS ((tree, int));
  static inline void set_currdef_for	PARAMS ((tree, tree));
*************** void
*** 208,216 ****
  rewrite_into_ssa (fndecl)
       tree fndecl;
  {
!   sbitmap *dfs;
!   sbitmap nonlocal_vars;
    dominance_info idom;
    
    timevar_push (TV_TREE_SSA_OTHER);
  
--- 207,216 ----
  rewrite_into_ssa (fndecl)
       tree fndecl;
  {
!   bitmap *dfs;
!   sbitmap globals;
    dominance_info idom;
+   int i;
    
    timevar_push (TV_TREE_SSA_OTHER);
  
*************** rewrite_into_ssa (fndecl)
*** 222,238 ****
  
    /* Compute immediate dominators and dominance frontiers.  */
    idom = calculate_dominance_info (CDI_DOMINATORS);
!   dfs = sbitmap_vector_alloc (last_basic_block, last_basic_block);
    compute_dominance_frontiers (dfs, idom);
  
    /* Compute aliasing information.  */
    compute_may_aliases ();
  
    /* Find variable references and mark definition sites.  */
!   nonlocal_vars = mark_def_sites (idom);
  
    /* Insert PHI nodes at dominance frontiers of definition blocks.  */
!   insert_phi_nodes (dfs, nonlocal_vars);
  
    /* Rewrite all the basic blocks in the program.  */
    timevar_push (TV_TREE_SSA_REWRITE_BLOCKS);
--- 222,243 ----
  
    /* Compute immediate dominators and dominance frontiers.  */
    idom = calculate_dominance_info (CDI_DOMINATORS);
!   dfs = (bitmap *) xmalloc (n_basic_blocks * sizeof (bitmap *));
!   for (i = 0; i < n_basic_blocks; i++)
!     dfs[i] = BITMAP_XMALLOC ();
    compute_dominance_frontiers (dfs, idom);
  
    /* Compute aliasing information.  */
    compute_may_aliases ();
  
+   globals = sbitmap_alloc (num_referenced_vars);
+   sbitmap_zero (globals);
+ 
    /* Find variable references and mark definition sites.  */
!   mark_def_sites (idom, globals);
  
    /* Insert PHI nodes at dominance frontiers of definition blocks.  */
!   insert_phi_nodes (dfs, globals);
  
    /* Rewrite all the basic blocks in the program.  */
    timevar_push (TV_TREE_SSA_REWRITE_BLOCKS);
*************** rewrite_into_ssa (fndecl)
*** 240,247 ****
    timevar_pop (TV_TREE_SSA_REWRITE_BLOCKS);
  
    /* Free allocated memory.  */
!   free (nonlocal_vars);
!   sbitmap_vector_free (dfs);
    free_dominance_info (idom);
    htab_delete (def_blocks);
    htab_delete (currdefs);
--- 245,254 ----
    timevar_pop (TV_TREE_SSA_REWRITE_BLOCKS);
  
    /* Free allocated memory.  */
!   for (i = 0; i < n_basic_blocks; i++)
!     BITMAP_XFREE (dfs[i]);
!   free (dfs);
!   free (globals);
    free_dominance_info (idom);
    htab_delete (def_blocks);
    htab_delete (currdefs);
*************** rewrite_into_ssa (fndecl)
*** 260,265 ****
--- 267,319 ----
    timevar_pop (TV_TREE_SSA_OTHER);
  }
  
+ /* Compute global livein information given the set of blocks where
+    an object is locally live at the start of the block (LIVEIN)
+    and the set of blocks where the object is defined (DEF_BLOCKS).
+ 
+    Note: This routine augments the existing local livein information
+    to include global livein.  Ie, it modifies the underlying bitmap
+    for LIVEIN.  */
+ 
+ static void
+ compute_global_livein (livein, def_blocks)
+      bitmap livein;
+      bitmap def_blocks;
+ {
+   basic_block bb, *worklist, *tos;
+ 
+   tos = worklist
+     = (basic_block *) xmalloc (sizeof (basic_block) * (n_basic_blocks + 1));
+ 
+   FOR_EACH_BB (bb)
+     {
+       if (bitmap_bit_p (livein, bb->index))
+ 	*tos++ = bb;
+     }
+ 
+   while (tos != worklist)
+     {
+       edge e;
+ 
+       bb = *--tos;
+       for (e = bb->pred; e; e = e->pred_next)
+ 	{
+ 	  basic_block bb = e->src;
+ 	  int bb_index = bb->index;
+ 
+ 	  if (bb != ENTRY_BLOCK_PTR
+ 	      && ! bitmap_bit_p (livein, bb_index)
+ 	      && ! bitmap_bit_p (def_blocks, bb_index))
+ 	    {
+ 	      *tos++ = bb;
+ 	      bitmap_set_bit (livein, bb_index);
+ 	    }
+ 	}
+     }
+   free (worklist);
+ }
+ 
+ 
  
  /* Look for variable references in every block of the flowgraph, compute
     aliasing information and collect definition sites for every variable.
*************** rewrite_into_ssa (fndecl)
*** 273,290 ****
     This information is used to reduce the number of PHI nodes
     we create.  */
  
! static sbitmap
! mark_def_sites (idom)
       dominance_info idom;
  {
    basic_block bb;
    block_stmt_iterator si;
!   sbitmap nonlocal_vars;
!   sbitmap killed_vars;
  
!   nonlocal_vars = sbitmap_alloc (num_referenced_vars);
!   sbitmap_zero (nonlocal_vars);
!   killed_vars = sbitmap_alloc (num_referenced_vars);
  
    /* Mark all the blocks that have definitions for each variable referenced
       in the function.  */
--- 327,342 ----
     This information is used to reduce the number of PHI nodes
     we create.  */
  
! static void
! mark_def_sites (idom, globals)
       dominance_info idom;
+      sbitmap globals;
  {
    basic_block bb;
    block_stmt_iterator si;
!   sbitmap kills;
  
!   kills = sbitmap_alloc (num_referenced_vars);
  
    /* Mark all the blocks that have definitions for each variable referenced
       in the function.  */
*************** mark_def_sites (idom)
*** 296,305 ****
        if (idom_bb)
  	add_dom_child (idom_bb, bb);
  
!       /* We're interested in finding the variables which are used before
!          they are defined in this block.  So at the start of each block
! 	 zero out KILLED_VARS.  */
!       sbitmap_zero (killed_vars);
  
        for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
  	{
--- 348,354 ----
        if (idom_bb)
  	add_dom_child (idom_bb, bb);
  
!       sbitmap_zero (kills);
  
        for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
  	{
*************** mark_def_sites (idom)
*** 321,328 ****
  	      tree *use = VARRAY_GENERIC_PTR (ops, i);
  	      int uid = var_ann (*use)->uid;
  
! 	      if (!TEST_BIT (killed_vars, uid))
! 	        SET_BIT (nonlocal_vars, uid);
  	    }
  	  
  	  /* Similarly for virtual uses.  */
--- 370,380 ----
  	      tree *use = VARRAY_GENERIC_PTR (ops, i);
  	      int uid = var_ann (*use)->uid;
  
! 	      if (! TEST_BIT (kills, uid))
! 		{
! 	          SET_BIT (globals, uid);
! 		  set_livein_block (*use, bb);
! 		}
  	    }
  	  
  	  /* Similarly for virtual uses.  */
*************** mark_def_sites (idom)
*** 332,347 ****
  	      tree use = VARRAY_GENERIC_PTR (ops, i);
  	      int uid = var_ann (use)->uid;
  
! 	      if (!TEST_BIT (killed_vars, uid))
! 	        SET_BIT (nonlocal_vars, uid);
! 	    }
! 
! 	  /* Now process the definition made by this statement.  */
! 	  dest = def_op (stmt);
! 	  if (dest)
! 	    {
! 	      set_def_block (*dest, bb);
! 	      SET_BIT (killed_vars, var_ann (*dest)->uid);
  	    }
  
  	  /* Note that virtual definitions are irrelevant for computing
--- 384,394 ----
  	      tree use = VARRAY_GENERIC_PTR (ops, i);
  	      int uid = var_ann (use)->uid;
  
! 	      if (! TEST_BIT (kills, uid))
! 	        {
! 	          SET_BIT (globals, uid);
! 		  set_livein_block (use, bb);
! 		}
  	    }
  
  	  /* Note that virtual definitions are irrelevant for computing
*************** mark_def_sites (idom)
*** 353,370 ****
  	  for (i = 0; ops && i < VARRAY_ACTIVE_SIZE (ops); i++)
  	    {
  	      tree vdef = VARRAY_TREE (ops, i);
! 	      int uid = var_ann (VDEF_OP (vdef))->uid;
  
  	      set_def_block (VDEF_RESULT (vdef), bb);
! 	      if (!TEST_BIT (killed_vars, uid))
! 		SET_BIT (nonlocal_vars, uid);
  	    }
  	}
      }
  
!   free (killed_vars);
! 
!   return nonlocal_vars;
  }
  
  
--- 400,429 ----
  	  for (i = 0; ops && i < VARRAY_ACTIVE_SIZE (ops); i++)
  	    {
  	      tree vdef = VARRAY_TREE (ops, i);
! 	      tree vdef_op = VDEF_OP (vdef);
! 	      int uid = var_ann (vdef_op)->uid;
  
  	      set_def_block (VDEF_RESULT (vdef), bb);
! 	      if (!TEST_BIT (kills, uid))
! 		{
! 		  SET_BIT (globals, uid);
! 	          set_livein_block (vdef_op, bb);
! 		}
! 
! 	    }
! 
! 	  /* Now process the definition made by this statement.  */
! 	  dest = def_op (stmt);
! 	  if (dest)
! 	    {
! 	      set_def_block (*dest, bb);
! 	      SET_BIT (kills, var_ann (*dest)->uid);
  	    }
+ 
  	}
      }
  
!   free (kills);
  }
  
  
*************** set_def_block (var, bb)
*** 386,391 ****
--- 445,451 ----
        db_p = xmalloc (sizeof (*db_p));
        db_p->var = var;
        db_p->def_blocks = BITMAP_XMALLOC ();
+       db_p->livein_blocks = BITMAP_XMALLOC ();
        *slot = (void *) db_p;
      }
    else
*************** set_def_block (var, bb)
*** 395,400 ****
--- 455,488 ----
    bitmap_set_bit (db_p->def_blocks, bb->index);
  }
  
+ /* Mark block BB as having VAR live at the entry to BB.  */
+ 
+ static void
+ set_livein_block (var, bb)
+      tree var;
+      basic_block bb;
+ {
+   struct def_blocks_d db, *db_p;
+   void **slot;
+ 
+   /* Find the DEFS bitmap associated with variable VAR.  */
+   db.var = var;
+   slot = htab_find_slot (def_blocks, (void *) &db, INSERT);
+   if (*slot == NULL)
+     {
+       db_p = xmalloc (sizeof (*db_p));
+       db_p->var = var;
+       db_p->def_blocks = BITMAP_XMALLOC ();
+       db_p->livein_blocks = BITMAP_XMALLOC ();
+       *slot = (void *) db_p;
+     }
+   else
+     db_p = (struct def_blocks_d *) *slot;
+ 
+   /* Set the bit corresponding to the block where VAR is defined.  */
+   bitmap_set_bit (db_p->livein_blocks, bb->index);
+ }
+ 
  
  /* Insert PHI nodes at the dominance frontier of blocks with variable
     definitions.  DFS contains the dominance frontier information for the
*************** set_def_block (var, bb)
*** 405,428 ****
     need PHI nodes.  */
  
  static void
! insert_phi_nodes (dfs, nonlocal_vars)
!      sbitmap *dfs;
!      sbitmap nonlocal_vars;
  {
    size_t i;
  
    timevar_push (TV_TREE_INSERT_PHI_NODES);
  
-   /* Array ADDED (indexed by basic block number) is used to determine
-      whether a PHI node for the current variable has already been
-      inserted at block X.  */
-   VARRAY_TREE_INIT (added, last_basic_block, "added");
- 
-   /* Array IN_WORK (indexed by basic block number) is used to determine
-      whether block X has already been added to WORK_STACK for the current
-      variable.  */
-   VARRAY_TREE_INIT (in_work, last_basic_block, "in_work");
- 
    /* Array WORK_STACK is a stack of CFG blocks.  Each block that contains
       an assignment or PHI node will be pushed to this stack.  */
    VARRAY_BB_INIT (work_stack, last_basic_block, "work_stack");
--- 493,506 ----
     need PHI nodes.  */
  
  static void
! insert_phi_nodes (dfs, globals)
!      bitmap *dfs;
!      sbitmap globals;
  {
    size_t i;
  
    timevar_push (TV_TREE_INSERT_PHI_NODES);
  
    /* Array WORK_STACK is a stack of CFG blocks.  Each block that contains
       an assignment or PHI node will be pushed to this stack.  */
    VARRAY_BB_INIT (work_stack, last_basic_block, "work_stack");
*************** insert_phi_nodes (dfs, nonlocal_vars)
*** 432,444 ****
       for the variable.  PHI nodes will be added to the dominance frontier
       blocks of each definition block.  */
    for (i = 0; i < num_referenced_vars; i++)
!     if (TEST_BIT (nonlocal_vars, i))
        insert_phi_nodes_for (referenced_var (i), dfs);
  
-   added = NULL;
-   in_work = NULL;
    work_stack = NULL;
- 
    timevar_pop (TV_TREE_INSERT_PHI_NODES);
  }
  
--- 510,519 ----
       for the variable.  PHI nodes will be added to the dominance frontier
       blocks of each definition block.  */
    for (i = 0; i < num_referenced_vars; i++)
!     if (TEST_BIT (globals, i))
        insert_phi_nodes_for (referenced_var (i), dfs);
  
    work_stack = NULL;
    timevar_pop (TV_TREE_INSERT_PHI_NODES);
  }
  
*************** debug_tree_ssa ()
*** 656,723 ****
  static void
  insert_phi_nodes_for (var, dfs)
       tree var;
!      sbitmap *dfs;
  {
-   unsigned long i;
    struct def_blocks_d dm, *def_map;
  
-   /* Add to the worklist all the blocks that have definitions of VAR.  */
    dm.var = var;
    def_map = (struct def_blocks_d *) htab_find (def_blocks, (void *) &dm);
    if (def_map == NULL)
      return;
  
!   EXECUTE_IF_SET_IN_BITMAP (def_map->def_blocks, 0, i,
!       {
! 	basic_block bb = BASIC_BLOCK (i);
! 	VARRAY_PUSH_BB (work_stack, bb);
! 	VARRAY_TREE (in_work, bb->index) = var;
!       });
  
!   /* Insert PHI nodes at the dominance frontier of all the basic blocks
!      in the worklist.  */
    while (VARRAY_ACTIVE_SIZE (work_stack) > 0)
      {
!       size_t w;
!       basic_block bb;
  
-       bb = VARRAY_TOP_BB (work_stack);
        VARRAY_POP (work_stack);
  
!       EXECUTE_IF_SET_IN_SBITMAP (dfs[bb->index], 0, w,
! 				 add_phi_node (BASIC_BLOCK (w), var));
!     }
! }
! 
! 
! /* Add a new PHI node for variable VAR at the start of basic block BB.
  
!    If BB didn't have a definition of VAR, we add BB itself to the worklist
!    because the PHI node introduces a new definition of VAR.  */
  
! static void
! add_phi_node (bb, var)
!      basic_block bb;
!      tree var;
! {
!   if (VARRAY_TREE (added, bb->index) != var)
      {
!       tree phi;
  
!       phi = create_phi_node (var, bb);
!       VARRAY_TREE (added, bb->index) = var;
  
!       /* Basic block BB now has a new definition of VAR.  If BB wasn't in
! 	 the worklist already, add it.  */
!       if (VARRAY_TREE (in_work, bb->index) != var)
! 	{
! 	  VARRAY_PUSH_BB (work_stack, bb);
! 	  VARRAY_TREE (in_work, bb->index) = var;
! 	}
!     }
  }
  
- 
  /* Rewrite all the statements in basic block BB.
     
     BLOCK_DEFS_P points to a stack with all the definitions found in the
--- 731,825 ----
  static void
  insert_phi_nodes_for (var, dfs)
       tree var;
!      bitmap *dfs;
  {
    struct def_blocks_d dm, *def_map;
+   bitmap phi_insertion_points;
+   unsigned phi_vector_lengths = 0;
+   int use_fully_pruned_ssa = 0;
+   int bb_index;
  
    dm.var = var;
    def_map = (struct def_blocks_d *) htab_find (def_blocks, (void *) &dm);
    if (def_map == NULL)
      return;
  
!   phi_insertion_points = BITMAP_XMALLOC ();
  
!   EXECUTE_IF_SET_IN_BITMAP (def_map->def_blocks, 0, bb_index,
!     {
!       VARRAY_PUSH_BB (work_stack, BASIC_BLOCK (bb_index));
!     });
! 
!   /* Pop a block off the worklist, add every block that appears in
!      the original block's dfs that we have not already processed to
!      the worklist.  Iterate until the worklist is empty.   Blocks
!      which are added to the worklist are potential sites for
!      PHI nodes. 
! 
!      The iteration step could be done during PHI insertion.  But
!      finding all the PHI insertion blocks first allows us to use
!      that list of blocks in our heuristics to determine if we should
!      use semi-pruned for fully-pruned SSA forms. 
! 
!      While we're iterating we also compute the total length of all the
!      PHI node vectors for this variable.  We use this in our 
!      heuristic.  */
    while (VARRAY_ACTIVE_SIZE (work_stack) > 0)
      {
!       edge e;
!       basic_block bb = VARRAY_TOP_BB (work_stack);
!       int bb_index = bb->index;
  
        VARRAY_POP (work_stack);
+       
  
!       EXECUTE_IF_SET_IN_BITMAP (dfs[bb_index], 0, bb_index,
! 	{
!           basic_block bb = BASIC_BLOCK (bb_index);
  
! 	  if (! bitmap_bit_p (phi_insertion_points, bb_index))
! 	    {
! 	      for (e = bb->pred; e; e = e->pred_next)
! 		phi_vector_lengths++;
! 	      VARRAY_PUSH_BB (work_stack, bb);
! 	      bitmap_set_bit (phi_insertion_points, bb_index);
! 	    }
! 	});
!     }
  
!   /* Now that we know the number of elements in all the potential
!      PHI nodes for this variable, we can determine if it is 
!      worth computing the fully pruned SSA form.  The larger the
!      total number of elements, the more important it is to use
!      the fully pruned form.
!  
!      Experimentation showed that once we get more than 8 phi vector
!      entries that moving to a fully-pruned implementation is comparable
!      to semi-pruned.  32 showed up as the threshhold which maximized
!      overall compile-time performance. 
! 
!      Note that as this number gets larger, the potential for the
!      compiler to run wild and eat all available memory increases. 
!      So if you decide to change it, do so with care.  Consider
!      compile/20001226-1 with all memory references disambiguated
!      as the testcase for the compiler running wild eating memory.  */
!   if (phi_vector_lengths > 32)
      {
!       use_fully_pruned_ssa = 1;
!       compute_global_livein (def_map->livein_blocks, def_map->def_blocks);
!     }
  
!   EXECUTE_IF_SET_IN_BITMAP (phi_insertion_points, 0, bb_index,
!     {
!       if (! use_fully_pruned_ssa
! 	  || bitmap_bit_p (def_map->livein_blocks, bb_index))
!         create_phi_node (var, BASIC_BLOCK (bb_index));
!     });
  
!   BITMAP_XFREE (phi_insertion_points);
  }
  
  /* Rewrite all the statements in basic block BB.
     
     BLOCK_DEFS_P points to a stack with all the definitions found in the
*************** def_blocks_free (p)
*** 994,1000 ****
       void *p;
  {
    struct def_blocks_d *db_p = (struct def_blocks_d *) p;
!   free (db_p->def_blocks);
    free (p);
  }
  
--- 1096,1103 ----
       void *p;
  {
    struct def_blocks_d *db_p = (struct def_blocks_d *) p;
!   BITMAP_XFREE (db_p->def_blocks);
!   BITMAP_XFREE (db_p->livein_blocks);
    free (p);
  }
  
*************** debug_def_blocks ()
*** 1045,1051 ****
    htab_traverse (def_blocks, debug_def_blocks_r, NULL);
  }
  
- 
  /* Callback for htab_traverse to dump the DEF_BLOCKS hash table.  */
  
  static int
--- 1148,1153 ----
*************** debug_def_blocks_r (slot, data)
*** 1060,1065 ****
--- 1162,1171 ----
    print_generic_expr (stderr, db_p->var, 0);
    fprintf (stderr, ", DEF_BLOCKS: { ");
    EXECUTE_IF_SET_IN_BITMAP (db_p->def_blocks, 0, i,
+ 			    fprintf (stderr, "%ld ", i));
+   fprintf (stderr, "}\n");
+   fprintf (stderr, ", LIVEIN_BLOCKS: { ");
+   EXECUTE_IF_SET_IN_BITMAP (db_p->livein_blocks, 0, i,
  			    fprintf (stderr, "%ld ", i));
    fprintf (stderr, "}\n");
  





Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]