This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] Catch 11.8% more jump threading opportunities at tree level.


Hi,

Attached is an experimental patch to catch 11.8% more jump threading
opportunities at tree level.

Consider:

<L0>:;
  if (b_4 != 2) goto <L1>; else goto <L2>;

<L1>:;
  D.1120_7 = 10;
  goto <bb 6> (<L6>);

<L2>:;
  if (b_4 == 1) goto <L4>; else goto <L5>;

Note that edge from <L0> to <L2> can be threaded through <L2> to <L5>.
However, currently, DOM does not record "b_4 == 2" when threading edge
from <L0> to <L2> due to what seems to be an artificial restriction in
dom_opt_finalize_block and fails to thread the edge.  The patch below
removes this restriction and allows DOM to take the above jump
threading opportunity.

The story does not end here.  Once we remove the restriction, we know
"too much" about variables and end up missing some jump threading
opportunities.  Consider:

<bb 0>
  D.1120_2 = *p_1;
  if (D.1120_2 != 0) goto <L0>; else goto <L1>;

<L0>:;
  bar ();

<L1>:;
  D.1120_3 = *p_1;
  if (D.1120_3 != 0) goto <L2>; else goto <L3>;

When we consider threading edge from <bb 0> to <L1>, we know that
D.1120_2 == *p_1 and that D.1120_2 == 0.  As soon as
thread_across_edge sees "D.1120_3 = *p_1;", it is simplified to
"D.1120_3 = 0;".  The rhs is no longer an SSA_NAME.  Since
thread_across_edge requires that every statement leading up to
COND_EXPR or SWITCH_EXPR must be a nop like a copy betwen two
SSA_NAMEs with the same underlying variable, we drop this jump
threading opportunity.

To work around this problem, I created a variant of lookup_avail_expr
without SSA_NAME_VALUE chasing.  This new function allows
thread_across_edge to know that "D.1120_2 == D.1120_3;", making it
possible to take the jump threading opportunity.

This small hack seems to bring a substantial improvement.

Here are the numbers of jump threading opportunities picked up at
various points while compiling cc1-i files.

      original patched    diff%
-------------------------------
tree     19431   21729 +11.826% <- wow!
RTL       2962    2224 -24.915%
-------------------------------
total    22393   23953  +6.966%

Jeff seems to have his own plan to improve the jump threading
selection code, but I just wanted to let him and everybody else know
that there are a lot of low-hanging jump threading opportunities
waiting there regardless of whether or not we want to go with my
approach.  Note that we take a lot of work away from the RTL-level
jump bypass code, which is a good thing.  I never feel comfortable
when RTL optimizers are deciphering conditonal jumps that are
expressed in so many different ways from architecture to architecture.

Bootstrapped on i686-pc-linux-gnu.  Any comments?

p.s.
Let's not worry about lack of comment, code duplication, etc, for
now. :-)

Kazu Hirata

2005-02-08  Kazu Hirata  <kazu@cs.umass.edu>

	PR tree-optimization/19516, PR tree-optimization/19804, 
	* tree-ssa-dom.c (thread_across_edge): If cached_lhs is
	useless, call lookup_avail_expr_1 to see if we can detect a
	nop.
	(lookup_avail_expr_1): New.

Index: tree-ssa-dom.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-dom.c,v
retrieving revision 2.90
diff -c -d -p -r2.90 tree-ssa-dom.c
*** tree-ssa-dom.c	1 Feb 2005 03:52:37 -0000	2.90
--- tree-ssa-dom.c	8 Feb 2005 18:49:13 -0000
*************** static void optimize_stmt (struct dom_wa
*** 264,269 ****
--- 264,270 ----
  			   basic_block bb,
  			   block_stmt_iterator);
  static tree lookup_avail_expr (tree, bool);
+ static tree lookup_avail_expr_1 (tree, bool);
  static hashval_t vrp_hash (const void *);
  static int vrp_eq (const void *, const void *);
  static hashval_t avail_expr_hash (const void *);
*************** thread_across_edge (struct dom_walk_data
*** 564,570 ****
  
    for (bsi = bsi_start (e->dest); ! bsi_end_p (bsi); bsi_next (&bsi))
      {
!       tree lhs, cached_lhs;
  
        stmt = bsi_stmt (bsi);
  
--- 565,571 ----
  
    for (bsi = bsi_start (e->dest); ! bsi_end_p (bsi); bsi_next (&bsi))
      {
!       tree lhs, cached_lhs, cached_lhs_1 = 0;
  
        stmt = bsi_stmt (bsi);
  
*************** thread_across_edge (struct dom_walk_data
*** 634,639 ****
--- 635,641 ----
  
  	  /* Try to lookup the new expression.  */
  	  cached_lhs = lookup_avail_expr (stmt, false);
+ 	  cached_lhs_1 = lookup_avail_expr_1 (stmt, false);
  
  	  /* Restore the statement's original uses/defs.  */
  	  for (i = 0; i < NUM_USES (uses); i++)
*************** thread_across_edge (struct dom_walk_data
*** 654,660 ****
        /* If the expression in the hash table was not assigned to an
  	 SSA_NAME, then we can not ignore this statement.  */
        if (TREE_CODE (cached_lhs) != SSA_NAME)
! 	break;
  
        /* If we have different underlying variables, then we can not
  	 ignore this statement.  */
--- 656,668 ----
        /* If the expression in the hash table was not assigned to an
  	 SSA_NAME, then we can not ignore this statement.  */
        if (TREE_CODE (cached_lhs) != SSA_NAME)
! 	{
! 	  if (cached_lhs_1
! 	      && TREE_CODE (cached_lhs_1) == SSA_NAME)
! 	    cached_lhs = cached_lhs_1;
! 	  else
! 	    break;
! 	}
  
        /* If we have different underlying variables, then we can not
  	 ignore this statement.  */
*************** dom_opt_finalize_block (struct dom_walk_
*** 1059,1065 ****
  		 the same underlying variable to avoid missing threading
  		 opportunities.  */
  	      if (lhs
! 		  && TREE_CODE (COND_EXPR_COND (last)) == SSA_NAME)
  		record_const_or_copy (lhs, rhs);
  
  	      /* If we have 0 = COND or 1 = COND equivalences, record them
--- 1067,1073 ----
  		 the same underlying variable to avoid missing threading
  		 opportunities.  */
  	      if (lhs
! 		  && TREE_CODE (lhs) == SSA_NAME)
  		record_const_or_copy (lhs, rhs);
  
  	      /* If we have 0 = COND or 1 = COND equivalences, record them
*************** lookup_avail_expr (tree stmt, bool inser
*** 3177,3182 ****
--- 3185,3257 ----
    return lhs;
  }
  
+ static tree
+ lookup_avail_expr_1 (tree stmt, bool insert)
+ {
+   void **slot;
+   tree lhs;
+   struct expr_hash_elt *element = xcalloc (sizeof (struct expr_hash_elt), 1);
+ 
+   lhs = TREE_CODE (stmt) == MODIFY_EXPR ? TREE_OPERAND (stmt, 0) : NULL;
+ 
+   initialize_hash_element (stmt, lhs, element);
+ 
+   /* Don't bother remembering constant assignments and copy operations.
+      Constants and copy operations are handled by the constant/copy propagator
+      in optimize_stmt.  */
+   if (TREE_CODE (element->rhs) == SSA_NAME
+       || is_gimple_min_invariant (element->rhs))
+     {
+       free (element);
+       return NULL_TREE;
+     }
+ 
+   /* If this is an equality test against zero, see if we have recorded a
+      nonzero value for the variable in question.  */
+   if ((TREE_CODE (element->rhs) == EQ_EXPR
+        || TREE_CODE  (element->rhs) == NE_EXPR)
+       && TREE_CODE (TREE_OPERAND (element->rhs, 0)) == SSA_NAME
+       && integer_zerop (TREE_OPERAND (element->rhs, 1)))
+     {
+       int indx = SSA_NAME_VERSION (TREE_OPERAND (element->rhs, 0));
+ 
+       if (bitmap_bit_p (nonzero_vars, indx))
+ 	{
+ 	  tree t = element->rhs;
+ 	  free (element);
+ 
+ 	  if (TREE_CODE (t) == EQ_EXPR)
+ 	    return boolean_false_node;
+ 	  else
+ 	    return boolean_true_node;
+ 	}
+     }
+ 
+   /* Finally try to find the expression in the main expression hash table.  */
+   slot = htab_find_slot_with_hash (avail_exprs, element, element->hash,
+ 				   (insert ? INSERT : NO_INSERT));
+   if (slot == NULL)
+     {
+       free (element);
+       return NULL_TREE;
+     }
+ 
+   if (*slot == NULL)
+     {
+       *slot = (void *) element;
+       VEC_safe_push (tree_on_heap, avail_exprs_stack,
+ 		     stmt ? stmt : element->rhs);
+       return NULL_TREE;
+     }
+ 
+   /* Extract the LHS of the assignment so that it can be used as the current
+      definition of another variable.  */
+   lhs = ((struct expr_hash_elt *)*slot)->lhs;
+ 
+   free (element);
+   return lhs;
+ }
+ 
  /* Given a condition COND, record into HI_P, LO_P and INVERTED_P the
     range of values that result in the conditional having a true value.
  


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]