VRP speedup

Jan Hubicka jh@suse.cz
Sun Dec 3 14:21:00 GMT 2006


Hi,
this patch speeds up compilation of compile/20001226-1.c from:
 tree VRP              :  35.24 (77%) usr   0.17 (21%) sys  35.50 (75%) wall   14779 kB (17%) ggc
 TOTAL                 :  45.85             0.81            47.07              85808 kB
to:
 tree VRP              :  20.57 (66%) usr   0.08 ( 8%) sys  20.65 (64%) wall   14779 kB (17%) ggc
 TOTAL                 :  31.17             0.98            32.23              85808 kB

Zdenek's patch testing script measured on GCC modules:
Compilation speed before:

real	4m45.182s
user	4m40.327s
sys	0m2.929s

real	4m45.776s
user	4m42.911s
sys	0m2.765s

Compilation speed after:

real	4m38.457s
user	4m35.700s
sys	0m2.688s

real	4m38.393s
user	4m35.701s
sys	0m2.638s

The tester however has pretty high noise, but I think this is off the
noise.

The actual problem is that our propagation engine is causing O(n^3) updates
(since incredibly large PHI node is walked many times, each time any of it's
argument change) and VRP's interval merging code is incredibly slow.

This is because it is written in the form of fold_binary_to_constant that is
very hard working function and in general way it first converts each query
to LT_EXPR and then execute it.

I've sped it up by ding this conversion by hand and inlning the most common case
of integer constants.

I also sent patches to deal with the PHI node walking problem, but for stage1
we probably should think about fixing this correctly and making propagation
engine to work over PHI operands, instead of whole PHIs and update our interface
accordingly.

Also value_ranges_intersect_p is a hot function and written in incredibly
ineffecient way. It basically needs to sort the 4 numbers, where part of
information is known from the fact if the intervals are ANTI_* or not.
THis is done by 8 comparsions, while 3 should be sufficient at average I believe,
but I didn't get correctlu through all the details handling ANTI intervals
correctly.

Bootstrapped/regtested i686-linux, OK?
:ADDPATCH middle-end:
Honza

	* tree-vrp.c (fast_lt_compare): New.
	(compare_values, value_inside_range): Rewrite in terms of
	fast_lt_compare.

Index: tree-vrp.c
===================================================================
*** tree-vrp.c	(revision 119454)
--- tree-vrp.c	(working copy)
*************** valid_value_p (tree expr)
*** 442,447 ****
--- 442,470 ----
    return is_gimple_min_invariant (expr);
  }
  
+ /* Return 
+    1 if VAL < VAL2
+    0 if !(VAL < VAL2)
+    -2 if those are incomparable.  */
+ static inline int
+ fast_lt_compare (tree val, tree val2)
+ {
+   tree tcmp;
+   /* LT is folded faster than GE and others.  Inline the common case.  */
+   if (TREE_CODE (val) == INTEGER_CST && TREE_CODE (val2) == INTEGER_CST)
+     {
+       if (TYPE_UNSIGNED (TREE_TYPE (val)))
+ 	return INT_CST_LT_UNSIGNED (val, val2);
+       else
+ 	return INT_CST_LT (val, val2);
+     }
+   else
+     tcmp = fold_binary_to_constant (LT_EXPR, boolean_type_node, val, val2);
+   if (!tcmp)
+     return -1;
+   return tcmp == boolean_true_node;
+ }
+ 
  /* Compare two values VAL1 and VAL2.  Return
     
     	-2 if VAL1 and VAL2 cannot be compared at compile-time,
*************** compare_values (tree val1, tree val2)
*** 590,602 ****
  	return 0;
        
        /* If VAL1 is a lower address than VAL2, return -1.  */
!       t = fold_binary (LT_EXPR, boolean_type_node, val1, val2);
!       if (t == boolean_true_node)
  	return -1;
  
        /* If VAL1 is a higher address than VAL2, return +1.  */
!       t = fold_binary (GT_EXPR, boolean_type_node, val1, val2);
!       if (t == boolean_true_node)
  	return 1;
  
        /* If VAL1 is different than VAL2, return +2.  */
--- 613,623 ----
  	return 0;
        
        /* If VAL1 is a lower address than VAL2, return -1.  */
!       if (fast_lt_compare (val1, val2) == 1)
  	return -1;
  
        /* If VAL1 is a higher address than VAL2, return +1.  */
!       if (fast_lt_compare (val2, val1) == 1)
  	return 1;
  
        /* If VAL1 is different than VAL2, return +2.  */
*************** compare_values (tree val1, tree val2)
*** 608,614 ****
      }
  }
  
- 
  /* Return 1 if VAL is inside value range VR (VR->MIN <= VAL <= VR->MAX),
            0 if VAL is not inside VR,
  	 -2 if we cannot tell either way.
--- 629,634 ----
*************** compare_values (tree val1, tree val2)
*** 627,653 ****
  	  This also applies to value_ranges_intersect_p and
  	  range_includes_zero_p.  The semantics of VR_RANGE and
  	  VR_ANTI_RANGE should be encoded here, but that also means
! 	  adapting the users of these functions to the new semantics.  */
  
  static inline int
! value_inside_range (tree val, value_range_t *vr)
  {
!   tree cmp1, cmp2;
  
!   cmp1 = fold_binary_to_constant (GE_EXPR, boolean_type_node, val, vr->min);
!   if (!cmp1)
      return -2;
  
!   cmp2 = fold_binary_to_constant (LE_EXPR, boolean_type_node, val, vr->max);
!   if (!cmp2)
      return -2;
  
!   return cmp1 == boolean_true_node && cmp2 == boolean_true_node;
  }
  
  
  /* Return true if value ranges VR0 and VR1 have a non-empty
!    intersection.  */
  
  static inline bool
  value_ranges_intersect_p (value_range_t *vr0, value_range_t *vr1)
--- 647,682 ----
  	  This also applies to value_ranges_intersect_p and
  	  range_includes_zero_p.  The semantics of VR_RANGE and
  	  VR_ANTI_RANGE should be encoded here, but that also means
! 	  adapting the users of these functions to the new semantics.  
! 
!    Benchmark compile/20001226-1.c compilation time after changing this
!    function.  */
  
  static inline int
! value_inside_range (tree val, value_range_t * vr)
  {
!   int cmp1, cmp2;
  
!   cmp1 = fast_lt_compare (val, vr->min);
!   if (cmp1 == -2)
      return -2;
+   if (cmp1)
+     return 0;
  
!   cmp2 = fast_lt_compare (vr->max, val);
!   if (cmp2 == -2)
      return -2;
  
!   return !cmp2;
  }
  
  
  /* Return true if value ranges VR0 and VR1 have a non-empty
!    intersection.  
!    
!    Benchmark compile/20001226-1.c compilation time after changing this
!    function.
!    */
  
  static inline bool
  value_ranges_intersect_p (value_range_t *vr0, value_range_t *vr1)



More information about the Gcc-patches mailing list