Hi,
this patch speeds up compilation of compile/20001226-1.c from:
tree VRP : 35.24 (77%) usr 0.17 (21%) sys 35.50 (75%) wall 14779 kB (17%) ggc
TOTAL : 45.85 0.81 47.07 85808 kB
to:
tree VRP : 20.57 (66%) usr 0.08 ( 8%) sys 20.65 (64%) wall 14779 kB (17%) ggc
TOTAL : 31.17 0.98 32.23 85808 kB
Zdenek's patch testing script measured on GCC modules:
Compilation speed before:
real 4m45.182s
user 4m40.327s
sys 0m2.929s
real 4m45.776s
user 4m42.911s
sys 0m2.765s
Compilation speed after:
real 4m38.457s
user 4m35.700s
sys 0m2.688s
real 4m38.393s
user 4m35.701s
sys 0m2.638s
The tester however has pretty high noise, but I think this is off the
noise.
The actual problem is that our propagation engine is causing O(n^3) updates
(since incredibly large PHI node is walked many times, each time any of it's
argument change) and VRP's interval merging code is incredibly slow.
This is because it is written in the form of fold_binary_to_constant that is
very hard working function and in general way it first converts each query
to LT_EXPR and then execute it.
I've sped it up by ding this conversion by hand and inlning the most common case
of integer constants.
I also sent patches to deal with the PHI node walking problem, but for stage1
we probably should think about fixing this correctly and making propagation
engine to work over PHI operands, instead of whole PHIs and update our interface
accordingly.
Also value_ranges_intersect_p is a hot function and written in incredibly
ineffecient way. It basically needs to sort the 4 numbers, where part of
information is known from the fact if the intervals are ANTI_* or not.
THis is done by 8 comparsions, while 3 should be sufficient at average I believe,
but I didn't get correctlu through all the details handling ANTI intervals
correctly.
Bootstrapped/regtested i686-linux, OK?
:ADDPATCH middle-end:
Honza
* tree-vrp.c (fast_lt_compare): New.
(compare_values, value_inside_range): Rewrite in terms of
fast_lt_compare.
Index: tree-vrp.c
===================================================================
*** tree-vrp.c (revision 119454)
--- tree-vrp.c (working copy)
*************** valid_value_p (tree expr)
*** 442,447 ****
--- 442,470 ----
return is_gimple_min_invariant (expr);
}
+ /* Return
+ 1 if VAL < VAL2
+ 0 if !(VAL < VAL2)
+ -2 if those are incomparable. */
+ static inline int
+ fast_lt_compare (tree val, tree val2)
+ {
+ tree tcmp;
+ /* LT is folded faster than GE and others. Inline the common case. */
+ if (TREE_CODE (val) == INTEGER_CST && TREE_CODE (val2) == INTEGER_CST)
+ {
+ if (TYPE_UNSIGNED (TREE_TYPE (val)))
+ return INT_CST_LT_UNSIGNED (val, val2);
+ else
+ return INT_CST_LT (val, val2);
+ }
+ else
+ tcmp = fold_binary_to_constant (LT_EXPR, boolean_type_node, val, val2);
+ if (!tcmp)
+ return -1;
+ return tcmp == boolean_true_node;
+ }
+
/* Compare two values VAL1 and VAL2. Return
-2 if VAL1 and VAL2 cannot be compared at compile-time,
*************** compare_values (tree val1, tree val2)
*** 590,602 ****
return 0;
/* If VAL1 is a lower address than VAL2, return -1. */
! t = fold_binary (LT_EXPR, boolean_type_node, val1, val2);
! if (t == boolean_true_node)
return -1;
/* If VAL1 is a higher address than VAL2, return +1. */
! t = fold_binary (GT_EXPR, boolean_type_node, val1, val2);
! if (t == boolean_true_node)
return 1;
/* If VAL1 is different than VAL2, return +2. */
--- 613,623 ----
return 0;
/* If VAL1 is a lower address than VAL2, return -1. */
! if (fast_lt_compare (val1, val2) == 1)
return -1;
/* If VAL1 is a higher address than VAL2, return +1. */
! if (fast_lt_compare (val2, val1) == 1)
return 1;
/* If VAL1 is different than VAL2, return +2. */
*************** compare_values (tree val1, tree val2)
*** 608,614 ****
}
}
-
/* Return 1 if VAL is inside value range VR (VR->MIN <= VAL <= VR->MAX),
0 if VAL is not inside VR,
-2 if we cannot tell either way.
--- 629,634 ----
*************** compare_values (tree val1, tree val2)
*** 627,653 ****
This also applies to value_ranges_intersect_p and
range_includes_zero_p. The semantics of VR_RANGE and
VR_ANTI_RANGE should be encoded here, but that also means
! adapting the users of these functions to the new semantics. */
static inline int
! value_inside_range (tree val, value_range_t *vr)
{
! tree cmp1, cmp2;
! cmp1 = fold_binary_to_constant (GE_EXPR, boolean_type_node, val, vr->min);
! if (!cmp1)
return -2;
! cmp2 = fold_binary_to_constant (LE_EXPR, boolean_type_node, val, vr->max);
! if (!cmp2)
return -2;
! return cmp1 == boolean_true_node && cmp2 == boolean_true_node;
}
/* Return true if value ranges VR0 and VR1 have a non-empty
! intersection. */
static inline bool
value_ranges_intersect_p (value_range_t *vr0, value_range_t *vr1)
--- 647,682 ----
This also applies to value_ranges_intersect_p and
range_includes_zero_p. The semantics of VR_RANGE and
VR_ANTI_RANGE should be encoded here, but that also means
! adapting the users of these functions to the new semantics.
!
! Benchmark compile/20001226-1.c compilation time after changing this
! function. */
static inline int
! value_inside_range (tree val, value_range_t * vr)
{
! int cmp1, cmp2;
! cmp1 = fast_lt_compare (val, vr->min);
! if (cmp1 == -2)
return -2;
+ if (cmp1)
+ return 0;