This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: VRP speedup


On 12/3/06, Jan Hubicka <jh@suse.cz> wrote:
Hi,
this patch speeds up compilation of compile/20001226-1.c from:
 tree VRP              :  35.24 (77%) usr   0.17 (21%) sys  35.50 (75%) wall   14779 kB (17%) ggc
 TOTAL                 :  45.85             0.81            47.07              85808 kB
to:
 tree VRP              :  20.57 (66%) usr   0.08 ( 8%) sys  20.65 (64%) wall   14779 kB (17%) ggc
 TOTAL                 :  31.17             0.98            32.23              85808 kB

Zdenek's patch testing script measured on GCC modules:
Compilation speed before:

real    4m45.182s
user    4m40.327s
sys     0m2.929s

real    4m45.776s
user    4m42.911s
sys     0m2.765s

Compilation speed after:

real    4m38.457s
user    4m35.700s
sys     0m2.688s

real    4m38.393s
user    4m35.701s
sys     0m2.638s

The tester however has pretty high noise, but I think this is off the
noise.

The actual problem is that our propagation engine is causing O(n^3) updates
(since incredibly large PHI node is walked many times, each time any of it's
argument change) and VRP's interval merging code is incredibly slow.

This is because it is written in the form of fold_binary_to_constant that is
very hard working function and in general way it first converts each query
to LT_EXPR and then execute it.

I've sped it up by ding this conversion by hand and inlning the most common case
of integer constants.

I also sent patches to deal with the PHI node walking problem, but for stage1
we probably should think about fixing this correctly and making propagation
engine to work over PHI operands, instead of whole PHIs and update our interface
accordingly.

Also value_ranges_intersect_p is a hot function and written in incredibly
ineffecient way. It basically needs to sort the 4 numbers, where part of
information is known from the fact if the intervals are ANTI_* or not.
THis is done by 8 comparsions, while 3 should be sufficient at average I believe,
but I didn't get correctlu through all the details handling ANTI intervals
correctly.

Bootstrapped/regtested i686-linux, OK?
:ADDPATCH middle-end:
Honza

        * tree-vrp.c (fast_lt_compare): New.
        (compare_values, value_inside_range): Rewrite in terms of
        fast_lt_compare.

Index: tree-vrp.c
===================================================================
*** tree-vrp.c  (revision 119454)
--- tree-vrp.c  (working copy)
*************** valid_value_p (tree expr)
*** 442,447 ****
--- 442,470 ----
    return is_gimple_min_invariant (expr);
  }

+ /* Return
+    1 if VAL < VAL2
+    0 if !(VAL < VAL2)
+    -2 if those are incomparable.  */
+ static inline int
+ fast_lt_compare (tree val, tree val2)
+ {
+   tree tcmp;
+   /* LT is folded faster than GE and others.  Inline the common case.  */
+   if (TREE_CODE (val) == INTEGER_CST && TREE_CODE (val2) == INTEGER_CST)
+     {
+       if (TYPE_UNSIGNED (TREE_TYPE (val)))
+       return INT_CST_LT_UNSIGNED (val, val2);
+       else
+       return INT_CST_LT (val, val2);
+     }
+   else
+     tcmp = fold_binary_to_constant (LT_EXPR, boolean_type_node, val, val2);
+   if (!tcmp)
+     return -1;

This should be return -2 (at least you check for that below and the comment says so). The patch also looks more like a hack to me ;) But I guess until someone fixes fold-const.c to provide a special fold-to-constant for comparisons it's still ok.

+   return tcmp == boolean_true_node;
+ }
+
  /* Compare two values VAL1 and VAL2.  Return

        -2 if VAL1 and VAL2 cannot be compared at compile-time,
*************** compare_values (tree val1, tree val2)
*** 590,602 ****
        return 0;

        /* If VAL1 is a lower address than VAL2, return -1.  */
!       t = fold_binary (LT_EXPR, boolean_type_node, val1, val2);
!       if (t == boolean_true_node)
        return -1;

        /* If VAL1 is a higher address than VAL2, return +1.  */
!       t = fold_binary (GT_EXPR, boolean_type_node, val1, val2);
!       if (t == boolean_true_node)
        return 1;

        /* If VAL1 is different than VAL2, return +2.  */
--- 613,623 ----
        return 0;

        /* If VAL1 is a lower address than VAL2, return -1.  */
!       if (fast_lt_compare (val1, val2) == 1)
        return -1;

        /* If VAL1 is a higher address than VAL2, return +1.  */
!       if (fast_lt_compare (val2, val1) == 1)
        return 1;

        /* If VAL1 is different than VAL2, return +2.  */
*************** compare_values (tree val1, tree val2)
*** 608,614 ****
      }
  }

-
  /* Return 1 if VAL is inside value range VR (VR->MIN <= VAL <= VR->MAX),
            0 if VAL is not inside VR,
         -2 if we cannot tell either way.
--- 629,634 ----
*************** compare_values (tree val1, tree val2)
*** 627,653 ****
          This also applies to value_ranges_intersect_p and
          range_includes_zero_p.  The semantics of VR_RANGE and
          VR_ANTI_RANGE should be encoded here, but that also means
!         adapting the users of these functions to the new semantics.  */

  static inline int
! value_inside_range (tree val, value_range_t *vr)
  {
!   tree cmp1, cmp2;

!   cmp1 = fold_binary_to_constant (GE_EXPR, boolean_type_node, val, vr->min);
!   if (!cmp1)
      return -2;

!   cmp2 = fold_binary_to_constant (LE_EXPR, boolean_type_node, val, vr->max);
!   if (!cmp2)
      return -2;

!   return cmp1 == boolean_true_node && cmp2 == boolean_true_node;
  }


/* Return true if value ranges VR0 and VR1 have a non-empty ! intersection. */

  static inline bool
  value_ranges_intersect_p (value_range_t *vr0, value_range_t *vr1)
--- 647,682 ----
          This also applies to value_ranges_intersect_p and
          range_includes_zero_p.  The semantics of VR_RANGE and
          VR_ANTI_RANGE should be encoded here, but that also means
!         adapting the users of these functions to the new semantics.
!
!    Benchmark compile/20001226-1.c compilation time after changing this
!    function.  */

  static inline int
! value_inside_range (tree val, value_range_t * vr)
  {
!   int cmp1, cmp2;

!   cmp1 = fast_lt_compare (val, vr->min);
!   if (cmp1 == -2)
      return -2;
+   if (cmp1)
+     return 0;

Please use cmp1 == 1 here.


!   cmp2 = fast_lt_compare (vr->max, val);
!   if (cmp2 == -2)
      return -2;

! return !cmp2;

Likewise. A comment on how this models inside-range would also be nice.


}


/* Return true if value ranges VR0 and VR1 have a non-empty ! intersection. ! ! Benchmark compile/20001226-1.c compilation time after changing this ! function. ! */

Btw - are these comment changes a reminder for youself? I don't think they belong here (and above).

Thanks,
Richard.


  static inline bool
  value_ranges_intersect_p (value_range_t *vr0, value_range_t *vr1)



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]