This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: fold x ^ y to 0 if x == y


On Wed, 20 Jul 2016, Prathamesh Kulkarni wrote:

> On 20 July 2016 at 16:35, Richard Biener <rguenther@suse.de> wrote:
> > On Wed, 20 Jul 2016, Prathamesh Kulkarni wrote:
> >
> >> On 8 July 2016 at 12:29, Richard Biener <rguenther@suse.de> wrote:
> >> > On Fri, 8 Jul 2016, Richard Biener wrote:
> >> >
> >> >> On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:
> >> >>
> >> >> > Hi Richard,
> >> >> > For the following test-case:
> >> >> >
> >> >> > int f(int x, int y)
> >> >> > {
> >> >> >    int ret;
> >> >> >
> >> >> >    if (x == y)
> >> >> >      ret = x ^ y;
> >> >> >    else
> >> >> >      ret = 1;
> >> >> >
> >> >> >    return ret;
> >> >> > }
> >> >> >
> >> >> > I was wondering if x ^ y should be folded to 0 since
> >> >> > it's guarded by condition x == y ?
> >> >> >
> >> >> > optimized dump shows:
> >> >> > f (int x, int y)
> >> >> > {
> >> >> >   int iftmp.0_1;
> >> >> >   int iftmp.0_4;
> >> >> >
> >> >> >   <bb 2>:
> >> >> >   if (x_2(D) == y_3(D))
> >> >> >     goto <bb 3>;
> >> >> >   else
> >> >> >     goto <bb 4>;
> >> >> >
> >> >> >   <bb 3>:
> >> >> >   iftmp.0_4 = x_2(D) ^ y_3(D);
> >> >> >
> >> >> >   <bb 4>:
> >> >> >   # iftmp.0_1 = PHI <iftmp.0_4(3), 1(2)>
> >> >> >   return iftmp.0_1;
> >> >> >
> >> >> > }
> >> >> >
> >> >> > The attached patch tries to fold for above case.
> >> >> > I am checking if op0 and op1 are equal using:
> >> >> > if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
> >> >> >    && operand_equal_p (vr1->min, vr1->max)
> >> >> >    && operand_equal_p (vr2->min, vr2->max))
> >> >> >   { /* equal /* }
> >> >> >
> >> >> > I suppose intersection would check if op0 and op1 have equivalent ranges,
> >> >> > and added operand_equal_p check to ensure that there is only one
> >> >> > element within the range. Does that look correct ?
> >> >> > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> >> >>
> >> >> I think VRP is the wrong place to catch this and DOM should have but it
> >> >> does
> >> >>
> >> >> Optimizing block #3
> >> >>
> >> >> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> >> >> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> >> >> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> >> >> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> >> >> 0>>> COPY x_2(D) = y_3(D)
> >> >> 0>>> COPY y_3(D) = x_2(D)
> >> >> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
> >> >>   Replaced 'x_2(D)' with variable 'y_3(D)'
> >> >>   Replaced 'y_3(D)' with variable 'x_2(D)'
> >> >>   Folded to: ret_4 = x_2(D) ^ y_3(D);
> >> >> LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)
> >> >>
> >> >> heh, registering both reqivalencies is obviously not going to help...
> >> >>
> >> >> The 2nd equivalence is from doing
> >> >>
> >> >>       /* We already recorded that LHS = RHS, with canonicalization,
> >> >>          value chain following, etc.
> >> >>
> >> >>          We also want to record RHS = LHS, but without any
> >> >> canonicalization
> >> >>          or value chain following.  */
> >> >>       if (TREE_CODE (rhs) == SSA_NAME)
> >> >>         const_and_copies->record_const_or_copy_raw (rhs, lhs,
> >> >>                                                     SSA_NAME_VALUE (rhs));
> >> >>
> >> >> generally recording both is not helpful.  Jeff?  This seems to be
> >> >> r233207 (fix for PR65917) which must have regressed this testcase.
> >> >
> >> > Just verified it works fine on the GCC 5 branch:
> >> >
> >> > Optimizing block #3
> >> >
> >> > 0>>> COPY y_3(D) = x_2(D)
> >> > 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> >> > 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> >> > 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> >> > 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> >> > Optimizing statement ret_4 = x_2(D) ^ y_3(D);
> >> >   Replaced 'y_3(D)' with variable 'x_2(D)'
> >> > Applying pattern match.pd:240, gimple-match.c:11346
> >> > gimple_simplified to ret_4 = 0;
> >> >   Folded to: ret_4 = 0;
> >> I have reported it as PR71947.
> >> Could you help me point out how to fix this ?
> >
> > Not record both equivalences.  This might break the testcase it was
> > introduced for (obviously).  Which is why I CCed Jeff for his opinion.
> Well, folding happens for x - y, if x == y.
> 
> int f(int x, int y)
> {
>   int ret;
>   if (x == y)
>     ret = x - y;
>   else
>     ret = 1;
> 
>   return ret;
> }
> 
> For the above test-case, extract_range_from_binary_expr_1()
> determines that range of ret = [0, 0]
> and propagates it.
> 
> vrp1 dump:
> f (int x, int y)
> {
>   int ret;
> 
>   <bb 2>:
>   if (x_2(D) == y_3(D))
>     goto <bb 3>;
>   else
>     goto <bb 4>;
> 
>   <bb 3>:
>   ret_4 = x_2(D) - y_3(D);
> 
>   <bb 4>:
>   # ret_1 = PHI <0(3), 10(2)>
>   return ret_1;
> 
> }
> 
> Then the dce pass removes ret_4 = x_2(D) - y_3(D) since it's
> redundant.
> However it appears vrp fails to notice the equality for the following test-case,
> and sets range for ret to VARYING.
> 
> int f(int x, int y, int a, int b)
> {
>   int ret = 10;
>   if (a == x
>       && b == y
>       && a == b)
>     ret = x - y;
> 
>   return ret;
> }
> 
> Looking at the vrp dump, shows the following form
> after inserting ASSERT_EXPR:
> 
> SSA form after inserting ASSERT_EXPRs
> f (int x, int y, int a, int b)
> {
>   int ret;
>   _Bool _1;
>   _Bool _2;
>   _Bool _3;
> 
>   <bb 2>:
>   _1 = a_5(D) == x_6(D);
>   _2 = b_7(D) == y_8(D);
>   _3 = _1 & _2;
>   if (_3 != 0)
>     goto <bb 3>;
>   else
>     goto <bb 5>;
> 
>   <bb 3>:
>   a_11 = ASSERT_EXPR <a_5(D), a_5(D) == x_6(D)>;
>   x_12 = ASSERT_EXPR <x_6(D), x_6(D) == a_11>;
>   b_13 = ASSERT_EXPR <b_7(D), b_7(D) == y_8(D)>;
>   y_14 = ASSERT_EXPR <y_8(D), y_8(D) == b_13>;
>   if (a_11 == b_13)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> 
>   <bb 4>:
>   ret_9 = x_12 ^ y_14;
> 
>   <bb 5>:
>   # ret_4 = PHI <10(2), 10(3), ret_9(4)>
>   return ret_4;
> 
> }
> 
> Shouldn't there also be a range assertion for a_11 == b_13 ?

Representing equality with asserts is difficult and what VRP currently
does is really a hack (it needs to record both directions to make
the SSA rewriter re-write both names).

a_11 == b_13 is missing because there is no use of either name
in the if/else arm.  Yes, with that equality known we could
improve the ranges for x and y but this is not how VRP works.

DOM-based VRP with aggressive back-propagation would handle this.

> Should we modify extract_range_from_binary_expr_1 to handle
> this case for bit_xor_expr so similar to minus_expr case,
> the range [0, 0] would get propagated and make ret = x ^ y redundant
> which will then be removed by dce pass ?

Sure enhancing VRP is good, fixing the DOM regression is requally
important though.

+         /* Set range 0 for r, where r = op0 ^ op1 and op0 equals op1.  
*/
+         /* FIXME: I am using bitmap_intersect_p() to determine if vr0
+            and vr1 are equivalent and operand_equal_p() to ensure
+            that range has only one symbol. Is this correct ?.  */
+         if (TREE_CODE (vr0.min) == SSA_NAME
+             && TREE_CODE (vr0.max) == SSA_NAME
+             && TREE_CODE (vr1.min) == SSA_NAME
+             && TREE_CODE (vr1.max) == SSA_NAME
+             && vr0.equiv && vr1.equiv
+             && bitmap_intersect_p (vr0.equiv, vr1.equiv)
+             && operand_equal_p (vr0.min, vr0.max, 0))
+           max = min = build_int_cst (expr_type, 0);

Similar to DOM the question is why this is not caught during
vrp_visit_assignment_or_call call to gimple_fold_stmt_to_constant_1.

And I think the reason is similar to that of DOM - we valueize
x_6 to y_3(D) and y_7 to x_6.

This also hints at ASSERT_EXPR range extraction to be improved:

Visiting statement:
x_6 = ASSERT_EXPR <x_2(D), x_2(D) == y_3(D)>;
Intersecting
  [y_3(D), y_3(D)]  EQUIVALENCES: { x_2(D) y_3(D) } (2 elements)
and
  VARYING
to
  [y_3(D), y_3(D)]  EQUIVALENCES: { x_2(D) y_3(D) } (2 elements)
Found new range for x_6: [y_3(D), y_3(D)]

good.

Visiting statement:
y_7 = ASSERT_EXPR <y_3(D), y_3(D) == x_6>;
Intersecting
  [x_6, x_6]  EQUIVALENCES: { x_2(D) y_3(D) x_6 } (3 elements)
and
  VARYING
to
  [x_6, x_6]  EQUIVALENCES: { x_2(D) y_3(D) x_6 } (3 elements)
Found new range for y_7: [x_6, x_6]

that's less than ideal - we choose x_6 as the new range without
looking at its value-range it seems.

Thus, the following untested patch fixes the missing optimization in
VRP as well:

Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c      (revision 238469)
+++ gcc/tree-vrp.c      (working copy)
@@ -1513,10 +1514,13 @@ extract_range_from_assert (value_range *
   limit_vr = (TREE_CODE (limit) == SSA_NAME) ? get_value_range (limit) : 
NULL;
 
   /* LIMIT's range is only interesting if it has any useful information.  
*/
-  if (limit_vr
-      && (limit_vr->type == VR_UNDEFINED
-         || limit_vr->type == VR_VARYING
-         || symbolic_range_p (limit_vr)))
+  if (! limit_vr
+      || limit_vr->type == VR_UNDEFINED
+      || limit_vr->type == VR_VARYING
+      || (symbolic_range_p (limit_vr)
+         && ! (limit_vr->type == VR_RANGE
+               && (limit_vr->min == limit_vr->max
+                   || operand_equal_p (limit_vr->min, limit_vr->max, 
0)))))
     limit_vr = NULL;
 
   /* Initially, the new range has the same set of equivalences of

I'll give it some testing today.

Richard.

> I have attached untested patch for that.
> Does it look OK ?
> (It also doesn't handle  a == x && b == y && a == b case).
> 
> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]