This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/81082] [8 Regression] Failure to vectorise after reassociating index computation


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81082

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So moving the transform to match.pd will only have an effect in late VRP
given we need loop header copying to derive a range for the bNs.

The following is what I've done: remove the problematic foldings from
fold-const.c and re-instantiate the full transform (but not the factoring
of powers of two) in match.pd.  It vectorizes this testcase again and
restores Himeno performance (PR81554).  The dump differences are too big to
see what helps, but I can say the match.pd patterns mostly apply via GENERIC
(SCEV/IVOPTs), and there are two late applications in VRP2 and forwprop4 on
GIMPLE.  Note that all but one match the a*b + a cases; the single
a*b + a*c case is from forwprop4.  I'm quite sure the patterns are confused
by association, say, a*(b*c) + a*b, where for signed arithmetic reassoc
doesn't "fix" this up.  But the fold-const.c implementation has exactly the
same issue.
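
For readers following along, here is a small standalone C sketch (not part
of the patch) of why factoring a*b + a*c into (b + c)*a needs a guard for
signed types when a may be 0 or -1, and why the removed fold-const.c code
did the arithmetic in an unsigned type.  The helper names are hypothetical,
and the sketch assumes long long is wider than int so that exact values can
be checked without invoking undefined behaviour.

```c
#include <limits.h>
#include <stdbool.h>

/* True if the exact value V is representable as int.  */
static bool
fits_int (long long v)
{
  return v >= INT_MIN && v <= INT_MAX;
}

/* True if every step of a*b + a*c stays within int range.  The sum is
   only evaluated when both products fit, so it cannot overflow
   long long.  */
static bool
original_ok (int a, int b, int c)
{
  long long p0 = (long long) a * b;
  long long p1 = (long long) a * c;
  return fits_int (p0) && fits_int (p1) && fits_int (p0 + p1);
}

/* True if every step of (b + c) * a stays within int range.  The
   product is only evaluated when the sum fits.  */
static bool
factored_ok (int a, int b, int c)
{
  long long s = (long long) b + c;
  return fits_int (s) && fits_int (s * a);
}

/* The factored form evaluated in unsigned arithmetic, which wraps
   instead of overflowing.  Returns the bit pattern of the result.  */
static unsigned
factored_unsigned (int a, int b, int c)
{
  return ((unsigned) b + (unsigned) c) * (unsigned) a;
}
```

With a = 0 or a = -1 and b = INT_MAX, c = 1, original_ok holds but
factored_ok does not, because b + c overflows; for a = 2, b = 3, c = 4 both
hold.  factored_unsigned yields the same bit pattern as the original
expression in all three cases.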

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c    (revision 256977)
+++ gcc/fold-const.c    (working copy)
@@ -7095,24 +7095,7 @@ fold_plusminus_mult_expr (location_t loc
                                     fold_convert_loc (loc, type, alt1)),
                        fold_convert_loc (loc, type, same));

-  /* Same may be zero and thus the operation 'code' may overflow.  Likewise
-     same may be minus one and thus the multiplication may overflow.  Perform
-     the operations in an unsigned type.  */
-  tree utype = unsigned_type_for (type);
-  tree tem = fold_build2_loc (loc, code, utype,
-                             fold_convert_loc (loc, utype, alt0),
-                             fold_convert_loc (loc, utype, alt1));
-  /* If the sum evaluated to a constant that is not -INF the multiplication
-     cannot overflow.  */
-  if (TREE_CODE (tem) == INTEGER_CST
-      && (wi::to_wide (tem)
-         != wi::min_value (TYPE_PRECISION (utype), SIGNED)))
-    return fold_build2_loc (loc, MULT_EXPR, type,
-                           fold_convert (type, tem), same);
-
-  return fold_convert_loc (loc, type,
-                          fold_build2_loc (loc, MULT_EXPR, utype, tem,
-                                           fold_convert_loc (loc, utype, same)));
+  return NULL_TREE;
 }

 /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
Index: gcc/match.pd
===================================================================
--- gcc/match.pd        (revision 256977)
+++ gcc/match.pd        (working copy)
@@ -4617,3 +4617,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        || wi::geu_p (wi::to_wide (@rpos),
                      wi::to_wide (@ipos) + isize))
     (BIT_FIELD_REF @0 @rsize @rpos)))))
+
+(for plusminus (plus minus)
+ (simplify
+  (plusminus (mult @0 @1) (mult:c @0 @2))
+  (if (! INTEGRAL_TYPE_P (type)
+       || TYPE_OVERFLOW_WRAPS (type)
+       || (tree_expr_nonzero_p (@0)
+          && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+   (mult (plusminus @1 @2) @0)))
+ (simplify
+  (plusminus @0 (mult:c @0 @2))
+  /* We cannot generate constant 1 for fract.  */
+  (if (! ALL_FRACT_MODE_P (TYPE_MODE (type))
+       && (! INTEGRAL_TYPE_P (type)
+          || TYPE_OVERFLOW_WRAPS (type)
+          || (tree_expr_nonzero_p (@0)
+              && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type))))))
+   (mult (plusminus { build_one_cst (type); } @2) @0)))
+ (simplify
+  (plusminus (mult:c @0 @2) @0)
+  (if (! ALL_FRACT_MODE_P (TYPE_MODE (type))
+       && (! INTEGRAL_TYPE_P (type)
+          || TYPE_OVERFLOW_WRAPS (type)
+          || (tree_expr_nonzero_p (@0)
+              && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type))))))
+   (mult (plusminus @2 { build_one_cst (type); }) @0))))


Now testing the patch and looking for testsuite fallout.
