This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Fix PR81082


The mitigation for PR66313 (bogus re-association causing undefined
overflow) causes missed optimizations because the fix was to perform
possibly dangerous transforms in unsigned arithmetic.  Unfortunately
this loses knowledge of no-overflow which means SCEV analysis when
facing the unsigned computations has to conclude IVs may wrap and thus
are not affine when used in address context.

The following fix for this issue tries to delay dangerous transforms
instead to a point where range info may allow them without resorting
to unsigned arithmetic.  So compared to the original fix we do less
foldings (unless proved safe late).  In the PR I also suggested to
do a lowering during late GIMPLE to RTL semantics (-fwrapv) where
all such transforms would become safe and also reassoc could handle
signed arithmetic freely.  This isn't something for GCC 8 though
(and in its own naturally didn't help the PR), also because it requires
some pass shuffling in the late pipeline.  I'll revisit this for GCC 9.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-01-25  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/81082
	* fold-const.c (fold_plusminus_mult_expr): Do not perform the
	association if it requires casting to unsigned.
	* match.pd ((A * C) +- (B * C) -> (A+-B)): New patterns derived
	from fold_plusminus_mult_expr to catch important cases late when
	range info is available.

	* gcc.dg/vect/pr81082.c: New testcase.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 257047)
+++ gcc/fold-const.c	(working copy)
@@ -7097,7 +7097,7 @@ fold_plusminus_mult_expr (location_t loc
 
   /* Same may be zero and thus the operation 'code' may overflow.  Likewise
      same may be minus one and thus the multiplication may overflow.  Perform
-     the operations in an unsigned type.  */
+     the sum operation in an unsigned type.  */
   tree utype = unsigned_type_for (type);
   tree tem = fold_build2_loc (loc, code, utype,
 			      fold_convert_loc (loc, utype, alt0),
@@ -7110,9 +7110,9 @@ fold_plusminus_mult_expr (location_t loc
     return fold_build2_loc (loc, MULT_EXPR, type,
 			    fold_convert (type, tem), same);
 
-  return fold_convert_loc (loc, type,
-			   fold_build2_loc (loc, MULT_EXPR, utype, tem,
-					    fold_convert_loc (loc, utype, same)));
+  /* Do not resort to unsigned multiplication because
+     we lose the no-overflow property of the expression.  */
+  return NULL_TREE;
 }
 
 /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 257047)
+++ gcc/match.pd	(working copy)
@@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
      (minus (convert (view_convert:stype @1))
 	    (convert (view_convert:stype @2)))))))
 
+/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
+    Modeled after fold_plusminus_mult_expr.  */
+(if (!TYPE_SATURATING (type)
+     && (!FLOAT_TYPE_P (type) || flag_associative_math))
+ (for plusminus (plus minus)
+  (simplify
+   (plusminus (mult:s @0 @1) (mult:cs @0 @2))
+   (if (!ANY_INTEGRAL_TYPE_P (type)
+        || TYPE_OVERFLOW_WRAPS (type)
+        || (INTEGRAL_TYPE_P (type)
+	    && tree_expr_nonzero_p (@0)
+	    && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+    (mult (plusminus @1 @2) @0)))
+  /* We cannot generate constant 1 for fract.  */
+  (if (!ALL_FRACT_MODE_P (TYPE_MODE (type)))
+   (simplify
+    (plusminus @0 (mult:cs @0 @2))
+    (if (!ANY_INTEGRAL_TYPE_P (type)
+	 || TYPE_OVERFLOW_WRAPS (type)
+	 || (INTEGRAL_TYPE_P (type)
+	     && tree_expr_nonzero_p (@0)
+	     && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+     (mult (plusminus { build_one_cst (type); } @2) @0)))
+   (simplify
+    (plusminus (mult:cs @0 @2) @0)
+    (if (!ANY_INTEGRAL_TYPE_P (type)
+	 || TYPE_OVERFLOW_WRAPS (type)
+	 || (INTEGRAL_TYPE_P (type)
+	     && tree_expr_nonzero_p (@0)
+	     && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+     (mult (plusminus @2 { build_one_cst (type); }) @0))))))
 
 /* Simplifications of MIN_EXPR, MAX_EXPR, fmin() and fmax().  */
 
Index: gcc/testsuite/gcc.dg/vect/pr81082.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr81082.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/pr81082.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+int
+f (int *x, int b1, int b2, int b3)
+{
+  int foo = 0;
+  for (int i1 = 0; i1 < b1; ++i1)
+    for (int i2 = 0; i2 < b2; ++i2)
+      for (int i3 = 0; i3 < b3; ++i3)
+	foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
+  return foo;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]