[Bug tree-optimization/69760] [4.9/5/6 Regression] Wrong 64-bit memory address caused by an unneeded overflowing 32-bit integer multiplication on x86_64 under -O2 and -O3 code optimization

Tue Feb 23 10:07:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69760

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So I believe the issue creeps in at SCEV time already.

We go from the BIV i with { 0, +, 1 }_1 via { -m, +, 1 }_1 for i - m to
{ -m * L, +, L } for the index of a[].  All in type int.  The initial
values for those derived IVs can may have undefined overflow behavior
(and for this testcase they do).  The issue is we create an IV for a
conditionally computed expression.

Things go really wrong when analyzing the derived IVs for the inner
reference of a[L*k] where we get to
{ (unsigned long) &a + 8 * (unsigned long) (-m * L), +, 8 * (unsigned long) L
}_1

We could recover from this very specific case by recognizing that when
analyzing (unsigned long) ( X * Y ) with a multiplication that may not overflow
we can analyze it as (unsigned long)((long) X * (long) Y).

But that doesn't fix the fact that the IVs in int type above are wrong as well
- we're just waiting for new interesting testcases to notice.

We'd need to re-write the IVs into unsigned arithmetic which then would make
us no longer be able to compute a polynomial chrec for &a[L*k] as the
multiplication then may wrap.  This is what you get from using -fwrapv or
unsigned i/k.  We only get as far as {-_6 * (unsigned int) L_12(D), +,
(unsigned int) L_12(D)}_1)

But doing this unconditionally really is going to pessimize SCEV and depending
optimizations (not checked).  Doing this conditionally on whether the IV
is computed at a place that is not known to be always executed in the loop
being analyzed would likely introduce other issues as well given we compare
IVs for structual equality when doing dependence analysis, so conditional
a[i*2] vs. unconditional a[i*2] would no longer be handled.

So the following is the minimal "fix" noted above

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c (revision 233620)
+++ gcc/tree-scalar-evolution.c (working copy)
@@ -1883,6 +1883,27 @@ interpret_rhs_expr (struct loop *loop, g
                                       gimple_assign_rhs_code (def),
                                       gimple_assign_rhs2 (def));
        }
+      /* In case we have a widening of an operation with undefined
+         overflow behavior analyze the operation done in the wide
+        type with undefined overflow behavior.  This avoids undefined
+        behavior if the undefined operation would overflow when
+        being computed not at at_stmt which might be executed only
+        conditionally.  See PR69760.  */
+      else if (TREE_CODE (TREE_TYPE (rhs1)) == INTEGER_TYPE
+              && TREE_CODE (TREE_TYPE (rhs1)) == INTEGER_TYPE
+              && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (rhs1))
+              && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (rhs1))
+              && TREE_CODE (rhs1) == SSA_NAME
+              && (def = SSA_NAME_DEF_STMT (rhs1))
+              && is_gimple_assign (def)
+              && TREE_CODE_CLASS (gimple_assign_rhs_code (def)) == tcc_binary)
+       {
+         tree stype = signed_type_for (type);
+         chrec1 = interpret_rhs_expr (loop, at_stmt, stype,
+                                      gimple_assign_rhs1 (def),
+                                      gimple_assign_rhs_code (def),
+                                      gimple_assign_rhs2 (def));
+       }
       else
        chrec1 = analyze_scalar_evolution (loop, rhs1);
       res = chrec_convert (type, chrec1, at_stmt);

I've done it unconditionally and not based on at_stmt place in the loop we
analyze to avoid the issue mentioned above.

Note this really papers over the fundamental issue with CHRECs computed for
conditionally executed expressions.

About a testcase for the testsuite I'm not sure if allocating 300MB is ok
even if only very few pages are going to be touched.  I suppose restricting
the testcase to *-linux targets might work?  Or run_expensive_tests ...