[Bug tree-optimization/69760] [4.9/5/6 Regression] Wrong 64-bit memory address caused by an unneeded overflowing 32-bit integer multiplication on x86_64 under -O2 and -O3 code optimization

Wed Feb 24 11:14:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69760

--- Comment #9 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #8)
> Patch in comment #6 bootstrapped and tested ok.
> 
> The following patch also fixes the issue:
> 
> Index: gcc/tree-chrec.c
> ===================================================================
> --- gcc/tree-chrec.c    (revision 233634)
> +++ gcc/tree-chrec.c    (working copy)
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.
>  #include "coretypes.h"
>  #include "backend.h"
>  #include "tree.h"
> +#include "gimple.h"
>  #include "gimple-expr.h"
>  #include "tree-pretty-print.h"
>  #include "fold-const.h"
> @@ -1196,6 +1197,12 @@ convert_affine_scev (struct loop *loop,
>         happen.  */
>    must_check_src_overflow = TYPE_PRECISION (ct) < TYPE_PRECISION (type);
>  
> +  /* If at_stmt is not always executed we cannot use overflow semantics
> +     as that might end up generating a CHREC that overflows.  */
> +  if (! at_stmt
> +      || ! dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb
> (at_stmt)))
> +    use_overflow_semantics = false;
> +
>    enforce_overflow_semantics = (use_overflow_semantics
>                                 && nowrap_type_p (type));
>    if (enforce_overflow_semantics)
> 
> note that we still generate the questionable multiplication on the loop latch
> so VRP might mess up things there:
> 
>   <bb 4>:
>   _21 = -m_6(D);
>   ivtmp.6_22 = (unsigned int) _21;
>   _27 = (unsigned int) L_12(D);
>   _30 = -m_6(D);
>   _31 = L_12(D) * _30;
>   ivtmp.7_29 = (unsigned int) _31;
>   _34 = (unsigned int) N_5(D);
>   _35 = (unsigned int) m_6(D);
>   _36 = _34 - _35;
> 
>   <bb 5>:
>   # ivtmp.6_26 = PHI <ivtmp.6_22(4), ivtmp.6_25(9)>
>   # ivtmp.7_3 = PHI <ivtmp.7_29(4), ivtmp.7_1(9)>
>   k_32 = (int) ivtmp.6_26;
>   _8 = k_32 >= 0;
>   k_33 = (int) ivtmp.6_26;
>   _10 = k_33 < n_9(D);
>   _11 = _8 & _10;
>   if (_11 != 0)
>     goto <bb 6>;
>   else
>     goto <bb 7>;
> 
>   <bb 6>:
>   _13 = (int) ivtmp.7_3;
>   _14 = (long unsigned int) _13;
>   _15 = _14 * 8;
>   _17 = a_16(D) + _15;
>   *_17 = 0.0;
> 
>   <bb 7>:
>   ivtmp.6_25 = ivtmp.6_26 + 1;
>   ivtmp.7_1 = ivtmp.7_3 + _27;
>   if (ivtmp.6_25 != _36)
> 
> but the IV at least works ok and we'd only need to make sure to rewrite
> the initial value compute into unsigned arithmetic.  The CHREC is
> still questionable, it's still {-m_6(D) * L_12(D), +, L_12(D)}_1
> which means it overflows during the loop evolution.  But at least that
> overflowing is preserved with the widening conversion:
> (long unsigned int) {-m_6(D) * L_12(D), +, L_12(D)}_1
> 
> Note that instantiate_scev doesn't pass a stmt context and thus we have
> to assume the worst here (otherwise it re-does the invalid transform
> for this testcase).  Eventually we'd simply need to do the unsigned
> re-write during the initial SCEV analysis to avoid the need to mess
> with use_overflow_semantics.  Thus when analyzing the multiply on a
> not always executed stmt create (int)((unsigned)(-m) * (unsigned)L).
> 
> The above patch causes at least
> 
> FAIL: gcc.dg/tree-ssa/pr21959.c scan-tree-dump-times vrp1 "Folding predicate
> " 0
> FAIL: gcc.dg/tree-ssa/scev-6.c (test for excess errors)
> FAIL: gcc.dg/tree-ssa/scev-6.c execution test
> FAIL: gcc.dg/tree-ssa/scev-6.c scan-tree-dump-not optimized "abort"
> FAIL: gcc.dg/tree-ssa/scev-cast.c (test for excess errors)
> 
> and a load of vect.exp FAILs (also execute fails ...)

I can understand these new failures are because the domination check itself is
too conservative.  Ideally, we should only make such check when the
precondition does affect overflowness.  I am thinking if we can do some
sophisticated analysis on site with the help of vrp. Then we can mark it in
some way that this iv can not be used to rewrite others.

> 
> I'm going to investigate the effects of re-writing to unsigned during SCEV
> analysis now.