The following testcase is miscompiled at -O3 on x86_64-linux starting with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192989 (ok, that revision doesn't compile, needs r192900 follow-up). struct A { short int a1; unsigned char a2; unsigned int a3; }; struct B { unsigned short b1; const A *b2; }; B b; __attribute__((noinline, noclone)) int foo (unsigned x) { __asm volatile ("" : "+r" (x) : : "memory"); return x; } inline void bar (const int &) { } __attribute__((noinline)) void baz () { const A *a = b.b2; unsigned int i; unsigned short n = b.b1; for (i = 0; i < n; ++i) if (a[i].a1 == 11) { if (i > 0 && (a[i - 1].a2 & 1)) continue; bar (foo (2)); return; } } int main () { A a[4] = { { 10, 0, 0 }, { 11, 1, 0 }, { 11, 1, 0 }, { 11, 1, 0 } }; b.b1 = 4; b.b2 = a; baz (); return 0; } In *.slp we have: <bb 5>: if (i_21 != 0) goto <bb 6>; else goto <bb 7>; <bb 6>: _11 = i_21 + 4294967295; _12 = (long unsigned int) _11; _13 = _12 * 8; _14 = a_4 + _13; _15 = _14->a2; but ivopts turns that into: <bb 5>: i_25 = (unsigned int) ivtmp.9_31; if (i_25 != 0) goto <bb 6>; else goto <bb 7>; <bb 6>: _28 = ivtmp.9_31 * 8; _27 = a_4 + _28; _26 = _27 + 34359738362; _15 = MEM[base: _26, offset: 0B]; which is wrong, i_21 + -1U wrapped around (and wasn't executed for i_21 being 0), while 34359738362 is 0xffffffffULL * 8 + 2, thus it ignores the wrapping and does what the original code would do for _12 = (long unsigned int) i_21; _77 = _12 + 4294967295; _13 = _77 * 8; i.e. as if the -1U addition was done in the wider precision.

Confirmed.

ssa name _12 type long unsigned int base 4294967295 step 1 ssa name _13 type long unsigned int base 34359738360 step 8 ssa name _14 type const struct A * base a_4 + 34359738360 step 8 base object (void *) a_4 all look wrong to me, it ignores the wrapping in the narrower type. OT, I wonder why number of iterations analysis doesn't figure out smaller number: Analyzing # of iterations of loop 1 exit condition [1, + , 1] < (unsigned int) n_5 bounds on difference of bases: 0 ... 4294967294 result: # of iterations (unsigned int) n_5 + 4294967295, bounded by 4294967294 The loop condition is: i_18 = i_21 + 1; if (i_18 < _20) and _20 is: _20 = (unsigned int) n_5; where short unsigned int n; thus, I'd thought we should figure out from that that i_18 will be at most 0xffff (highest possible n_5 is 0xffff). But that is 4.9 material likely, while the ivopts? bug is a severe 4.8 issue.

OK, the problem seems to be already in what simple_iv returns for SSA name 12. Here we should have -1. While analyzing the cast (gdb) p debug_gimple_stmt (at_stmt) _12 = (long unsigned int) _11; that effectively changes addition to minus, we get to CASE_CONVERT: /* In case we have a truncation of a widened operation that in the truncated type has undefined overflow behavior analyze the operation done in an unsigned type of the same precision as the final truncation. We cannot derive a scalar evolution for the widened operation but for the truncated result. */ if (TREE_CODE (type) == INTEGER_TYPE && TREE_CODE (TREE_TYPE (rhs1)) == INTEGER_TYPE && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (rhs1)) && TYPE_OVERFLOW_UNDEFINED (type) && TREE_CODE (rhs1) == SSA_NAME && (def = SSA_NAME_DEF_STMT (rhs1)) && is_gimple_assign (def) && TREE_CODE_CLASS (gimple_assign_rhs_code (def)) == tcc_binary && TREE_CODE (gimple_assign_rhs2 (def)) == INTEGER_CST) { tree utype = unsigned_type_for (type); chrec1 = interpret_rhs_expr (loop, at_stmt, utype, gimple_assign_rhs1 (def), gimple_assign_rhs_code (def), gimple_assign_rhs2 (def)); } else chrec1 = analyze_scalar_evolution (loop, rhs1); Here for widening conversions we do nothing. So we get <polynomial_chrec 0x7ffff705d7e0 type <integer_type 0x7ffff6ee8690 unsigned int sizes-gimplified public unsigned SI size <integer_cst 0x7ffff6eeb140 constant 32> unit size <integer_cst 0x7ffff6eeb160 constant 4> align 32 symtab 0 alias set -1 canonical type 0x7ffff6ee8690 precision 32 min <integer_cst 0x7ffff6eeb180 0> max <integer_cst 0x7ffff6eeb120 4294967295> pointer_to_this <pointer_type 0x7ffff6fc9738>> arg 0 <integer_cst 0x7ffff6eeb3c0 type <integer_type 0x7ffff6ee85e8 int> constant 1> arg 1 <integer_cst 0x7ffff6eeb120 type <integer_type 0x7ffff6ee8690 unsigned int> constant 4294967295> arg 2 <integer_cst 0x7ffff7042ae0 type <integer_type 0x7ffff6ee8690 unsigned int> constant 1>> this is correct, since it is done in unsigned int. Next we do: res = chrec_convert (type, chrec1, at_stmt); Eventually we go to convert_affine_scev and we set enforce_overflow_semantics enforce_overflow_semantics = (use_overflow_semantics && nowrap_type_p (type)); This test looks backwards to me. If the type is nowrap, we do not need to enforce anything about overflows, when it is wrap, then we have to. Flipping it however do not change fix the testcase. Anyway, the result of convert_affince_scev makes us to produce <polynomial_chrec 0x7ffff705d810 type <integer_type 0x7ffff6ee87e0 long unsigned int public unsigned DI size <integer_cst 0x7ffff6ecddc0 constant 64> unit size <integer_cst 0x7ffff6ecdde0 constant 8> align 64 symtab 0 alias set -1 canonical type 0x7ffff6ee87e0 precision 64 min <integer_cst 0x7ffff6eeb200 0> max <integer_cst 0x7ffff6eeb1e0 18446744073709551615>> arg 0 <integer_cst 0x7ffff6eeb3c0 type <integer_type 0x7ffff6ee85e8 int> constant 1> arg 1 <integer_cst 0x7ffff7042de0 type <integer_type 0x7ffff6ee87e0 long unsigned int> constant 4294967295> arg 2 <integer_cst 0x7ffff704cba0 type <integer_type 0x7ffff6ee87e0 long unsigned int> constant 1>> that seems already wrong. So the bug seems to be that convert_affince_scev is not doing the right thing here? But what the right thing is? Conversion to -1 or giving up?

> this is correct, since it is done in unsigned int. > Next we do: > res = chrec_convert (type, chrec1, at_stmt); > Eventually we go to convert_affine_scev and we set enforce_overflow_semantics > enforce_overflow_semantics = (use_overflow_semantics > && nowrap_type_p (type)); > > This test looks backwards to me. If the type is nowrap, we do not need to > enforce anything about overflows, when it is wrap, then we have to. The test is correct -- here, type is the type we are converting to; so, if the type is nowrap, we have to make sure that the conversion is not creating an overflowing iv. Anyway, that is irrelevant to the problem: the check of scev_probably_wraps_p below should return false -- which is probably what got broken by the #of iterations estimation change. Zdenek

(In reply to comment #4) > the check of scev_probably_wraps_p below should return false this should be "should return true"

OK, I understnad the issue now. It is bug caused by my patch indeed. The problem is logic in scev_probably_wraps_p that is trying to prove that given IV at given STMT is not wrapping based on loop bounds connected. When I was extending loop bounds to contain not only statements that dominate the exit BB, I verified the walkers that they are valid after the change. In this case it is however not true. What I missed is that it does two things 1) it tries to prove that STMT is bounded by given bound based on fact that bound's STMT dominate the statement 2) it tries to prove the bound based on number of iterations of loop that it derrives from the bounds 2) needs to be updated. I am testing patch.

Created attachment 29106 [details] patch in testing.

Created attachment 29109 [details] updated patch There is another bug triggered by this testcase. Some of the bounds, like those derrived by if (iv_var == constant) can not be used when they are not executed every iteration. This patch prevents them from being recorded as bounds.

FYI, the execute/pr55875.c with -O1 started failing with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=138207 aka tuples merge.

Author: hubicka Date: Wed Jan 9 15:10:43 2013 New Revision: 195054 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195054 Log: PR tree-optimiation/55875 * gcc.c-torture/execute/pr55875.c: New testcase. * g++.dg/torture/pr55875.C: New testcase. * tree-ssa-loop-niter.c (number_of_iterations_cond): Add EVERY_ITERATION parameter. (number_of_iterations_exit): Check if exit is executed every iteration. (idx_infer_loop_bounds): Similarly here. (n_of_executions_at_most): Simplify to only test for cases where statement is dominated by the particular bound; handle correctly the "postdominance" test. (scev_probably_wraps_p): Use max loop iterations info as a global bound first. Added: trunk/gcc/testsuite/g++.dg/torture/pr55875.C trunk/gcc/testsuite/gcc.c-torture/execute/pr55875.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-loop-niter.c

Fixed, thanks.

Shall we track the C testcase regression in 4.7 and earlier? Honza

Yes, but I'd say under a different PR.