Created attachment 28845 [details] testcase The attached testcase is miscompiled at -O2, but runs fine at -O1. To reproduce gfortran -O2 bug.f90 ; ./a.out #3 0x400C7E in MAIN__ at bug.f90:97 Aborted I believe this is a recent regression on trunk. Tested with gcc version 4.8.0 20121201 (experimental) [trunk revision 194017] (GCC)
Using -O2 -fno-tree-pre fixes the testcase. Using -O1 -ftree-pre leads to an infinite loop at runtime.
Revision 192891 (2012-10-28) is OK. Revision 193261 (2012-11-06) is not.
It is caused by revision 193098: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00045.html
Hmm, this seems to be caused by Forced statement unreachable: pretmp_516 = coef_x[pretmp_515]; Forced statement unreachable: pretmp_513 = coef_x[pretmp_512]; Forced statement unreachable: pretmp_479 = coef_x[pretmp_478]; I am not exactly fortran guru. Can someone double check that there are no out of bounds accesses into these? Honza
(In reply to comment #4) > Hmm, this seems to be caused by > Forced statement unreachable: pretmp_516 = coef_x[pretmp_515]; > Forced statement unreachable: pretmp_513 = coef_x[pretmp_512]; > Forced statement unreachable: pretmp_479 = coef_x[pretmp_478]; > > I am not exactly fortran guru. Can someone double check that there are no out > of bounds accesses into these? > > Honza I'm pretty sure there are no out-of-bounds. In particular coef_x is easy to check, it is only used as coef_x(:,lxp) where lxp is the loop bound 0..lp consistent with its def. Of course maybe the FE does something inconsistent ? Also this runs fine: fortran -O0 -fsanitize=address PR55555.f90 ; ./a.out
> I'm pretty sure there are no out-of-bounds. In particular coef_x is easy to > check, it is only used as coef_x(:,lxp) where lxp is the loop bound 0..lp > consistent with its def. Of course maybe the FE does something inconsistent ? > > Also this runs fine: > > fortran -O0 -fsanitize=address PR55555.f90 ; ./a.out Hmm, I saw similar weird cases generated by the frontend. coef_x is array of 8 elements real(kind=8) coef_x[8]; in loop analyzis we do: Induction variable (integer(kind=8)) 4 + 4 * iteration does not wrap in statement pretmp_516 = coef_x[pretmp_515]; in loop 4. Statement pretmp_516 = coef_x[pretmp_515]; is executed at most 0 (bounded by 0) + 1 times in loop 4. This is true, pretmp_512 would be 8 at the second iteration of the loop. later we conclude Loop 4 iterates 1 times. Loop 4 iterates at most 1 times. BB: 9, after_exit: 0 size: 0 # DEBUG lxp => lxp_4 size: 0 _137 = (integer(kind=8)) lxp_4; size: 1 _140 = _137 + pretmp_508; size: 1 _142 = *pol_x_141(D)[_140]; size: 1 _143 = _137 * 4; size: 1 _144 = _143 + -1; BB: 10, after_exit: 0 size: 1 _146 = S.25_279 + _144; size: 1 _150 = _142 * prephitmp_520; size: 1 _151 = _150 + prephitmp_517; size: 1 coef_x[_146] = _151; size: 1 S.25_153 = S.25_279 + 1; size: 1 ivtmp_162 = ivtmp_91 - 1; size: 2 if (ivtmp_162 == 0) goto bb12 or bb11 BB: 11, after_exit: 0 size: 1 pretmp_515 = _144 + S.25_153; size: 1 pretmp_516 = coef_x[pretmp_515]; size: 1 pretmp_518 = S.25_153 + -1; size: 1 pretmp_519 = s[pretmp_518]; BB: 12, after_exit: 0 size: 0 # DEBUG lxp => lxp_4 + 1 size: 1 ivtmp_109 = ivtmp_163 - 1; size: 2 if (ivtmp_109 == 0) goto bb 13 or exit BB: 13, after_exit: 1 size: 1 lxp_154 = lxp_4 + 1; size: 0 pretmp_506 = (integer(kind=8)) lxp_154; size: 1 pretmp_509 = pretmp_506 * 4; size: 1 pretmp_510 = pretmp_509 + -1; size: 1 pretmp_512 = pretmp_510 + 1; size: 1 pretmp_513 = coef_x[pretmp_512]; Unrolled loop 4 completely (duplicated 1 times). Exit condition of peeled iterations was eliminated. Last iteration exit edge was proved true. So the curious statements are in bb11. Adding unreachable calls makes CSE to eventually turn condition in the second copy of BB10 to always just to BB 12 that seem all right to me. Perhaps cascaded unrolling confuse some of the exits... Honza
I've tried to rewrite this as C, but managed to turn it into something that is miscompiled at a different spot. The fortran testcase starts having __builtin_unreachable () in it in the *.cunroll pass, this one already in *.cunrolli pass. Still, I believe it doesn't do any out of bounds access anywhere. -O2 on x86_64-linux. double s[4] = { 1.0, 2.0, 3.0, 4.0 }, pol_x[2] = { 5.0, 6.0 }; __attribute__((noinline)) int foo (void) { double coef_x[8] = { 0, 0, 0, 0, 0, 0, 0, 0 }; int lxp = 0; if (lxp <= 1) do { double t = pol_x[lxp]; long S; long l = lxp * 4L - 1; for (S = 1; S <= 4; S++) coef_x[S + l] = coef_x[S + l] + s[S - 1] * t; } while (lxp++ != 1); asm volatile ("" : : "r" (coef_x) : "memory"); for (lxp = 0; lxp < 8; lxp++) if (coef_x[lxp] != ((lxp & 3) + 1) * (5.0 + (lxp >= 4))) __builtin_abort (); return 1; } int main () { asm volatile ("" : : : "memory"); if (!foo ()) __builtin_abort (); return 0; } Works with r193067, fails with r193100, haven't tried to bisect exactly, but would guess this is r193098 again. For the outer loop it prints: Analyzing # of iterations of loop 1 exit condition [0, + , 1](no_overflow) != 1 bounds on difference of bases: 1 ... 1 result: # of iterations 1, bounded by 1 Loop 1 iterates 1 times. Loop 1 iterates at most 1 times. but that is wrong, the outer loop iterates exactly 2 times. <bb 3>: # lxp_1 = PHI <0(2), lxp_21(15)> ... <bb 6>: lxp_21 = lxp_1 + 1; if (lxp_1 != 1) goto <bb 15>; else goto <bb 7>; <bb 15>: goto <bb 3>; If it used lxp_21 in the condition, that would be correct, but it uses the previous value of the IV.
(In reply to comment #7) > I've tried to rewrite this as C, but managed to turn it into something that is > miscompiled at a different spot. The fortran testcase starts having > __builtin_unreachable () in it in the *.cunroll pass, this one already in > *.cunrolli pass. Still, I believe it doesn't do any out of bounds access > anywhere. -O2 on x86_64-linux. > > double s[4] = { 1.0, 2.0, 3.0, 4.0 }, pol_x[2] = { 5.0, 6.0 }; > > __attribute__((noinline)) int > foo (void) > { > double coef_x[8] = { 0, 0, 0, 0, 0, 0, 0, 0 }; > int lxp = 0; > if (lxp <= 1) > do > { > double t = pol_x[lxp]; > long S; > long l = lxp * 4L - 1; > for (S = 1; S <= 4; S++) > coef_x[S + l] = coef_x[S + l] + s[S - 1] * t; > } > while (lxp++ != 1); > asm volatile ("" : : "r" (coef_x) : "memory"); > for (lxp = 0; lxp < 8; lxp++) > if (coef_x[lxp] != ((lxp & 3) + 1) * (5.0 + (lxp >= 4))) > __builtin_abort (); > return 1; > } > > int > main () > { > asm volatile ("" : : : "memory"); > if (!foo ()) > __builtin_abort (); > return 0; > } > > Works with r193067, fails with r193100, haven't tried to bisect exactly, but > would guess this is r193098 again. For the outer loop it prints: > > Analyzing # of iterations of loop 1 > exit condition [0, + , 1](no_overflow) != 1 > bounds on difference of bases: 1 ... 1 > result: > # of iterations 1, bounded by 1 > Loop 1 iterates 1 times. > Loop 1 iterates at most 1 times. > > but that is wrong, the outer loop iterates exactly 2 times. That's latch block executions, so one latch block execution is correct.
The unrolling puts __builtin_unreachable ()s into the inner duplicated loops: First iteration, good: <bb 16>: # lxp_30 = PHI <0(2)> t_32 = pol_x[lxp_30]; _33 = (long int) lxp_30; _34 = _33 * 4; l_35 = _34 + -1; <bb 17>: # S_36 = PHI <1(16), S_45(18)> if (S_36 <= 4) goto <bb 18>; else goto <bb 19>; <bb 18>: _38 = S_36 + l_35; _39 = coef_x[_38]; _40 = S_36 + -1; _41 = s[_40]; _42 = _41 * t_32; _43 = _39 + _42; coef_x[_38] = _43; S_45 = S_36 + 1; goto <bb 17>; <bb 19>: lxp_46 = lxp_30 + 1; <bb 20>: Peeled iteration, bad: <bb 3>: # lxp_1 = PHI <lxp_46(20)> t_9 = pol_x[lxp_1]; _10 = (long int) lxp_1; _11 = _10 * 4; l_12 = _11 + -1; goto <bb 5>; <bb 4>: _13 = S_3 + l_12; __builtin_unreachable (); _14 = coef_x[_13]; _15 = S_3 + -1; _16 = s[_15]; _17 = _16 * t_9; _18 = _14 + _17; __builtin_unreachable (); coef_x[_13] = _18; S_20 = S_3 + 1; <bb 5>: # S_3 = PHI <1(3), S_20(4)> if (S_3 <= 4) goto <bb 4>; else goto <bb 6>; <bb 6>: lxp_21 = lxp_1 + 1; if (1 == 0) goto <bb 15>; else goto <bb 7>; <bb 15>: <bb 22>: __builtin_unreachable (); <bb 7>: __asm__ __volatile__("" : : "r" &coef_x : "memory"); goto <bb 13>; Note that the outer loop body looks ok - it is the inner body that get's mangled in a bogus way. That's because the loop bounds derived from the inner loop accesses are bogus: remove_exits_and_undefined_stmts (loop=0x7ffff67ec770, npeeled=1) at /space/rguenther/src/svn/trunk/gcc/tree-ssa-loop-ivcanon.c:481 481 bool changed = false; (gdb) n 483 for (elt = loop->bounds; elt; elt = elt->next) (gdb) 488 if (!elt->is_exit (gdb) p elt $8 = (nb_iter_bound *) 0x7ffff6925e38 (gdb) p *elt $9 = {stmt = 0x7ffff6905c80, bound = {low = 0, high = 0}, is_exit = false, next = 0x7ffff6925dc0} (gdb) p elt->stmt $10 = (gimple) 0x7ffff6905c80 (gdb) call debug_gimple_stmt ($10) # .MEM_19 = VDEF <.MEM_6> coef_x[_13] = _18; as far as I can see even with lxp == 1 we have at most an index of 4 + 1*4 - 1. Statement _14 = coef_x[_13]; is executed at most 0 (bounded by 0) + 1 times in loop 1. _that's_ bogus.
I'll have a more detailled look.
Ok, I'm confused by the following: <bb 3>: (loop 1 header) # lxp_1 = PHI <0(2), lxp_24(12)> t_9 = pol_x[lxp_1]; _10 = (long int) lxp_1; _11 = _10 * 4; l_12 = _11 + -1; goto <bb 5>; <bb 4>: _15 = S_2 + l_12; _16 = coef_x[_15]; _17 = S_2 + -1; _18 = s[_17]; _19 = _18 * t_9; _20 = _16 + _19; coef_x[_15] = _20; S_22 = S_2 + 1; <bb 5>: (loop 2 header) # S_2 = PHI <1(3), S_22(4)> if (S_2 <= 4) goto <bb 4>; else goto <bb 6>; <bb 6>: lxp_24 = lxp_1 + 1; if (lxp_1 != 1) goto <bb 12>; else goto <bb 7>; <bb 12>: goto <bb 3>; and SCEV says _15 is {l_12 + 1, +, 1}_2 which looks correct. But then it does (chrec_apply (varying_loop = 2) (chrec = {l_12 + 1, +, 1}_2) (x = 4) (res = l_12 + 5)) and magically, via l_12 being {-1, +, 4}_1 (also correct) arrives at (instantiate_scev (instantiate_below = 2) (evolution_loop = 1) (chrec = {4, +, 4}_1) (res = {4, +, 4}_1)) huh? So to it _15 is {4, +, 4}_1 (not sure what is considered "initial" in terms of scalar evolution with a value varying in an inner loop). This is what infer_loop_bounds_from_undefined derives the bogus bound for loop 1 from. To my eyes _15 should be {0, + 4}_1! Maybe it doesn't really make sense to ask for the evolution of something defined in loop N with respect to an outer loop M? If we change idx_infer_loop_bounds with Index: tree-ssa-loop-niter.c =================================================================== --- tree-ssa-loop-niter.c (revision 194552) +++ tree-ssa-loop-niter.c (working copy) @@ -2671,7 +2671,12 @@ idx_infer_loop_bounds (tree base, tree * upper = false; } - ev = instantiate_parameters (loop, analyze_scalar_evolution (loop, *idx)); + struct loop *dloop = loop_containing_stmt (data->stmt); + if (!dloop) + return true; + + ev = analyze_scalar_evolution (dloop, *idx); + ev = instantiate_parameters (loop, ev); init = initial_condition (ev); step = evolution_part_in_loop_num (ev, loop->num); then we obtain via {l_12 + 1, +, 1}_2, {{0, +, 4}_1, +, 1}_2 the correct solution (init == 0, step == 4). I am going to bootstrap and regtest that.
Author: rguenth Date: Tue Dec 18 13:12:34 2012 New Revision: 194578 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=194578 Log: 2012-12-18 Richard Biener <rguenther@suse.de> PR tree-optimization/55555 * tree-ssa-loop-niter.c (idx_infer_loop_bounds): Properly analyze evolution of the index for the loop it is used in. * tree-scalar-evolution.c (instantiate_scev_name): Take inner loop we will be creating a chrec for. Generalize fix for PR40281 and prune invalid SCEVs. (instantiate_scev_poly): Likewise - pass down inner loop we will be creating a chrec for. (instantiate_scev_binary): Take and pass through inner loop. (instantiate_array_ref): Likewise. (instantiate_scev_convert): Likewise. (instantiate_scev_not): Likewise. (instantiate_scev_3): Likewise. (instantiate_scev_2): Likewise. (instantiate_scev_1): Likewise. (instantiate_scev_r): Likewise. (resolve_mixers): Adjust. (instantiate_scev): Likewise. * gcc.dg/torture/pr55555.c: New testcase. * gcc.dg/vect/vect-iv-11.c: Adjust. Added: trunk/gcc/testsuite/gcc.dg/torture/pr55555.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/vect/vect-iv-11.c trunk/gcc/tree-scalar-evolution.c trunk/gcc/tree-ssa-loop-niter.c
Fixed.