SUBROUTINE FOO (K) INTEGER I, J, K, A(5,5), B COMMON A A(1,1) = 1 10 B = 0 DO 30 I = 1, K DO 20 J = 1, K B = B + A(I,J) 20 CONTINUE A(I,I) = A(I,I) * 2 30 CONTINUE IF (B.GE.3) RETURN GO TO 10 END SUBROUTINE PROGRAM BAR CALL FOO (2) END is miscompiled on redhat/gcc-4_1-branch, with -O2 -ftree-loop-linear, I believe on all arches, tested at least ppc -m32 and x86_64 -m64. Unfortunately, I couldn't reproduce this on gcc-4_1-branch nor on the trunk, because the statements coming from earlier passes are slightly different, though IMHO valid in all 3 cases while what -ftree-loop-linear creates is clearly invalid. I believe it is a latent bug in lambda-code.c (see below), although I don't have any testcases that would prove it on non-vendor branches. All vars used in foo are int4, the body in *.empty is on redhat/gcc-4_1-branch: <bb 0>: __BLNK__.a[0] = 1; D.684_14 = *k_13; goto <bb 2> (<L24>); __label_000010:; <L24>:; if (D.684_14 > 0) goto <L28>; else goto __label_000010; <L28>:; # b_5 = PHI <0(3), b_37(8)>; # i_1 = PHI <1(3), i_28(8)>; <L25>:; pretmp.29_7 = i_1 + -6; # j_9 = PHI <1(4), j_35(6)>; # b_6 = PHI <b_5(4), b_33(6)>; <L3>:; D.697_29 = j_9 * 5; D.698_30 = pretmp.29_7; D.699_31 = pretmp.29_7 + D.697_29; D.700_32 = __BLNK__.a[D.699_31]; b_33 = D.700_32 + b_6; j_35 = j_9 + 1; if (j_9 == D.684_14) goto <L8>; else goto <L30>; <L30>:; goto <bb 5> (<L3>); # b_37 = PHI <b_33(5)>; <L8>:; D.701_18 = i_1 * 5; D.702_19 = pretmp.29_7; D.703_20 = pretmp.29_7 + D.701_18; D.704_24 = __BLNK__.a[D.703_20]; D.705_25 = D.704_24 * 2; __BLNK__.a[D.703_20] = D.705_25; if (i_1 == D.684_14) goto <L13>; else goto <L31>; <L31>:; i_28 = i_1 + 1; goto <bb 4> (<L25>); # b_8 = PHI <b_37(7)>; <L13>:; if (b_8 > 2) goto __return_foo; else goto __label_000010; __return_foo:; return; On gcc-4_1-branch the difference is just that pretmp.29_7 doesn't exist, all users of that have i_1 + -6 instead. On the trunk, pretmp var is only used in the innermost loop (L3..L30) and after L8 label it instead uses i_1 * 6 + -6. Now, -ftree-loop-linear on the above determines the L3..L30 loop isn't perfectly nested in L25..L31 but can_convert_to_perfect_nest says it can be converted to perfect nest, because the pretmp.9_7 = i_1 + -6; satisfies: /* If the IV is simple, it can be duplicated. */ if (!automatically_generated_chrec_p (scev)) { tree step = evolution_part_in_loop_num (scev, loop->num); if (step && step != chrec_dont_know && TREE_CODE (step) == INTEGER_CST) continue; } -ftree-loop-linear then changes the loops, so that the b computation is done using 2 nested loops and the *2 multiplication in a separate loop. Unfortunately, it doesn't compute the start of the loop correctly, above, the first __BLKN__.a[] that is multiplied by 2 is pretmp.29_7 + i_1 * 5 = i_1 * 6 - 6 where i_1 starts at 1, but after the transformation it is perfectiv.32_12 * 5 + perfectiv.32_12: ... # perfectiv.32_12 = PHI <1(11), perfectiv.32_39(14)>; <L34>:; <bb 13>: uboundvar.33_40 = D.684_14 + 1; perfectiv.32_39 = perfectiv.32_12 + 1; D.701_18 = perfectiv.32_12 * 5; D.702_19 = perfectiv.32_12; D.703_20 = perfectiv.32_12 + D.701_18; D.704_24 = __BLNK__.a[D.703_20]; D.705_25 = D.704_24 * 2; __BLNK__.a[D.703_20] = D.705_25; if (uboundvar.33_40 >= perfectiv.32_39) goto <L35>; else goto <L13>; <L35>:; goto <bb 12> (<L34>); can_convert_to_perfect_nest only looks at evolution part of the scev: if (TREE_CODE (stmt) == MODIFY_EXPR) { use_operand_p use_a, use_b; imm_use_iterator imm_iter; ssa_op_iter op_iter, op_iter1; tree op0 = TREE_OPERAND (stmt, 0); tree scev = instantiate_parameters (loop, analyze_scalar_evolution (loop, op0)); /* If the IV is simple, it can be duplicated. */ if (!automatically_generated_chrec_p (scev)) { tree step = evolution_part_in_loop_num (scev, loop->num); if (step && step != chrec_dont_know && TREE_CODE (step) == INTEGER_CST) continue; } and replace_uses_equiv_to_x_with_y happily replaces vars with the perfectiv if the evolution part of the scev is constant and equal to the loop step: static void replace_uses_equiv_to_x_with_y (struct loop *loop, tree stmt, tree x, int xstep, tree y) { ssa_op_iter iter; use_operand_p use_p; FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE) { tree use = USE_FROM_PTR (use_p); tree step = NULL_TREE; tree scev = instantiate_parameters (loop, analyze_scalar_evolution (loop, use)); if (scev != NULL_TREE && scev != chrec_dont_know) step = evolution_part_in_loop_num (scev, loop->num); if ((step && step != chrec_dont_know && TREE_CODE (step) == INTEGER_CST && int_cst_value (step) == xstep) || USE_FROM_PTR (use_p) == x) SET_USE (use_p, y); } } For USE_FROM_PTR (use_p) == x that's fine and if i1 = initial_condition_in_loop_num (scev, loop->num) is equal to i2 = initial_condition_in_loop_num (instantiate_parameters (loop, analyze_scalar_evolution (loop, x))) value of y that's fine too. But if it is not, I believe we need to replace use with y + i1 - i2. In the testcase I posted, USE's initial_condition_in_loop_num is -5 and X's initial_condition_in_loop_num is 1. And, adding -5 - 1 to perfectiv (aka Y in the above routine) is exactly what would cure this testcase. Do you agree with this? Not sure how hard is that though, other alternative would be to make can_convert_to_perfect_nest more conservative.
(In reply to comment #0) > In the testcase I posted, USE's initial_condition_in_loop_num is > -5 and X's initial_condition_in_loop_num is 1. And, adding -5 - 1 > to perfectiv (aka Y in the above routine) is exactly what would > cure this testcase. > > Do you agree with this? > I agree completely with your analysis. If you want to work up a fix to do that, i'm happy to approve it (It should suffice to generate the initial condition in the loop header and use that as the starting point for the new iv variable). > Not sure how hard is that though, other alternative would be to make > can_convert_to_perfect_nest more conservative. >
Created attachment 12601 [details] gcc43-pr29581-tests.patch Testcases that show this bug even on the trunk (taken from vect-8{5,6,7,8}.c, removed vector related stuff from it, added -O2 -ftree-loop-linear). With my preliminary fix for this PR all these seem to be fixed (on a release only checking gcc), unfortunately with checking enabled I'm getting tree checking failures like: error: statement makes a memory store, but has no V_MAY_DEFS nor V_MUST_DEFS perfecttmp.42 = perfectiv.40_5 + 16; I called update_stmt, o not sure what I'm still missing. BTW, with -ftree-loop-linear also fortran forall_1.f90 and scalarize.f90 are miscompiled, but that problem isn't fixed by my patch, so perhaps there are other bugs.
> error: statement makes a memory store, but has no V_MAY_DEFS nor V_MUST_DEFS > perfecttmp.42 = perfectiv.40_5 + 16; > I called update_stmt, o not sure what I'm still missing. You forgot to mark perfecttmp.42 for renaming (or manually name it yourself).
Created attachment 12607 [details] gcc43-pr29581.patch Yeah, found that out too earlier today. This is the fix which I have regtested, including RUNTESTFLAGS=--target_board=unix/-ftree-loop-linear.
Subject: Bug 29581 Author: jakub Date: Wed Nov 15 09:35:34 2006 New Revision: 118848 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118848 Log: PR tree-optimization/29581 * lambda-code.c (replace_uses_equiv_to_x_with_y): Add YINIT, REPLACEMENTS, FIRSTBSI arguments. If initial condition or type is different between Y and USE, create a temporary variable, initialize it at the beginning of the body bb and use it as replacement instead of Y. * gcc.dg/pr29581-1.c: New test. * gcc.dg/pr29581-2.c: New test. * gcc.dg/pr29581-3.c: New test. * gcc.dg/pr29581-4.c: New test. * gfortran.dg/pr29581.f90: New test. Added: trunk/gcc/testsuite/gcc.dg/pr29581-1.c trunk/gcc/testsuite/gcc.dg/pr29581-2.c trunk/gcc/testsuite/gcc.dg/pr29581-3.c trunk/gcc/testsuite/gcc.dg/pr29581-4.c trunk/gcc/testsuite/gfortran.dg/pr29581.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/lambda-code.c trunk/gcc/testsuite/ChangeLog
Subject: Bug 29581 Author: jakub Date: Wed Nov 15 09:36:51 2006 New Revision: 118849 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118849 Log: PR tree-optimization/29581 * lambda-code.c (replace_uses_equiv_to_x_with_y): Add YINIT, REPLACEMENTS, FIRSTBSI arguments. If initial condition or type is different between Y and USE, create a temporary variable, initialize it at the beginning of the body bb and use it as replacement instead of Y. * gcc.dg/pr29581-1.c: New test. * gcc.dg/pr29581-2.c: New test. * gcc.dg/pr29581-3.c: New test. * gcc.dg/pr29581-4.c: New test. * gfortran.dg/pr29581.f90: New test. Added: branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/pr29581-1.c branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/pr29581-2.c branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/pr29581-3.c branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/pr29581-4.c branches/gcc-4_2-branch/gcc/testsuite/gfortran.dg/pr29581.f90 Modified: branches/gcc-4_2-branch/gcc/ChangeLog branches/gcc-4_2-branch/gcc/lambda-code.c branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
Subject: Bug 29581 Author: jakub Date: Wed Nov 15 09:39:18 2006 New Revision: 118850 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118850 Log: PR tree-optimization/29581 * lambda-code.c (replace_uses_equiv_to_x_with_y): Add YINIT, REPLACEMENTS, FIRSTBSI arguments. If initial condition or type is different between Y and USE, create a temporary variable, initialize it at the beginning of the body bb and use it as replacement instead of Y. * gcc.dg/pr29581-1.c: New test. * gcc.dg/pr29581-2.c: New test. * gcc.dg/pr29581-3.c: New test. * gcc.dg/pr29581-4.c: New test. * gfortran.dg/pr29581.f90: New test. Added: branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/pr29581-1.c branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/pr29581-2.c branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/pr29581-3.c branches/gcc-4_1-branch/gcc/testsuite/gcc.dg/pr29581-4.c branches/gcc-4_1-branch/gcc/testsuite/gfortran.dg/pr29581.f90 Modified: branches/gcc-4_1-branch/gcc/ChangeLog branches/gcc-4_1-branch/gcc/lambda-code.c branches/gcc-4_1-branch/gcc/testsuite/ChangeLog
Fixed.