This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/29581] New: Latent bug in 4.1/4.2/4.3 lambda-code.c
- From: "jakub at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 24 Oct 2006 15:21:38 -0000
- Subject: [Bug tree-optimization/29581] New: Latent bug in 4.1/4.2/4.3 lambda-code.c
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
SUBROUTINE FOO (K)
INTEGER I, J, K, A(5,5), B
COMMON A
A(1,1) = 1
10 B = 0
DO 30 I = 1, K
DO 20 J = 1, K
B = B + A(I,J)
20 CONTINUE
A(I,I) = A(I,I) * 2
30 CONTINUE
IF (B.GE.3) RETURN
GO TO 10
END SUBROUTINE
PROGRAM BAR
CALL FOO (2)
END
is miscompiled on redhat/gcc-4_1-branch, with -O2 -ftree-loop-linear,
I believe on all arches, tested at least ppc -m32 and x86_64 -m64.
Unfortunately, I couldn't reproduce this on gcc-4_1-branch nor on the
trunk, because the statements coming from earlier passes
are slightly different, though IMHO valid in all 3 cases while what
-ftree-loop-linear creates is clearly invalid. I believe it is a latent
bug in lambda-code.c (see below), although I don't have any testcases
that would prove it on non-vendor branches.
All vars used in foo are int4, the body in *.empty is on
redhat/gcc-4_1-branch:
<bb 0>:
__BLNK__.a[0] = 1;
D.684_14 = *k_13;
goto <bb 2> (<L24>);
__label_000010:;
<L24>:;
if (D.684_14 > 0) goto <L28>; else goto __label_000010;
<L28>:;
# b_5 = PHI <0(3), b_37(8)>;
# i_1 = PHI <1(3), i_28(8)>;
<L25>:;
pretmp.29_7 = i_1 + -6;
# j_9 = PHI <1(4), j_35(6)>;
# b_6 = PHI <b_5(4), b_33(6)>;
<L3>:;
D.697_29 = j_9 * 5;
D.698_30 = pretmp.29_7;
D.699_31 = pretmp.29_7 + D.697_29;
D.700_32 = __BLNK__.a[D.699_31];
b_33 = D.700_32 + b_6;
j_35 = j_9 + 1;
if (j_9 == D.684_14) goto <L8>; else goto <L30>;
<L30>:;
goto <bb 5> (<L3>);
# b_37 = PHI <b_33(5)>;
<L8>:;
D.701_18 = i_1 * 5;
D.702_19 = pretmp.29_7;
D.703_20 = pretmp.29_7 + D.701_18;
D.704_24 = __BLNK__.a[D.703_20];
D.705_25 = D.704_24 * 2;
__BLNK__.a[D.703_20] = D.705_25;
if (i_1 == D.684_14) goto <L13>; else goto <L31>;
<L31>:;
i_28 = i_1 + 1;
goto <bb 4> (<L25>);
# b_8 = PHI <b_37(7)>;
<L13>:;
if (b_8 > 2) goto __return_foo; else goto __label_000010;
__return_foo:;
return;
On gcc-4_1-branch the difference is just that pretmp.29_7 doesn't
exist, all users of that have i_1 + -6 instead. On the trunk,
pretmp var is only used in the innermost loop (L3..L30) and after
L8 label it instead uses i_1 * 6 + -6. Now, -ftree-loop-linear
on the above determines the L3..L30 loop isn't perfectly
nested in L25..L31 but can_convert_to_perfect_nest says it can
be converted to perfect nest, because the pretmp.9_7 = i_1 + -6;
satisfies:
/* If the IV is simple, it can be duplicated. */
if (!automatically_generated_chrec_p (scev))
{
tree step = evolution_part_in_loop_num (scev, loop->num);
if (step && step != chrec_dont_know
&& TREE_CODE (step) == INTEGER_CST)
continue;
}
-ftree-loop-linear then changes the loops, so that the b computation
is done using 2 nested loops and the *2 multiplication in a separate
loop. Unfortunately, it doesn't compute the start of the loop
correctly, above, the first __BLKN__.a[] that is multiplied by 2 is
pretmp.29_7 + i_1 * 5 = i_1 * 6 - 6 where i_1 starts at 1,
but after the transformation it is perfectiv.32_12 * 5 + perfectiv.32_12:
...
# perfectiv.32_12 = PHI <1(11), perfectiv.32_39(14)>;
<L34>:;
<bb 13>:
uboundvar.33_40 = D.684_14 + 1;
perfectiv.32_39 = perfectiv.32_12 + 1;
D.701_18 = perfectiv.32_12 * 5;
D.702_19 = perfectiv.32_12;
D.703_20 = perfectiv.32_12 + D.701_18;
D.704_24 = __BLNK__.a[D.703_20];
D.705_25 = D.704_24 * 2;
__BLNK__.a[D.703_20] = D.705_25;
if (uboundvar.33_40 >= perfectiv.32_39) goto <L35>; else goto <L13>;
<L35>:;
goto <bb 12> (<L34>);
can_convert_to_perfect_nest only looks at evolution part of the scev:
if (TREE_CODE (stmt) == MODIFY_EXPR)
{
use_operand_p use_a, use_b;
imm_use_iterator imm_iter;
ssa_op_iter op_iter, op_iter1;
tree op0 = TREE_OPERAND (stmt, 0);
tree scev = instantiate_parameters
(loop, analyze_scalar_evolution (loop, op0));
/* If the IV is simple, it can be duplicated. */
if (!automatically_generated_chrec_p (scev))
{
tree step = evolution_part_in_loop_num (scev, loop->num);
if (step && step != chrec_dont_know
&& TREE_CODE (step) == INTEGER_CST)
continue;
}
and replace_uses_equiv_to_x_with_y happily replaces vars with the perfectiv
if the evolution part of the scev is constant and equal to the loop step:
static void
replace_uses_equiv_to_x_with_y (struct loop *loop, tree stmt, tree x,
int xstep, tree y)
{
ssa_op_iter iter;
use_operand_p use_p;
FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
{
tree use = USE_FROM_PTR (use_p);
tree step = NULL_TREE;
tree scev = instantiate_parameters (loop,
analyze_scalar_evolution (loop,
use));
if (scev != NULL_TREE && scev != chrec_dont_know)
step = evolution_part_in_loop_num (scev, loop->num);
if ((step && step != chrec_dont_know
&& TREE_CODE (step) == INTEGER_CST
&& int_cst_value (step) == xstep)
|| USE_FROM_PTR (use_p) == x)
SET_USE (use_p, y);
}
}
For USE_FROM_PTR (use_p) == x that's fine and if
i1 = initial_condition_in_loop_num (scev, loop->num) is equal to
i2 = initial_condition_in_loop_num (instantiate_parameters (loop,
analyze_scalar_evolution (loop, x)))
value of y that's fine too. But if it is not, I believe we need to
replace use with y + i1 - i2.
In the testcase I posted, USE's initial_condition_in_loop_num is
-5 and X's initial_condition_in_loop_num is 1. And, adding -5 - 1
to perfectiv (aka Y in the above routine) is exactly what would
cure this testcase.
Do you agree with this?
Not sure how hard is that though, other alternative would be to make
can_convert_to_perfect_nest more conservative.
--
Summary: Latent bug in 4.1/4.2/4.3 lambda-code.c
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jakub at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29581