This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug tree-optimization/29581] New: Latent bug in 4.1/4.2/4.3 lambda-code.c


SUBROUTINE FOO (K)
      INTEGER I, J, K, A(5,5), B
      COMMON A
      A(1,1) = 1
 10   B = 0
      DO 30 I = 1, K
        DO 20 J = 1, K
          B = B + A(I,J)
 20     CONTINUE
        A(I,I) = A(I,I) * 2
 30   CONTINUE
      IF (B.GE.3) RETURN
      GO TO 10
      END SUBROUTINE

      PROGRAM BAR
        CALL FOO (2)
      END

is miscompiled on redhat/gcc-4_1-branch, with -O2 -ftree-loop-linear,
I believe on all arches, tested at least ppc -m32 and x86_64 -m64.
Unfortunately, I couldn't reproduce this on gcc-4_1-branch nor on the
trunk, because the statements coming from earlier passes
are slightly different, though IMHO valid in all 3 cases while what
-ftree-loop-linear creates is clearly invalid.  I believe it is a latent
bug in lambda-code.c (see below), although I don't have any testcases
that would prove it on non-vendor branches.

All vars used in foo are int4, the body in *.empty is on
redhat/gcc-4_1-branch:

<bb 0>:
  __BLNK__.a[0] = 1;
  D.684_14 = *k_13;
  goto <bb 2> (<L24>);
__label_000010:;
<L24>:;
  if (D.684_14 > 0) goto <L28>; else goto __label_000010;
<L28>:;
  # b_5 = PHI <0(3), b_37(8)>;
  # i_1 = PHI <1(3), i_28(8)>;
<L25>:;
  pretmp.29_7 = i_1 + -6;
  # j_9 = PHI <1(4), j_35(6)>;
  # b_6 = PHI <b_5(4), b_33(6)>;
<L3>:;
  D.697_29 = j_9 * 5;
  D.698_30 = pretmp.29_7;
  D.699_31 = pretmp.29_7 + D.697_29;
  D.700_32 = __BLNK__.a[D.699_31];
  b_33 = D.700_32 + b_6;
  j_35 = j_9 + 1;
  if (j_9 == D.684_14) goto <L8>; else goto <L30>;
<L30>:;
  goto <bb 5> (<L3>);
  # b_37 = PHI <b_33(5)>;
<L8>:;
  D.701_18 = i_1 * 5;
  D.702_19 = pretmp.29_7;
  D.703_20 = pretmp.29_7 + D.701_18;
  D.704_24 = __BLNK__.a[D.703_20];
  D.705_25 = D.704_24 * 2;
  __BLNK__.a[D.703_20] = D.705_25;
  if (i_1 == D.684_14) goto <L13>; else goto <L31>;
<L31>:;
  i_28 = i_1 + 1;
  goto <bb 4> (<L25>);
  # b_8 = PHI <b_37(7)>;
<L13>:;
  if (b_8 > 2) goto __return_foo; else goto __label_000010;
__return_foo:;
  return;

On gcc-4_1-branch the difference is just that pretmp.29_7 doesn't
exist, all users of that have i_1 + -6 instead.  On the trunk,
pretmp var is only used in the innermost loop (L3..L30) and after
L8 label it instead uses i_1 * 6 + -6.  Now, -ftree-loop-linear
on the above determines the L3..L30 loop isn't perfectly
nested in L25..L31 but can_convert_to_perfect_nest says it can
be converted to perfect nest, because the pretmp.9_7 = i_1 + -6;
satisfies:
                  /* If the IV is simple, it can be duplicated.  */
                  if (!automatically_generated_chrec_p (scev))
                    {
                      tree step = evolution_part_in_loop_num (scev, loop->num);
                      if (step && step != chrec_dont_know
                          && TREE_CODE (step) == INTEGER_CST)
                        continue;
                    }

-ftree-loop-linear then changes the loops, so that the b computation
is done using 2 nested loops and the *2 multiplication in a separate
loop.  Unfortunately, it doesn't compute the start of the loop
correctly, above, the first __BLKN__.a[] that is multiplied by 2 is
pretmp.29_7 + i_1 * 5 = i_1 * 6 - 6 where i_1 starts at 1,
but after the transformation it is perfectiv.32_12 * 5 + perfectiv.32_12:

...
  # perfectiv.32_12 = PHI <1(11), perfectiv.32_39(14)>;
<L34>:;
<bb 13>:
  uboundvar.33_40 = D.684_14 + 1;
  perfectiv.32_39 = perfectiv.32_12 + 1;
  D.701_18 = perfectiv.32_12 * 5;
  D.702_19 = perfectiv.32_12;
  D.703_20 = perfectiv.32_12 + D.701_18;
  D.704_24 = __BLNK__.a[D.703_20];
  D.705_25 = D.704_24 * 2;
  __BLNK__.a[D.703_20] = D.705_25;
  if (uboundvar.33_40 >= perfectiv.32_39) goto <L35>; else goto <L13>;
<L35>:;
  goto <bb 12> (<L34>);

can_convert_to_perfect_nest only looks at evolution part of the scev:

              if (TREE_CODE (stmt) == MODIFY_EXPR)
                {
                  use_operand_p use_a, use_b;
                  imm_use_iterator imm_iter;
                  ssa_op_iter op_iter, op_iter1;
                  tree op0 = TREE_OPERAND (stmt, 0);
                  tree scev = instantiate_parameters
                    (loop, analyze_scalar_evolution (loop, op0));

                  /* If the IV is simple, it can be duplicated.  */
                  if (!automatically_generated_chrec_p (scev))
                    {
                      tree step = evolution_part_in_loop_num (scev, loop->num);
                      if (step && step != chrec_dont_know
                          && TREE_CODE (step) == INTEGER_CST)
                        continue;
                    }

and replace_uses_equiv_to_x_with_y happily replaces vars with the perfectiv
if the evolution part of the scev is constant and equal to the loop step:

static void
replace_uses_equiv_to_x_with_y (struct loop *loop, tree stmt, tree x,
                                int xstep, tree y)
{
  ssa_op_iter iter;
  use_operand_p use_p;

  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
    {
      tree use = USE_FROM_PTR (use_p);
      tree step = NULL_TREE;
      tree scev = instantiate_parameters (loop,
                                          analyze_scalar_evolution (loop,
use));

      if (scev != NULL_TREE && scev != chrec_dont_know)
        step = evolution_part_in_loop_num (scev, loop->num);

      if ((step && step != chrec_dont_know
           && TREE_CODE (step) == INTEGER_CST
           && int_cst_value (step) == xstep)
          || USE_FROM_PTR (use_p) == x)
        SET_USE (use_p, y);
    }
}

For USE_FROM_PTR (use_p) == x that's fine and if
i1 = initial_condition_in_loop_num (scev, loop->num) is equal to
i2 = initial_condition_in_loop_num (instantiate_parameters (loop,
analyze_scalar_evolution (loop, x)))
value of y that's fine too.  But if it is not, I believe we need to
replace use with y + i1 - i2.
In the testcase I posted, USE's initial_condition_in_loop_num is
-5 and X's initial_condition_in_loop_num is 1.  And, adding -5 - 1
to perfectiv (aka Y in the above routine) is exactly what would
cure this testcase.

Do you agree with this?

Not sure how hard is that though, other alternative would be to make
can_convert_to_perfect_nest more conservative.


-- 
           Summary: Latent bug in 4.1/4.2/4.3 lambda-code.c
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jakub at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29581


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]