Bug 29832 - -ftree-loop-linear miscompiles scalarize.f90
-ftree-loop-linear miscompiles scalarize.f90
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: tree-optimization
4.3.0
: P3 normal
: ---
Assigned To: Sebastian Pop
: wrong-code
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-11-14 14:11 UTC by Jakub Jelinek
Modified: 2011-02-02 17:48 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.1.2, 4.2.0, 4.3.0
Last reconfirmed: 2008-02-04 21:59:37


Attachments
reduced testcase, in C (410 bytes, text/plain)
2010-09-09 23:44 UTC, Zdenek Sojka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2006-11-14 14:11:55 UTC
gfortran.fortran-torture/execute/scalarize.f90 is miscompiled at least
on x86_64-linux with -O2 -ftree-loop-linear (other miscompiled fortran
tests with -ftree-loop-linear are forall_1.f90 and der_type.f90).

The only linear transformed loop in scalarize.f90 is the
b(:, 5:1:-1) = a
one (i.e.
int8 S.2;

S.2 = 0;
while (1)
  {
    if (S.2 > 4) goto L.6; else (void) 0;
    {
      int8 D.1343;
      int8 D.1342;
      int8 S.3;

      D.1342 = (NON_LVALUE_EXPR <S.2> + 1) * 6 + -7;
      D.1343 = (5 - S.2) * 6 + -7;
      S.3 = 1;
      while (1)
        {
          if (S.3 > 6) goto L.5; else (void) 0;
          b[NON_LVALUE_EXPR <S.3> + D.1343] = a[NON_LVALUE_EXPR <S.3> + D.1342];
          S.3 = S.3 + 1;
        }
      L.5:;
    }
    S.2 = S.2 + 1;
  }
L.6:;
).  When entering ltrans, this is not perfect nest, but perfect_nestify
changes it.  I believe that transformation is valid:

We have before ltrans (with loop entry L13 and loop exit
L85):
<L12>:;
  if (S.2_40 > 4) goto <L85>; else goto <L86>;
<L86>:;
  # S.2_165 = PHI <S.2_40(13), 0(11)>;
<L13>:;
  S.2_40 = S.2_165 + 1;
  D.1393_41 = S.2_40 * 6;
  D.1394_43 = 5 - S.2_165;
  D.1395_44 = D.1394_43 * 6;
  pretmp.59_117 = D.1395_44 + -7;
  pretmp.59_121 = D.1393_41 + -7;
  # S.3_171 = PHI <S.3_51(16), 1(14)>;
<L15>:;
  D.1343_45 = pretmp.59_117;
  D.1397_47 = pretmp.59_117 + S.3_171;
  D.1342_42 = pretmp.59_121;
  D.1398_48 = pretmp.59_121 + S.3_171;
  D.1399_49 = a[D.1398_48];
  b[D.1397_47] = D.1399_49;
  S.3_51 = S.3_171 + 1;
  if (S.3_51 > 6) goto <L12>; else goto <L87>;
<L87>:;
  goto <bb 15> (<L15>);
This isn't perfect nest due to the pretmp.59_* MODIFY_EXPRs
and their temporaries, and perfect_nestify transforms this into:
<L12>:;
  if (S.2_40 > 4) goto <L99>; else goto <L86>;
<L86>:;
  # S.2_165 = PHI <S.2_40(13), 0(11)>;
<L13>:;
  S.2_40 = S.2_165 + 1;
  # S.3_171 = PHI <S.3_51(16), 1(14)>;
<L15>:;
  D.1393_41 = S.2_40 * 6;
  D.1394_43 = 5 - S.2_165;
  D.1395_44 = D.1394_43 * 6;
  pretmp.59_117 = D.1395_44 + -7;
  pretmp.59_121 = D.1393_41 + -7;
  D.1343_45 = pretmp.59_117;
  D.1397_47 = pretmp.59_117 + S.3_171;
  D.1342_42 = pretmp.59_121;
  D.1398_48 = pretmp.59_121 + S.3_171;
  D.1399_49 = a[D.1398_48];
  b[D.1397_47] = D.1399_49;
  S.3_51 = S.3_171 + 1;
  if (S.3_51 > 6) goto <L12>; else goto <L87>;
<L87>:;
  goto <bb 15> (<L15>);
...
<L99>:;
  goto <bb 44> (<L100>);
  # perfectiv.73_65 = PHI <0(43), perfectiv.73_4(46)>;
<L100>:;
  uboundvar.74_155 = 4 + -1;
  perfectiv.73_4 = perfectiv.73_65 + 1;
  if (uboundvar.74_155 >= perfectiv.73_4) goto <L101>; else goto <L85>;
<L101>:;
  goto <bb 44> (<L100>);

(this was dumped before perfect_nest call at the end of perfect_nestify).
It looks just fine to me, all it did was move the pretmp.95_* and their
temporaries setting back into the L15 loop and create a pointless empty
loop which further optimizations surely remove.

The problem is either in perfect_nest_p (i.e. question whether what perfect_nestify created is a valid perfect nest, but identical basic blocks
could as well come just from earlier passes without perfect_nestify being ever called), or if lambda_loopnest_to_gcc_loopnest screw this up.

The problematic statement is the:
S.2_40 = S.2_165 + 1;
which is stmt_is_bumper_for_loop for the outer loop, but isn't used solely
as the bumper, but also in the inner loop:
D.1393_41 = S.2_40 * 6;
Comment 1 Jakub Jelinek 2006-11-14 14:27:31 UTC
lambda_loopnest_to_gcc_loopnest interchanges the loops and we get:
<L12>:;
  lletmp.77_46 = 0;
  lletmp.77_38 = lletmp.77_46 + 5;
  lnivtmp.75_21 = lnivtmp.75_9 + 1;
  lnivtmp.75_12 = lnivtmp.75_9 + 1;
  if (lletmp.77_38 <= lnivtmp.75_12) goto <L99>; else goto <L86>;
<L86>:;
  # lnivtmp.75_9 = PHI <lnivtmp.75_21(13), lletmp.76_162(11)>;
  # S.2_165 = PHI <S.2_40(13), 0(11)>;
<L13>:;
  lbvtmp.82_18 = 0;
  lbvtmp.82_145 = lnivtmp.78_8;
   lbvtmp.82_20 = lbvtmp.82_18 + lbvtmp.82_145;
  S.2_40 = lbvtmp.82_20 + 1;
  lletmp.79_154 = 0;
  # lnivtmp.78_8 = PHI <lnivtmp.78_3(16), lletmp.79_154(14)>;
  # S.3_171 = PHI <S.3_51(16), 1(14)>;
<L15>:;
  lletmp.80_22 = 0;
  lletmp.80_56 = lletmp.80_22 + 3;
  D.1393_41 = S.2_40 * 6;
  lbvtmp.81_31 = 0;
  lbvtmp.81_26 = lnivtmp.78_8;
  lbvtmp.81_106 = lbvtmp.81_26 + lbvtmp.81_31;
  D.1394_43 = 5 - lbvtmp.81_106;
  D.1395_44 = D.1394_43 * 6;
  pretmp.59_117 = D.1395_44 + -7;
  pretmp.59_121 = D.1393_41 + -7;
  D.1343_45 = pretmp.59_117;
  lbvtmp.84_161 = 0;
  lbvtmp.84_105 = lnivtmp.75_9;
  lbvtmp.84_120 = lbvtmp.84_105 + lbvtmp.84_161;
  D.1397_47 = pretmp.59_117 + lbvtmp.84_120;
  D.1342_42 = pretmp.59_121;
  lbvtmp.83_157 = 0;
  lbvtmp.83_158 = lnivtmp.75_9;
  lbvtmp.83_159 = lbvtmp.83_157 + lbvtmp.83_158;
  D.1398_48 = pretmp.59_121 + lbvtmp.83_159;
  D.1399_49 = a[D.1398_48];
  b[D.1397_47] = D.1399_49;
  lbvtmp.85_125 = 0;
  lbvtmp.85_74 = lnivtmp.75_9;
  lbvtmp.85_68 = lbvtmp.85_74 + lbvtmp.85_125;
  S.3_51 = lbvtmp.85_68 + 1;
  lnivtmp.78_3 = lnivtmp.78_8 + 1;
  lnivtmp.78_19 = lnivtmp.78_8 + 1;
  if (lletmp.80_56 <= lnivtmp.78_19) goto <L12>; else goto <L87>;
<L87>:;
  goto <bb 15> (<L15>);

These basic blocks are the only ones that ever mention lnivtmp.78
variable, it is set from a PHI node at L15, but used both in L15
(ok) and also in L13, which isn't dominated by L15 (L13 is where
the loop is entered).

The lnivtmp.78 use before definition is from the bumper statement which is considered ok by perfect_nest_p.  If the bumper SSA_NAME was only used in
the bumper statement and the PHI node, this wouldn't be a big issue, eventhough
lambda_loopnest_to_gcc_loopnest would reference undefined variable, it would
be quickly optimized away (as the bumper would be optimized out as unused)
- ltrans creates new bumper vars, but unfortunately it is used.

I think either perfect_nest_p should only allow bumper variables which are
solely used in the bumper statement and PHI node and nowhere else (this would
cure this case, either it wouldn't be considered valid for ltrans or perfect_nestify would need to try harder (e.g. by cloning the bumper statement within the inner loop's body, in the inner loop's body it would set a different
temporary which would be used there and the original bumper var would be only
used outside of inner loop and in the PHI)), or lambda_loopnest_to_gcc_loopnest
needs to be tought to special case stmt_is_bumper_for_loop statements outside
of the innermost loop.
Comment 2 Jakub Jelinek 2006-11-14 14:28:12 UTC
Forgot to mention, the problem is reproduceable also on gcc-4_1-branch
and gcc-4_2-branch.
Comment 3 Serge Belyshev 2007-09-14 00:10:32 UTC
Also miscompiled by 4.2 and above: scalarize2.f90
Miscompiled by current trunk: alloc_comp_assign_2.f90 pr19928-2.f90 where_1.f90 where_6.f90
Comment 4 Richard Biener 2008-02-04 16:01:27 UTC
Re-confirmed on the trunk.

make check-gfortran \
  RUNTESTFLAGS="--target_board=unix/-ftree-loop-linear execute.exp"
Comment 5 Sebastian Pop 2008-02-04 21:59:37 UTC
Mine.
Comment 6 Joseph S. Myers 2008-07-04 19:35:00 UTC
No working version given in this bug, removing milestone.
Comment 7 Zdenek Sojka 2010-09-09 23:44:29 UTC
Created attachment 21758 [details]
reduced testcase, in C

This testcase is C-ified scalarize.f90 and then reduced.
It fails with -Os -ftree-loop-linear (r164128, x86_64-linux):
(even the original testcase now fails only with -Os)

$ gcc pr29832.c -Os -ftree-loop-linear -Wall
pr29832.c: In function 'testarray':
pr29832.c:2:1: warning: 'lnivtmp.8' is used uninitialized in this function [-Wuninitialized]
$ ./a.out 
Aborted

The warning is about internal variable, not used in the source.
Comment 8 Sebastian Pop 2011-01-18 20:54:30 UTC
Author: spop
Date: Tue Jan 18 20:54:26 2011
New Revision: 168963

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168963
Log:
Add testcase for PR29832.

2011-01-18  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/29832
	* gfortran.dg/graphite/pr29832.f90: New.

Added:
    branches/graphite/gcc/testsuite/gfortran.dg/graphite/pr29832.f90
Modified:
    branches/graphite/gcc/ChangeLog.graphite
Comment 9 Sebastian Pop 2011-01-18 20:57:25 UTC
Fixed on the graphite branch.
Comment 10 Sebastian Pop 2011-01-25 21:24:51 UTC
Author: spop
Date: Tue Jan 25 21:24:44 2011
New Revision: 169253

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169253
Log:
Add testcase for PR29832.

2011-01-25  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/29832
	* gfortran.dg/graphite/pr29832.f90: New.

Added:
    trunk/gcc/testsuite/gfortran.dg/graphite/pr29832.f90
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 11 Sebastian Pop 2011-01-25 21:28:32 UTC
Fixed.
Comment 12 Diego Novillo 2011-02-02 17:48:11 UTC
Author: dnovillo
Date: Wed Feb  2 17:48:08 2011
New Revision: 169592

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169592
Log:
Add testcase for PR29832.

2011-01-25  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/29832
	* gfortran.dg/graphite/pr29832.f90: New.

Added:
    branches/google/integration/gcc/testsuite/gfortran.dg/graphite/pr29832.f90
Modified:
    branches/google/integration/gcc/testsuite/ChangeLog