Bug 90796 - [8 Regression] GCC: O2 vs O3 output differs on simple test
Summary: [8 Regression] GCC: O2 vs O3 output differs on simple test
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 10.0
: P2 normal
Target Milestone: 9.3
Assignee: Michael Matz
URL:
Keywords: wrong-code
: 91240 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-06-09 07:23 UTC by Tilir
Modified: 2021-05-14 13:08 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 7.4.0, 9.2.1
Known to fail: 8.3.0, 9.1.0, 9.2.0
Last reconfirmed: 2019-06-09 00:00:00


Attachments
potential patch (2.77 KB, text/plain)
2019-08-05 19:53 UTC, Michael Matz
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tilir 2019-06-09 07:23:14 UTC
Reproduction:

---
#include <stdio.h>
unsigned b[11];
unsigned c;
int d, e, f;
char en;
int main() {
  for (; e < 100; e += 3) {
    printf("b[0] = %u\n", b[0]);
    for (; c < 9; c++)
      for (d = 2; d < 11; d++) {
        f = b[c + 2] ^ 9;
        b[c] = f;
      }
  }
}
---

Compiler information:

---
> gcc -v
Using built-in specs.
COLLECT_GCC=/apps/bin/gcc
COLLECT_LTO_WRAPPER=/apps/9fafffe/bin/../libexec/gcc/x86_64-pc-linux-gnu/10.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /srcs/gcc/configure --disable-bootstrap --prefix=/apps/9fafffe
Thread model: posix
gcc version 10.0.0 20190608 (experimental) (GCC)
---

Compile and run like this:

> gcc -O2 min.c -o corr
> ./corr >& corr.log
> gcc -O3 min.c -o wrong
> ./wrong >& wrong.log

Correct output:

b[0] = 0
b[0] = 9
....
b[0] = 9

Wrong output:

b[0] = 0
b[0] = 0
....
b[0] = 0

O0 and O1 modes agree with O2.
Comment 1 Marc Glisse 2019-06-09 09:48:26 UTC
Confirmed.

I also notice a missed optimization:

  _16 = ivtmp.21_234 + 2;
  _2 = b[_16];
  _41 = _2 ^ 9;
  b[ivtmp.21_234] = _41;
  _52 = b[_16];
  _51 = _52 ^ 9;
  b[ivtmp.21_234] = _51;
  _65 = b[_16];
etc.

it seems clear from the first line that b[_16] and b[ivtmp.21_234] cannot alias, so this should simplify. And it isn't just because cunroll is late, we don't simplify it either if I write directly this pattern in C.
Comment 2 Marc Glisse 2019-06-09 09:53:39 UTC
-fdisable-tree-unrolljam helps, which may (or may not) point at a potential culprit.
Comment 3 Martin Liška 2019-06-10 08:05:22 UTC
Yes, it started with r255467.
There's a simplified test-case:

$ cat pr90796.c
unsigned b[11];
unsigned c;
int d, e, f;
char en;

int main() {
  char b[100];
  for (; e < 6; e += 3) {
    __builtin_sprintf(b, "%u", b[0]);
    for (; c < 9; c++)
      for (d = 2; d < 11; d++) {
        f = b[c + 2] ^ 9;
        b[c] = f;
      }
  }
  __builtin_printf("b:%s\n", b);
  if (__builtin_strcmp (b, "9") != 0)
    __builtin_abort ();
  return 0;
}

$ gcc pr90796.c -O3 && ./a.out
b:0
Aborted (core dumped)
Comment 4 Richard Biener 2019-06-11 08:05:04 UTC
Hmm, the CFG looks like unroll-and-jam attempts to do versioning/peeling
but forgets the tail loop is executed at least once?
Comment 5 Michael Matz 2019-06-11 13:46:28 UTC
FWIW, the reduced testcase from comment #3 is wrong.  Even with -O0 or with gcc 4.3 or 6 I get:

b:48
Aborted (core dumped)


But I can reproduce the problem with the original testcase.
Comment 6 Michael Matz 2019-06-12 15:39:40 UTC
I think I know what's going on.  Mine.
Comment 7 Michael Matz 2019-08-05 19:53:48 UTC
Created attachment 46675 [details]
potential patch

Actually I was barking up the wrong tree.  It's not as easy as the CFG manipulation for loop fusion going wrong (like missing some last iterations
or so).  It's really a problem in the dependence analysis.  See the extensive
comment in the patch.

The fun thing is, there's a difference between these two loop nests:

   for (i) for (j) a[i][0] = f(a[i+1][0]);
   for (i) for (j) b[i][j] = f(a[i+1][j]);

Even though the distance vector for the read/write in the single statement
is (-1,0) for both loops, unroll-and-jam is valid for the second but not
for the first.
Comment 8 rguenther@suse.de 2019-08-06 07:26:52 UTC
On August 5, 2019 9:53:48 PM GMT+02:00, "matz at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796
>
>--- Comment #7 from Michael Matz <matz at gcc dot gnu.org> ---
>Created attachment 46675 [details]
>  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46675&action=edit
>potential patch
>
>Actually I was barking up the wrong tree.  It's not as easy as the CFG
>manipulation for loop fusion going wrong (like missing some last
>iterations
>or so).  It's really a problem in the dependence analysis.  See the
>extensive
>comment in the patch.
>
>The fun thing is, there's a difference between these two loop nests:
>
>   for (i) for (j) a[i][0] = f(a[i+1][0]);
>   for (i) for (j) b[i][j] = f(a[i+1][j]);

What about

  B[i][j/2]...

? It's really surprising that only invariants are special here.

>Even though the distance vector for the read/write in the single
>statement
>is (-1,0) for both loops, unroll-and-jam is valid for the second but
>not
>for the first.
Comment 9 Michael Matz 2019-08-06 15:36:49 UTC
(In reply to rguenther@suse.de from comment #8)
> >The fun thing is, there's a difference between these two loop nests:
> >
> >   for (i) for (j) a[i][0] = f(a[i+1][0]);
> >   for (i) for (j) b[i][j] = f(a[i+1][j]);
> 
> What about
> 
>   B[i][j/2]...
> 
> ?

That would be a problem as well, but luckily that's not an affine function of j,
and hence has no analyzable access function, and so isn't fused for different
reasons.

> It's really surprising that only invariants are special here.

It's the only affine functions that don't progress with each iteration.  I
think, at least :)
Comment 10 rguenther@suse.de 2019-08-06 16:11:23 UTC
On August 6, 2019 5:36:49 PM GMT+02:00, "matz at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796
>
>--- Comment #9 from Michael Matz <matz at gcc dot gnu.org> ---
>(In reply to rguenther@suse.de from comment #8)
>> >The fun thing is, there's a difference between these two loop nests:
>> >
>> >   for (i) for (j) a[i][0] = f(a[i+1][0]);
>> >   for (i) for (j) b[i][j] = f(a[i+1][j]);
>> 
>> What about
>> 
>>   B[i][j/2]...
>> 
>> ?
>
>That would be a problem as well, but luckily that's not an affine
>function of
>j,
>and hence has no analyzable access function, and so isn't fused for
>different
>reasons.
>
>> It's really surprising that only invariants are special here.
>
>It's the only affine functions that don't progress with each iteration.
> I
>think, at least :)

Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be caught in another way..
Comment 11 Michael Matz 2019-08-07 12:06:47 UTC
(In reply to rguenther@suse.de from comment #10)
> >It's the only affine functions that don't progress with each iteration.
> > I
> >think, at least :)
> 
> Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be
> caught in another way..

Yes, we analyze them, but for nothing.  They aren't affine either, and hence
result in unknown dependences.
Comment 12 Michael Matz 2019-10-22 12:25:39 UTC
Author: matz
Date: Tue Oct 22 12:25:03 2019
New Revision: 277287

URL: https://gcc.gnu.org/viewcvs?rev=277287&root=gcc&view=rev
Log:
Fix PR middle-end/90796

        PR middle-end/90796
        * gimple-loop-jam.c (any_access_function_variant_p): New function.
        (adjust_unroll_factor): Use it to constrain safety, new parameter.
        (tree_loop_unroll_and_jam): Adjust call and profitable unroll factor.

testsuite/
        * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimple-loop-jam.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/unroll-and-jam.c
Comment 13 Michael Matz 2019-10-22 12:26:51 UTC
Fixed in trunk so far.  Will be backporting in a few days.
Comment 14 Jakub Jelinek 2019-11-20 11:23:32 UTC
Time to backport now?
Comment 15 Jakub Jelinek 2019-11-20 11:38:53 UTC
*** Bug 91240 has been marked as a duplicate of this bug. ***
Comment 16 Michael Matz 2019-11-20 11:40:34 UTC
(In reply to Jakub Jelinek from comment #14)
> Time to backport now?

Hmpf, I've actually done the regstrapping for gcc9 already but then forgot to submit.  Thanks for the reminder.
Comment 17 Michael Matz 2019-11-20 16:51:46 UTC
Author: matz
Date: Wed Nov 20 16:51:10 2019
New Revision: 278512

URL: https://gcc.gnu.org/viewcvs?rev=278512&root=gcc&view=rev
Log:
Fix PR90796

       PR middle-end/90796
       * gimple-loop-jam.c (any_access_function_variant_p): New function.
       (adjust_unroll_factor): Use it to constrain safety, new parameter.
       (tree_loop_unroll_and_jam): Adjust call and profitable unroll factor.

testsuite/
       Backport from mainline
       PR middle-end/90796
       * gcc.dg/unroll-and-jam.c: Disable loop-invariant motion and adjust.

       PR middle-end/90796
       * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case.

Modified:
    branches/gcc-9-branch/gcc/ChangeLog
    branches/gcc-9-branch/gcc/gimple-loop-jam.c
    branches/gcc-9-branch/gcc/testsuite/ChangeLog
    branches/gcc-9-branch/gcc/testsuite/gcc.dg/unroll-and-jam.c
Comment 18 Jakub Jelinek 2020-03-04 09:43:13 UTC
GCC 8.4.0 has been released, adjusting target milestone.
Comment 19 Jakub Jelinek 2021-05-14 13:08:58 UTC
The GCC 8 branch is being closed, fixed in GCC 9.3.