Reproduction: --- #include <stdio.h> unsigned b[11]; unsigned c; int d, e, f; char en; int main() { for (; e < 100; e += 3) { printf("b[0] = %u\n", b[0]); for (; c < 9; c++) for (d = 2; d < 11; d++) { f = b[c + 2] ^ 9; b[c] = f; } } } --- Compiler information: --- > gcc -v Using built-in specs. COLLECT_GCC=/apps/bin/gcc COLLECT_LTO_WRAPPER=/apps/9fafffe/bin/../libexec/gcc/x86_64-pc-linux-gnu/10.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /srcs/gcc/configure --disable-bootstrap --prefix=/apps/9fafffe Thread model: posix gcc version 10.0.0 20190608 (experimental) (GCC) --- Compile and run like this: > gcc -O2 min.c -o corr > ./corr >& corr.log > gcc -O3 min.c -o wrong > ./wrong >& wrong.log Correct output: b[0] = 0 b[0] = 9 .... b[0] = 9 Wrong output: b[0] = 0 b[0] = 0 .... b[0] = 0 O0 and O1 modes agree with O2.
Confirmed. I also notice a missed optimization: _16 = ivtmp.21_234 + 2; _2 = b[_16]; _41 = _2 ^ 9; b[ivtmp.21_234] = _41; _52 = b[_16]; _51 = _52 ^ 9; b[ivtmp.21_234] = _51; _65 = b[_16]; etc. it seems clear from the first line that b[_16] and b[ivtmp.21_234] cannot alias, so this should simplify. And it isn't just because cunroll is late, we don't simplify it either if I write directly this pattern in C.
-fdisable-tree-unrolljam helps, which may (or may not) point at a potential culprit.
Yes, it started with r255467. There's a simplified test-case: $ cat pr90796.c unsigned b[11]; unsigned c; int d, e, f; char en; int main() { char b[100]; for (; e < 6; e += 3) { __builtin_sprintf(b, "%u", b[0]); for (; c < 9; c++) for (d = 2; d < 11; d++) { f = b[c + 2] ^ 9; b[c] = f; } } __builtin_printf("b:%s\n", b); if (__builtin_strcmp (b, "9") != 0) __builtin_abort (); return 0; } $ gcc pr90796.c -O3 && ./a.out b:0 Aborted (core dumped)
Hmm, the CFG looks like unroll-and-jam attempts to do versioning/peeling but forgets the tail loop is executed at least once?
FWIW, the reduced testcase from comment #3 is wrong. Even with -O0 or with gcc 4.3 or 6 I get: b:48 Aborted (core dumped) But I can reproduce the problem with the original testcase.
I think I know what's going on. Mine.
Created attachment 46675 [details] potential patch Actually I was barking up the wrong tree. It's not as easy as the CFG manipulation for loop fusion going wrong (like missing some last iterations or so). It's really a problem in the dependence analysis. See the extensive comment in the patch. The fun thing is, there's a difference between these two loop nests: for (i) for (j) a[i][0] = f(a[i+1][0]); for (i) for (j) b[i][j] = f(a[i+1][j]); Even though the distance vector for the read/write in the single statement is (-1,0) for both loops, unroll-and-jam is valid for the second but not for the first.
On August 5, 2019 9:53:48 PM GMT+02:00, "matz at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796 > >--- Comment #7 from Michael Matz <matz at gcc dot gnu.org> --- >Created attachment 46675 [details] > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46675&action=edit >potential patch > >Actually I was barking up the wrong tree. It's not as easy as the CFG >manipulation for loop fusion going wrong (like missing some last >iterations >or so). It's really a problem in the dependence analysis. See the >extensive >comment in the patch. > >The fun thing is, there's a difference between these two loop nests: > > for (i) for (j) a[i][0] = f(a[i+1][0]); > for (i) for (j) b[i][j] = f(a[i+1][j]); What about B[i][j/2]... ? It's really surprising that only invariants are special here. >Even though the distance vector for the read/write in the single >statement >is (-1,0) for both loops, unroll-and-jam is valid for the second but >not >for the first.
(In reply to rguenther@suse.de from comment #8) > >The fun thing is, there's a difference between these two loop nests: > > > > for (i) for (j) a[i][0] = f(a[i+1][0]); > > for (i) for (j) b[i][j] = f(a[i+1][j]); > > What about > > B[i][j/2]... > > ? That would be a problem as well, but luckily that's not an affine function of j, and hence has no analyzable access function, and so isn't fused for different reasons. > It's really surprising that only invariants are special here. It's the only affine functions that don't progress with each iteration. I think, at least :)
On August 6, 2019 5:36:49 PM GMT+02:00, "matz at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796 > >--- Comment #9 from Michael Matz <matz at gcc dot gnu.org> --- >(In reply to rguenther@suse.de from comment #8) >> >The fun thing is, there's a difference between these two loop nests: >> > >> > for (i) for (j) a[i][0] = f(a[i+1][0]); >> > for (i) for (j) b[i][j] = f(a[i+1][j]); >> >> What about >> >> B[i][j/2]... >> >> ? > >That would be a problem as well, but luckily that's not an affine >function of >j, >and hence has no analyzable access function, and so isn't fused for >different >reasons. > >> It's really surprising that only invariants are special here. > >It's the only affine functions that don't progress with each iteration. > I >think, at least :) Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be caught in another way..
(In reply to rguenther@suse.de from comment #10) > >It's the only affine functions that don't progress with each iteration. > > I > >think, at least :) > > Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be > caught in another way.. Yes, we analyze them, but for nothing. They aren't affine either, and hence result in unknown dependences.
Author: matz Date: Tue Oct 22 12:25:03 2019 New Revision: 277287 URL: https://gcc.gnu.org/viewcvs?rev=277287&root=gcc&view=rev Log: Fix PR middle-end/90796 PR middle-end/90796 * gimple-loop-jam.c (any_access_function_variant_p): New function. (adjust_unroll_factor): Use it to constrain safety, new parameter. (tree_loop_unroll_and_jam): Adjust call and profitable unroll factor. testsuite/ * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case. Modified: trunk/gcc/ChangeLog trunk/gcc/gimple-loop-jam.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/unroll-and-jam.c
Fixed in trunk so far. Will be backporting in a few days.
Time to backport now?
*** Bug 91240 has been marked as a duplicate of this bug. ***
(In reply to Jakub Jelinek from comment #14) > Time to backport now? Hmpf, I've actually done the regstrapping for gcc9 already but then forgot to submit. Thanks for the reminder.
Author: matz Date: Wed Nov 20 16:51:10 2019 New Revision: 278512 URL: https://gcc.gnu.org/viewcvs?rev=278512&root=gcc&view=rev Log: Fix PR90796 PR middle-end/90796 * gimple-loop-jam.c (any_access_function_variant_p): New function. (adjust_unroll_factor): Use it to constrain safety, new parameter. (tree_loop_unroll_and_jam): Adjust call and profitable unroll factor. testsuite/ Backport from mainline PR middle-end/90796 * gcc.dg/unroll-and-jam.c: Disable loop-invariant motion and adjust. PR middle-end/90796 * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case. Modified: branches/gcc-9-branch/gcc/ChangeLog branches/gcc-9-branch/gcc/gimple-loop-jam.c branches/gcc-9-branch/gcc/testsuite/ChangeLog branches/gcc-9-branch/gcc/testsuite/gcc.dg/unroll-and-jam.c
GCC 8.4.0 has been released, adjusting target milestone.
The GCC 8 branch is being closed, fixed in GCC 9.3.