Summary: | [8 Regression] GCC: O2 vs O3 output differs on simple test | ||
---|---|---|---|
Product: | gcc | Reporter: | Tilir <konstantin.vladimirov> |
Component: | middle-end | Assignee: | Michael Matz <matz> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | dimhen, jakub, matz, rguenth |
Priority: | P2 | Keywords: | wrong-code |
Version: | 10.0 | ||
Target Milestone: | 9.3 | ||
See Also: | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91240 | ||
Host: | Target: | ||
Build: | Known to work: | 7.4.0, 9.2.1 | |
Known to fail: | 8.3.0, 9.1.0, 9.2.0 | Last reconfirmed: | 2019-06-09 00:00:00 |
Attachments: | potential patch |
Description
Tilir
2019-06-09 07:23:14 UTC
Confirmed. I also notice a missed optimization: _16 = ivtmp.21_234 + 2; _2 = b[_16]; _41 = _2 ^ 9; b[ivtmp.21_234] = _41; _52 = b[_16]; _51 = _52 ^ 9; b[ivtmp.21_234] = _51; _65 = b[_16]; etc. it seems clear from the first line that b[_16] and b[ivtmp.21_234] cannot alias, so this should simplify. And it isn't just because cunroll is late, we don't simplify it either if I write directly this pattern in C. -fdisable-tree-unrolljam helps, which may (or may not) point at a potential culprit. Yes, it started with r255467. There's a simplified test-case: $ cat pr90796.c unsigned b[11]; unsigned c; int d, e, f; char en; int main() { char b[100]; for (; e < 6; e += 3) { __builtin_sprintf(b, "%u", b[0]); for (; c < 9; c++) for (d = 2; d < 11; d++) { f = b[c + 2] ^ 9; b[c] = f; } } __builtin_printf("b:%s\n", b); if (__builtin_strcmp (b, "9") != 0) __builtin_abort (); return 0; } $ gcc pr90796.c -O3 && ./a.out b:0 Aborted (core dumped) Hmm, the CFG looks like unroll-and-jam attempts to do versioning/peeling but forgets the tail loop is executed at least once? FWIW, the reduced testcase from comment #3 is wrong. Even with -O0 or with gcc 4.3 or 6 I get: b:48 Aborted (core dumped) But I can reproduce the problem with the original testcase. I think I know what's going on. Mine. Created attachment 46675 [details]
potential patch
Actually I was barking up the wrong tree. It's not as easy as the CFG manipulation for loop fusion going wrong (like missing some last iterations
or so). It's really a problem in the dependence analysis. See the extensive
comment in the patch.
The fun thing is, there's a difference between these two loop nests:
for (i) for (j) a[i][0] = f(a[i+1][0]);
for (i) for (j) b[i][j] = f(a[i+1][j]);
Even though the distance vector for the read/write in the single statement
is (-1,0) for both loops, unroll-and-jam is valid for the second but not
for the first.
On August 5, 2019 9:53:48 PM GMT+02:00, "matz at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796 > >--- Comment #7 from Michael Matz <matz at gcc dot gnu.org> --- >Created attachment 46675 [details] > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46675&action=edit >potential patch > >Actually I was barking up the wrong tree. It's not as easy as the CFG >manipulation for loop fusion going wrong (like missing some last >iterations >or so). It's really a problem in the dependence analysis. See the >extensive >comment in the patch. > >The fun thing is, there's a difference between these two loop nests: > > for (i) for (j) a[i][0] = f(a[i+1][0]); > for (i) for (j) b[i][j] = f(a[i+1][j]); What about B[i][j/2]... ? It's really surprising that only invariants are special here. >Even though the distance vector for the read/write in the single >statement >is (-1,0) for both loops, unroll-and-jam is valid for the second but >not >for the first. (In reply to rguenther@suse.de from comment #8) > >The fun thing is, there's a difference between these two loop nests: > > > > for (i) for (j) a[i][0] = f(a[i+1][0]); > > for (i) for (j) b[i][j] = f(a[i+1][j]); > > What about > > B[i][j/2]... > > ? That would be a problem as well, but luckily that's not an affine function of j, and hence has no analyzable access function, and so isn't fused for different reasons. > It's really surprising that only invariants are special here. It's the only affine functions that don't progress with each iteration. I think, at least :) On August 6, 2019 5:36:49 PM GMT+02:00, "matz at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796 > >--- Comment #9 from Michael Matz <matz at gcc dot gnu.org> --- >(In reply to rguenther@suse.de from comment #8) >> >The fun thing is, there's a difference between these two loop nests: >> > >> > for (i) for (j) a[i][0] = f(a[i+1][0]); >> > for (i) for (j) b[i][j] = f(a[i+1][j]); >> >> What about >> >> B[i][j/2]... >> >> ? > >That would be a problem as well, but luckily that's not an affine >function of >j, >and hence has no analyzable access function, and so isn't fused for >different >reasons. > >> It's really surprising that only invariants are special here. > >It's the only affine functions that don't progress with each iteration. > I >think, at least :) Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be caught in another way.. (In reply to rguenther@suse.de from comment #10) > >It's the only affine functions that don't progress with each iteration. > > I > >think, at least :) > > Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be > caught in another way.. Yes, we analyze them, but for nothing. They aren't affine either, and hence result in unknown dependences. Author: matz Date: Tue Oct 22 12:25:03 2019 New Revision: 277287 URL: https://gcc.gnu.org/viewcvs?rev=277287&root=gcc&view=rev Log: Fix PR middle-end/90796 PR middle-end/90796 * gimple-loop-jam.c (any_access_function_variant_p): New function. (adjust_unroll_factor): Use it to constrain safety, new parameter. (tree_loop_unroll_and_jam): Adjust call and profitable unroll factor. testsuite/ * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case. Modified: trunk/gcc/ChangeLog trunk/gcc/gimple-loop-jam.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/unroll-and-jam.c Fixed in trunk so far. Will be backporting in a few days. Time to backport now? *** Bug 91240 has been marked as a duplicate of this bug. *** (In reply to Jakub Jelinek from comment #14) > Time to backport now? Hmpf, I've actually done the regstrapping for gcc9 already but then forgot to submit. Thanks for the reminder. Author: matz Date: Wed Nov 20 16:51:10 2019 New Revision: 278512 URL: https://gcc.gnu.org/viewcvs?rev=278512&root=gcc&view=rev Log: Fix PR90796 PR middle-end/90796 * gimple-loop-jam.c (any_access_function_variant_p): New function. (adjust_unroll_factor): Use it to constrain safety, new parameter. (tree_loop_unroll_and_jam): Adjust call and profitable unroll factor. testsuite/ Backport from mainline PR middle-end/90796 * gcc.dg/unroll-and-jam.c: Disable loop-invariant motion and adjust. PR middle-end/90796 * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case. Modified: branches/gcc-9-branch/gcc/ChangeLog branches/gcc-9-branch/gcc/gimple-loop-jam.c branches/gcc-9-branch/gcc/testsuite/ChangeLog branches/gcc-9-branch/gcc/testsuite/gcc.dg/unroll-and-jam.c GCC 8.4.0 has been released, adjusting target milestone. The GCC 8 branch is being closed, fixed in GCC 9.3. |