commit 283cb9ea6293e813e48a1b769e1e0779918ea20a (r11-161) Author: Richard Biener <rguenther@suse.de> Date: Mon Apr 27 14:45:54 2020 +0200 tree-optimization/57359 - rewrite SM code This rewrites store-motion to process candidates where we can ensure order preserving separately and with no need to disambiguate against all stores. Those candidates we cannot handle this way are validated to be independent on all stores (w/o TBAA) and then processed as "unordered" (all conditionally executed stores are so as well). This will necessary cause FAIL: gcc.dg/graphite/pr80906.c scan-tree-dump graphite "isl AST to Gimple succeeded" because the SM previously performed is not valid for exactly the PR57359 reason, we still perform SM of qc for the innermost loop but that's not enough. There is still room for improvements because we still check some constraints for the order preserving cases that are only necessary in the current strict way for the unordered ones. Leaving that for the furture. caused: FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]
Ah, forgot to update this testcase. This is another instance of PR57359, that is, we may not sink the store to b across the store to *b since b may point to itself and with j == 1 we'd change b = b + 2; *b = x; to *b = x; b = b + 2; note there's a twist for this particular case, namely the preceeding load of 'b' gives us knowledge about the dynamic type of 'b' which means we could use that to assess that we _can_ exchange the stores. But that logic is not implemented. I'll see how to do that.
(In reply to Richard Biener from comment #1) > Ah, forgot to update this testcase. This is another instance of PR57359, > that is, we may not sink the store to b across the store to *b since b may > point > to itself and with j == 1 we'd change > > b = b + 2; > *b = x; > > to > > *b = x; > b = b + 2; > > note there's a twist for this particular case, namely the preceeding load > of 'b' gives us knowledge about the dynamic type of 'b' which means we > could use that to assess that we _can_ exchange the stores. > > But that logic is not implemented. > > I'll see how to do that. OK, we can't. Consider the following which we miscompile with GCC 10 but which is fixed on trunk. bar () is simply the inner loop of bar in the pr64110.c testcase. GCC 10 and earlier transform b++; *b = x; to tem = b + 1; *b = x; b = tem; which is wrong with b == &b, the *b = x store re-purposes the storage in 'b'. short *b; void __attribute__((noipa)) bar (short x, int j) { for (int i = 0; i < j; ++i) *b++ = x; } int main() { b = (short *)&b; bar (0, 1); if ((short)(unsigned long)b != 0) __builtin_abort (); return 0; } Now the only thing that can be done (as noted in PR57359) is re-materializing _both_ stores on the exit. Thus turn for (int i = 0; i < j; ++i) { tem = b; tem = tem + 1; b = tem; *tem = x; } into tem = b; for (int i = 0; i < j; ++i) { tem = tem + 1; *tem = x; } b = tem; *tem = x; when applying store-motion. Note this only works when b is written to unconditionally. It also needs some kind of a cost model I guess...
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:b6ff3ddecfa93d53867afaaa078f85fc848abbbd commit r11-272-gb6ff3ddecfa93d53867afaaa078f85fc848abbbd Author: Richard Biener <rguenther@suse.de> Date: Fri May 8 12:03:30 2020 +0200 tree-optimization/94988 - enhance SM some more This enhances store-order preserving store motion to handle the case of non-invariant dependent stores in the sequence of unconditionally executed stores on exit by re-issueing them as part of the sequence of stores on the exit. This fixes the observed regression of gcc.target/i386/pr64110.c which relies on store-motion of 'b' for a loop like for (int i = 0; i < j; ++i) *b++ = x; where for correctness we now no longer apply store-motion. With the patch we emit the correct tem = b; for (int i = 0; i < j; ++i) { tem = tem + 1; *tem = x; } b = tem; *tem = x; preserving the original order of stores. A testcase reflecting the miscompilation done by earlier GCC is added as well. This also fixes the reported ICE in PR95025 and adds checking code to catch it earlier - the issue was not-supported refs propagation leaving stray refs in the sequence. 2020-05-11 Richard Biener <rguenther@suse.de> PR tree-optimization/94988 PR tree-optimization/95025 * tree-ssa-loop-im.c (seq_entry): Make a struct, add from. (sm_seq_push_down): Take extra parameter denoting where we moved the ref to. (execute_sm_exit): Re-issue sm_other stores in the correct order. (sm_seq_valid_bb): When always executed, allow sm_other to prevail inbetween sm_ord and record their stored value. (hoist_memory_references): Adjust refs_not_supported propagation and prune sm_other from the end of the ordered sequences. * gcc.dg/torture/pr94988.c: New testcase. * gcc.dg/torture/pr95025.c: Likewise. * gcc.dg/torture/pr95045.c: Likewise. * g++.dg/asan/pr95025.C: New testcase.
Fixed.
One new testcase FAILs on several targets: +FAIL: gcc.dg/torture/pr94988.c -O0 execution test +FAIL: gcc.dg/torture/pr94988.c -O1 execution test +FAIL: gcc.dg/torture/pr94988.c -O2 execution test +FAIL: gcc.dg/torture/pr94988.c -O2 -flto execution test +FAIL: gcc.dg/torture/pr94988.c -O2 -flto -flto-partition=none execution test +FAIL: gcc.dg/torture/pr94988.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test +FAIL: gcc.dg/torture/pr94988.c -O3 -g execution test +FAIL: gcc.dg/torture/pr94988.c -Os execution test I'm seeing it on 32 and 64-bit sparc-sun-solaris2.11, and there are also reports for moxie-unknown-elf, powerpc64-unknown-linux-gnu, s390x-ibm-linux-gnu, all big-endian targets AFAICS.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:119a7db1e05c9741803b3ff93266b00fd535732a commit r11-320-g119a7db1e05c9741803b3ff93266b00fd535732a Author: Richard Biener <rguenther@suse.de> Date: Tue May 12 14:13:32 2020 +0200 middle-end/94988 fix testcase for big-endian The testcase only works for little-endian, mark it so. 2020-05-12 Richard Biener <rguenther@suse.de> PR middle-end/94988 * gcc.dg/torture/pr94988.c: Disable runtime test for * non-little-endian.
The master branch has been updated by Thomas Schwinge <tschwinge@gcc.gnu.org>: https://gcc.gnu.org/g:2210ef7d3d68a027ec16476825049567953c7fa4 commit r11-1348-g2210ef7d3d68a027ec16476825049567953c7fa4 Author: Thomas Schwinge <thomas@codesourcery.com> Date: Sat Jun 6 18:23:28 2020 +0200 Un-XFAIL 'gcc.dg/graphite/pr80906.c' The recent commit b6ff3ddecfa93d53867afaaa078f85fc848abbbd "tree-optimization/94988 - enhance SM some more" fixed this. gcc/testsuite/ PR tree-optimization/94988 * gcc.dg/graphite/pr80906.c: Un-XFAIL.
ISTM that the test invokes undefined behavior because the assignment and the increment in the loop both modify the same storage without an intervening sequence point. ISTM that the dynamic type of that storage is thus uncertain, and accessing it afterwards, without an intervening store that resolves its type either way, would also invoke undefined behavior.
Reasoning that the concurrent stores invoke undefined behavior would enable us to assume that the stores don't alias, which invalidates the reasoning in comment 1. Alas, I don't think gimple preserves enough information for us to tell that two statements used to be concurrent so as to derive optimization opportunities from them.
(In reply to Alexandre Oliva from comment #9) > ISTM that the test invokes undefined behavior because the assignment and the > increment in the loop both modify the same storage without an intervening > sequence point. ISTM that the dynamic type of that storage is thus > uncertain, and accessing it afterwards, without an intervening store that > resolves its type either way, would also invoke undefined behavior. I think the source can be rewritten to *b = x; b++; and still show the original issue on the GIMPLE side.