regehr@john-home ~/z/reduce/r109 $ clang -O small.c ; ./a.out 0 regehr@john-home ~/z/reduce/r109 $ gcc -O small.c ; ./a.out 0 regehr@john-home ~/z/reduce/r109 $ gcc -Os small.c ; ./a.out 1 regehr@john-home ~/z/reduce/r109 $ cat small.c int printf(const char *, ...); struct x0 { volatile int x1; int x2; int x3; int x4; int x5; } x6; static struct x0 x7, x8; int x9 = 1; char x10(); static struct x0 x11() { if (x10()) return x6; return x7; } char x10() { return x9; } int main() { x8 = x11(); x6.x2 = 1; printf("%d\n", x8.x2); return 0; } regehr@john-home ~/z/reduce/r109 $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/home/regehr/z/compiler-install/gcc-r202341-install/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: /home/regehr/z/compiler-source/gcc/configure --prefix=/home/regehr/z/compiler-install/gcc-r202341-install --enable-languages=c,c++ --disable-multilib Thread model: posix gcc version 4.9.0 20130906 (experimental) (GCC) regehr@john-home ~/z/reduce/r109 $
Confirmed, from 4.6 to trunk. The .optimized dump looks ok (basically the same for -O2 and -Os), the dumps become very different at expansion (didn't check if the error appears in expand or later).
Works still with r162588, fails with r162597, revisions in between those two don't build, so can't be bisected more. But it must be one of the GCSE changes in that range.
Bug introduced in revision 162592. Investigating. commit 48d5e32fb40bb7973e596b36d12ad4041d8ef10a Author: mkuvyrkov <mkuvyrkov@138bc75d-0d04-0410-961f-82ee72b054a4> Date: Tue Jul 27 19:38:10 2010 +0000 PR target/42495 PR middle-end/42574 * gcse.c (hoist_expr_reaches_here_p): Remove excessive check. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@162592 138bc75d-0d04-0410-961f
The underlying bug appears to be in CSA (combine-stack-adj.c) and I would appreciate Richard's insight on how to fix it. Summary: CSA merges stack variable into global one, and then alias analysis thinks that ex-stack-now-global variable does not overlap with global variables. After patch in rev. 162592 code hoisting becomes able to simplify code just a bit, which causes scheduler to generate a different sequence, exposing a latent bug. During expand we have: # BLOCK 3 freq:3900 # PRED: 2 [39.0%] (true,exec) 1 D.2750 = x6; goto <bb 5>; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:6100 # PRED: 2 [61.0%] (false,exec) D.2750 = x7; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:10000 # PRED: 4 [100.0%] (fallthru,exec) 3 [100.0%] (fallthru,exec) 2 x8 = D.2750; 3 x6.x2 = 1; D.2733_1 = x8.x2; printf ("%d\n", D.2733_1); CSA combines stack variable D.2750 in insn (1) into global x6, but fails to update uses of D.2750. After CSA DECL_RTL of MEM in insn (2) still references D.2750 even though it now reads directly from x6. Because D.2750 is on stack and x6 is global -- alias analysis (nonoverlapping_memrefs_p()) tells scheduler that it is OK to move insn (3) before insn (2).
That is actually the crossjumping pass after csa (jump2; though yes, previously it has been part of csa). In any case, I don't see anything wrong with the cross jumping, it turns the two: (set (mem/c:BLK (reg:DI 5 di [81]) [3 D.1761+0 S20 A32]) (mem/c:BLK (reg:DI 4 si [82]) [3 x6+0 S20 A32])) and (set (mem/c:BLK (reg:DI 5 di [84]) [3 D.1761+0 S20 A32]) (mem/u/c:BLK (reg:DI 4 si [85]) [3 x7+0 S20 A128])) into: (set (mem/c:BLK (reg:DI 5 di [84]) [3 D.1761+0 S20 A32]) (mem/u/c:BLK (reg:DI 4 si [85]) [3 S20 A32])) (note, MEM_EXPR cleared), where si is set conditionally to x6 or x7. And then we have the: (insn 29 27 31 6 (set (mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("x6") <var_decl 0x7f9ce47572f8 x6>) (const_int 4 [0x4]))) [2 x6.x2+0 S4 A32]) (const_int 1 [0x1])) pr58365.c:31 86 {*movsi_internal} (nil)) store that sched2 moves over the above rep_movsi, although there is a (conditional, may, but doesn't have to) aliasing case.
(In reply to Jakub Jelinek from comment #5) > That is actually the crossjumping pass after csa (jump2; though yes, > previously it has been part of csa). In any case, I don't see anything > wrong with the cross jumping, it turns the two: > (set (mem/c:BLK (reg:DI 5 di [81]) [3 D.1761+0 S20 A32]) > (mem/c:BLK (reg:DI 4 si [82]) [3 x6+0 S20 A32])) > and > (set (mem/c:BLK (reg:DI 5 di [84]) [3 D.1761+0 S20 A32]) > (mem/u/c:BLK (reg:DI 4 si [85]) [3 x7+0 S20 A128])) > into: > (set (mem/c:BLK (reg:DI 5 di [84]) [3 D.1761+0 S20 A32]) > (mem/u/c:BLK (reg:DI 4 si [85]) [3 S20 A32])) > (note, MEM_EXPR cleared), where si is set conditionally to x6 or x7. > And then we have the: > (insn 29 27 31 6 (set (mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("x6") > <var_decl 0x7f9ce47572f8 x6>) > (const_int 4 [0x4]))) [2 x6.x2+0 S4 A32]) > (const_int 1 [0x1])) pr58365.c:31 86 {*movsi_internal} > (nil)) > store that sched2 moves over the above rep_movsi, although there is a > (conditional, may, but doesn't have to) aliasing case. Jacub, I don't quite understand whether you are saying that the bug is /not/ in csa (at GCC 4.7 age) or whether you are providing additional analysis. I'm debugging the gcc-4.7-era revision 162592 and here the source MEM in rep_movsi references stack variable instead of x6, which causes wrong aliasing decision.
Ah, actually the bug is really in the cross-jumping, in this case that it hasn't cleared MEM_READONLY_P. While x7 is read-only, x6 is not, so merge_memattrs should combine that to !MEM_READONLY_P.
Created attachment 30781 [details] gcc49-pr58365.patch Untested fix.
Author: jakub Date: Tue Sep 10 11:47:19 2013 New Revision: 202434 URL: http://gcc.gnu.org/viewcvs?rev=202434&root=gcc&view=rev Log: PR rtl-optimization/58365 * cfgcleanup.c (merge_memattrs): Also clear MEM_READONLY_P resp. MEM_NOTRAP_P if they differ, or set MEM_VOLATILE_P if it differs. * gcc.c-torture/execute/pr58365.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr58365.c Modified: trunk/gcc/ChangeLog trunk/gcc/cfgcleanup.c trunk/gcc/testsuite/ChangeLog
Author: jakub Date: Tue Sep 10 11:48:30 2013 New Revision: 202435 URL: http://gcc.gnu.org/viewcvs?rev=202435&root=gcc&view=rev Log: PR rtl-optimization/58365 * cfgcleanup.c (merge_memattrs): Also clear MEM_READONLY_P resp. MEM_NOTRAP_P if they differ, or set MEM_VOLATILE_P if it differs. * gcc.c-torture/execute/pr58365.c: New test. Added: branches/gcc-4_8-branch/gcc/testsuite/gcc.c-torture/execute/pr58365.c Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/cfgcleanup.c branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
Fixed for 4.8.2+ so far.
Author: jakub Date: Wed May 7 16:04:44 2014 New Revision: 210173 URL: http://gcc.gnu.org/viewcvs?rev=210173&root=gcc&view=rev Log: Backported from mainline 2013-09-10 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/58365 * cfgcleanup.c (merge_memattrs): Also clear MEM_READONLY_P resp. MEM_NOTRAP_P if they differ, or set MEM_VOLATILE_P if it differs. * gcc.c-torture/execute/pr58365.c: New test. Added: branches/gcc-4_7-branch/gcc/testsuite/gcc.c-torture/execute/pr58365.c Modified: branches/gcc-4_7-branch/gcc/ChangeLog branches/gcc-4_7-branch/gcc/cfgcleanup.c branches/gcc-4_7-branch/gcc/testsuite/ChangeLog
Fixed.