Created attachment 44296 [details] Test case Hi, We noticed that MySQL does not pass its test suite when compiled with GCC 8; it runs out of stack. (GCC 7 is fine.) A reduced test case is included (mostly by C-Reduce, but it needed some help by hand); most of it appears to be fluff that keeps the compiler from just optimizing away the entire thing, but the gist of it seems to be that it inlines the bg::bl() function several times without caring that it balloons the stack size, and then doesn't manage to shrink the stack again by overlapping variables. Putting the noinline attribute on bg::bl() seems to be a workaround for now. For comparison: > g++-7 -O2 -Wstack-usage=1 -Wno-return-type -Wno-unused-result -c stack.i stack.i: In function ‘void c()’: stack.i:34:6: warning: stack usage is 8240 bytes [-Wstack-usage=] void c() { ^ > g++-8 -O2 -Wstack-usage=1 -Wno-return-type -Wno-unused-result -c stack.i stack.i: In function ‘void c()’: stack.i:34:6: warning: stack usage is 32816 bytes [-Wstack-usage=] void c() { ^ The actual, unreduced file can be found at https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/row/row0ins.cc#L926 (the line is positioned on a function whose adding noinline helps, although I don't think it corresponds directly to bg::bl; I think bg::bl might be ib::error, and the 8192-sized buffer comes from ib::logger::msg).
Inlining decisions are not so different between 7/8, the main difference is gcc-8 translates b::x() into __builtin_unreachable and warns accordingly: warning: no return statement in function returning non-void [-Wreturn-type] b x(b) {} ^ and with that change gcc-8 no longer manages to prove that big arrays have non-overlapping lifetimes. If I change the source to well-formed 'void x(b) {}', it compiles as desired. So, assuming the original MySQL source is free of that warning, the testcase is too aggressively reduced and no longer reflects the original issue. Can you please re-reduce?
OK, starting a reduce that also checks for no -Wreturn-type warnings.
Created attachment 44349 [details] Test case #2
New test case added. It's quite a bit larger than the old one, but has no -Wreturn-type warning.
Sorry, this still seems over-reduced: the 'cmp' variable is uninitialized, and gcc-7 completely optimizes out the block with large stack usage guarded by 'cmp == 0' test, so gcc-7 vs gcc-8 is not directly comparable. It's strange that gcc-7 optimizes that out, but it's a different issue. Can you attach the unreduced preprocessed source, and if you make another attempt at reducing, perhaps enable most warnings? That said, it seems gcc is not very good at re-discovering non-overlapping stack allocations introduced by inlining. Looking at your testcase I came up with the following minimal test: struct S{~S();}; void f(void *); inline void ff() { char c[1000]; f(c); } void g(int n) { S s; char c[100]; f(c); if (n) ff(), ff(); } (there's no regression vs. gcc-7 on this example, but gcc-4.6 used to get a better result by consuming 1100 bytes rather than 2100).
I wouldn't be surprised if everything is really over-reduced, and that GCC 7 just happens not to hit this by luck. I'll be making a checkout of the public git repository and try to reproduce there, so that I can give you an unreduced version.
Created attachment 44351 [details] Unreduced test case There; unreduced, from public MySQL. This is preprocessed with GCC 7, and compiles both with 7 and 8 (the latter triggers too big stack).
Removing the 'waiting' status.
GCC 8.2 has been released.
The issue with Alex testcase is that inlining when faced with ff () { charD.10 cD.2379[1000]; <bb 2> : f (&cD.2379); <bb 3> : cD.2379 ={v} {CLOBBER}; return; <bb 4> : <L0>: cD.2379 ={v} {CLOBBER}; resx 1 } ends up unifying the resx 1 parts, making the lifetime of the two instances overlap: g (intD.9 nD.2380) { charD.10 cD.2411[1000]; charD.10 cD.2410[1000]; ... <bb 4> : f (&cD.2410); <bb 5> : cD.2410 ={v} {CLOBBER}; f (&cD.2411); <bb 6> : cD.2411 ={v} {CLOBBER}; ... all fine up to here <bb 9> : <L4>: sD.2383 ={v} {CLOBBER}; cD.2399 ={v} {CLOBBER}; resx 1 oops. When analyzing lifetime of variables during RTL expansion we might want to not consider blocks where only the clobbers mention the vars thus virtually move those up as far as possible.
Looking at the #c7 testcase, confirming that ~ 29KB stack in one of the functions. The problem is that msg has char buf[8192]; variable in it and is inline, gets inlined into a function 3 times and can throw. ehcleanup1 removes the buf (and str) clobbers that were the only reason to have an EH pad that just rethrows (and I agree it is a good idea to do that, because otherwise inliner thinks the functions are more expensive than they actually are). But then the function into which this function is ultimately inlined has some finalization (destructors) covering the code into which it has been inlined, so the former EH with no successor block because it would throw externally now becomes EH edge from the code to a landing block onto which everything is marked as conflicting, there is no clobber at all for the variables. So, I wonder if we shouldn't add in such cases clobbers to the start of those landing pads in the situation, either for all variables that live in memory, or at least for the larger ones. Will play with it tomorrow. There is another thing - I've noticed add_stack_var_conflict is often called with x == y, shouldn't we return right away in that case? We don't need to record that a var conflicts with itself, we later on return that no variable conflicts with itself. Note, before r255104 we weren't inlining msg into the bigger function and thus the issue was latent. The workaround for MySQL, at least for -O2, would be to move logger:msg definition out from the class, so it is not inline, then at least gcc trunk doesn't want to inline it and you don't run into this.
Author: jakub Date: Thu Jan 17 08:05:12 2019 New Revision: 268009 URL: https://gcc.gnu.org/viewcvs?rev=268009&root=gcc&view=rev Log: PR tree-optimization/86214 * cfgexpand.c (add_stack_var_conflict): Don't add any conflicts if x == y. Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c
Short testcase that shows what's going on on the #c7 testcase: typedef __SIZE_TYPE__ size_t; struct A { A (); ~A (); int a; void qux (const char *); }; int bar (char *); static inline __attribute__((always_inline)) A foo () { char b[8192]; int x = bar (b); A s; if (x > 0 && (size_t) x < sizeof b) s.qux (b); return s; } void baz () { A a; foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); }
(In reply to Jakub Jelinek from comment #11) > The workaround for MySQL, at least for -O2, would be to move logger:msg > definition out from the class, so it is not inline, then at least gcc trunk > doesn't want to inline it and you don't run into this. Or just add -fconserve-stack to the g++ options.
Ah, the #c13 testcase shows the issue, but in the early inliner, not in IPA inliner like is triggered on the #c7 testcase. This modified testcase triggers it in the IPA inliner: typedef __SIZE_TYPE__ size_t; struct A { A (); ~A (); int a; void qux (const char *); }; int bar (char *); static inline A foo () { char b[8192]; int x = bar (b); A s; if (x > 0 && (size_t) x < sizeof b) s.qux (b); return s; } void baz () { A a; char c[1024]; bar (c); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); foo (); }
Created attachment 45449 [details] gcc9-pr86214.patch Untested patch. Not 100% sure if there must be just one landing pad, will see during bootstrap/regtest. If there must be at most one, the second field can go and it can just clear the first field.
Author: jakub Date: Fri Jan 18 10:07:27 2019 New Revision: 268067 URL: https://gcc.gnu.org/viewcvs?rev=268067&root=gcc&view=rev Log: PR tree-optimization/86214 * tree-inline.h (struct copy_body_data): Add add_clobbers_to_eh_landing_pads member. * tree-inline.c (add_clobbers_to_eh_landing_pad): New function. (copy_edges_for_bb): Call it if EH edge destination is < id->add_clobbers_to_eh_landing_pads. Fix a comment typo. (expand_call_inline): Set id->add_clobbers_to_eh_landing_pads if flag_stack_reuse != SR_NONE and clear it afterwards. * g++.dg/opt/pr86214-1.C: New test. * g++.dg/opt/pr86214-2.C: New test. Added: trunk/gcc/testsuite/g++.dg/opt/pr86214-1.C trunk/gcc/testsuite/g++.dg/opt/pr86214-2.C Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-inline.c trunk/gcc/tree-inline.h
Should be fixed now on the trunk. Likely not going to backport.
Thanks for fixing. IIRC we just added a noinline attribute somewhere in the code, so we already have a workaround.
(In reply to Jakub Jelinek from comment #17) > Author: jakub > Date: Fri Jan 18 10:07:27 2019 > New Revision: 268067 > > URL: https://gcc.gnu.org/viewcvs?rev=268067&root=gcc&view=rev > Log: > PR tree-optimization/86214 > * tree-inline.h (struct copy_body_data): Add > add_clobbers_to_eh_landing_pads member. > * tree-inline.c (add_clobbers_to_eh_landing_pad): New function. > (copy_edges_for_bb): Call it if EH edge destination is < > id->add_clobbers_to_eh_landing_pads. Fix a comment typo. > (expand_call_inline): Set id->add_clobbers_to_eh_landing_pads > if flag_stack_reuse != SR_NONE and clear it afterwards. > > * g++.dg/opt/pr86214-1.C: New test. > * g++.dg/opt/pr86214-2.C: New test. > > Added: > trunk/gcc/testsuite/g++.dg/opt/pr86214-1.C > trunk/gcc/testsuite/g++.dg/opt/pr86214-2.C > Modified: > trunk/gcc/ChangeLog > trunk/gcc/testsuite/ChangeLog > trunk/gcc/tree-inline.c > trunk/gcc/tree-inline.h Hi Jakub, Since you committed this patch, I've noticed regressions on some arm targets: FAIL: 23_containers/list/requirements/exception/basic.cc execution test libstdc++.log has this: spawn [open ...] N10__gnu_test12functor_base19iterator_operationsINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEEEE end count 2 N10__gnu_test12functor_base25const_iterator_operationsINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEEEE end count 3 N10__gnu_test12functor_base11erase_pointINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1ELb0EEE end count 4 N10__gnu_test12functor_base11erase_rangeINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1ELb0EEE end count 5 N10__gnu_test12functor_base12insert_pointINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1ELb0EEE end count 6 N10__gnu_test12functor_base7emplaceINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb0EEE end count 7 N10__gnu_test12functor_base13emplace_pointINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1ELb0ELb0EEE end count 8 N10__gnu_test12functor_base13emplace_frontINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1EEE end count 9 N10__gnu_test12functor_base12emplace_backINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1EEE end count 10 N10__gnu_test12functor_base9pop_frontINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1EEE end count 11 N10__gnu_test12functor_base8pop_backINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1EEE end count 12 N10__gnu_test12functor_base10push_frontINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1EEE end count 13 N10__gnu_test12functor_base9push_backINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb1EEE end count 14 N10__gnu_test12functor_base6rehashINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEELb0EEE end count 15 N10__gnu_test12functor_base4swapINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS4_21throw_allocator_limitIS5_EEEEEE end count 16 qemu: uncaught target signal 11 (Segmentation fault) - core dumped The qemu trace does not seem very helpful (and is probably incomplete): IN: _ZN10__gnu_test12basic_safetyINSt7__cxx114listIN9__gnu_cxx17throw_value_limitENS3_21throw_allocator_limitIS4_EEEEE3runEv 0x0001c724: e5943008 ldr r3, [r4, #8] 0x0001c728: e59d2010 ldr r2, [sp, #0x10] 0x0001c72c: e59d101c ldr r1, [sp, #0x1c] 0x0001c730: e3530000 cmp r3, #0 0x0001c734: e5987000 ldr r7, [r8] 0x0001c738: e5812000 str r2, [r1] 0x0001c73c: e5896000 str r6, [sb] 0x0001c740: e5882000 str r2, [r8] 0x0001c744: 0a000510 beq #0x1db8c ---------------- IN: 0x40adc6bc: e0849009 add sb, r4, sb 0x40adc6c0: e59f29c8 ldr r2, [pc, #0x9c8] 0x40adc6c4: e5993004 ldr r3, [sb, #4] 0x40adc6c8: e08f2002 add r2, pc, r2 0x40adc6cc: e3833001 orr r3, r3, #1 0x40adc6d0: e1550002 cmp r5, r2 0x40adc6d4: e59da018 ldr sl, [sp, #0x18] 0x40adc6d8: e5893004 str r3, [sb, #4] 0x40adc6dc: 0a000002 beq #0x40adc6ec ---------------- IN: 0x40adc6ec: e59f39a0 ldr r3, [pc, #0x9a0] 0x40adc6f0: e2846008 add r6, r4, #8 0x40adc6f4: e08f3003 add r3, pc, r3 0x40adc6f8: e593102c ldr r1, [r3, #0x2c] 0x40adc6fc: e3510000 cmp r1, #0 0x40adc700: 1affff13 bne #0x40adc354 ---------------- IN: 0x40adc704: e1a00006 mov r0, r6 0x40adc708: e28dd044 add sp, sp, #0x44 0x40adc70c: e8bd8ff0 pop {r4, r5, r6, r7, r8, sb, sl, fp, pc} ---------------- IN: 0x40adc6ac: e085240c add r2, r5, ip, lsl #8 0x40adc6b0: e2822028 add r2, r2, #0x28 0x40adc6b4: e3a03001 mov r3, #1 0x40adc6b8: eaffffce b #0x40adc5f8 I've noticed this on: arm-none-linux-gnueabihf --with-mode arm --with-cpu cortex-a9 --with-fpu vfp RUNTESTFLAGS: -march=armv5t arm-none-linux-gnueabihf --with-mode arm --with-cpu arm10tdmi --with-fpu vfp arm-none-linux-gnueabihf --with-mode arm --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16 armeb-none-linux-gnueabihf --with-mode arm --with-cpu cortex-a9 --with-fpu vfpv3-d16-fp16 Other arm-none-linux-gnueabihf configs I test still run fine.
GCC 8.3 has been released.
GCC 8.4.0 has been released, adjusting target milestone.
The GCC 8 branch is being closed, fixed in GCC 9.1.