The following testcase gets wrong unwind info with -m32 -Os -fpic -fno-asynchronous-unwind-tables on the 4.3 branch (haven't tried 4.4 yet). The problem is incorrect DW_CFA_GNU_args_size (usually off by 16), which results in catch receiving %esp 16 bytes above what should be received. As this catch is called in the loop 10 times, each iteration bumps %esp by 16 bytes and eventually trashes saved variables on the stack (in http://bugzilla.redhat.com/447912 case already the 3rd iteration overwrites saved stderr pointer on the stack). The interesting function is _ZN17OleEmbeddedObject44TryToRetrieveCachedVisualRepresentation_ImplERKN3com3sun4star3uno9ReferenceINS2_2io7XStreamEEEh. From what I can see, DW_CFA_GNU_args_size matches the code from the beginning until first call rtl_uString_release@PLT. After the following addl $16, %esp it is still right (after that insns args_size is set to 0): .LCFI2090: call rtl_uString_release@PLT addl $16, %esp .LCFI2091: jmp .L542 ... .long .LCFI2090-.LCFI2089 .byte 0x2e # DW_CFA_GNU_args_size .uleb128 0x10 .byte 0x4 # DW_CFA_advance_loc4 .long .LCFI2091-.LCFI2090 .byte 0x2e # DW_CFA_GNU_args_size .uleb128 0x0 But the code jmp .L542 jumps to has args_size 16, not 0: .LCFI2098: call __cxa_begin_catch@PLT addl $16, %esp .LCFI2099: .LEHB94: call __cxa_end_catch@PLT .LEHE94: .L542: # basic block 17 cmpl $0, -20(%ebp) je .L547 # basic block 18 movl stderr@GOT(%ebx), %edx movl $0, -144(%ebp) movl %edx, -168(%ebp) movl %edx, -172(%ebp) movl %edx, -176(%ebp) .L565: # basic block 19 pushl %eax .LCFI2100: ... .long .LCFI2098-.LCFI2097 .byte 0x2e # DW_CFA_GNU_args_size .uleb128 0x20 .byte 0x4 # DW_CFA_advance_loc4 .long .LCFI2099-.LCFI2098 .byte 0x2e # DW_CFA_GNU_args_size .uleb128 0x10 .byte 0x4 # DW_CFA_advance_loc4 .long .LCFI2100-.LCFI2099 .byte 0x2e # DW_CFA_GNU_args_size .uleb128 0x14 Without -fasynchronous-unwind-tables it honors second operands of CALL rtls and those are correct for this codepath.
Created attachment 15712 [details] rh447912.ii.bz2
Important correction, it works with: -m32 -Os -fpic -fno-asynchronous-unwind-tables but doesn't with: -m32 -Os -fpic -fasynchronous-unwind-tables which generates identical .text, but different unwind info. The code in between that jmp .L542 and .L542: looks broken: .LCFI2090: call rtl_uString_release@PLT addl $16, %esp .LCFI2091: jmp .L542 .L543: .L621: # basic block 13 .L544: movl %eax, %edi subl $12, %esp .LCFI2092: leal -32(%ebp), %eax movl %edx, %esi pushl %eax .LCFI2093: call _ZN3com3sun4star3uno9ReferenceINS2_10XInterfaceEED1Ev@PLT jmp .L545 .LCFI2094: .L622: # basic block 14 movl %eax, %edi movl %edx, %esi .L545: # basic block 15 .L623: subl $12, %esp .LCFI2095: pushl -28(%ebp) .LCFI2096: call rtl_uString_release@PLT cmpl $1, %esi jne .L592 # basic block 16 .L546: subl $12, %esp .LCFI2097: pushl %edi .LCFI2098: call __cxa_begin_catch@PLT addl $16, %esp .LCFI2099: .LEHB94: call __cxa_end_catch@PLT .LEHE94: .L542: # basic block 17 cmpl $0, -20(%ebp) Both .L621 and .L622 are landing pads, so I believe args_size should be 0 at those points (and the dwarf2out code even clears args_size on BARRIERs). call _ZN3com3sun4star3uno9ReferenceINS2_10XInterfaceEED1Ev@PLT is done with args_size 16, which looks correct, but it then jumps to .L545 without addl 16, %esp which would be IMNSHO expected to get stack pointer back to args_size 0 level, and .L545 is after barrier with no stack changes. When entering the .L622 landing pad call rtl_uString_release@PLT is done with correct args_size 16, but the following call __cxa_begin_catch@PLT is done with args_size 32, eventhough the call only has one parameter and so should be 16.
Smaller testcase for -m32 -Os -fpic -fasynchronous-unwind-tables -fno-inline: extern "C" { struct FILE; extern FILE *stderr; extern int fprintf (FILE *, const char *, ...); struct R { int r1; unsigned short r2[1]; }; int bar1 (unsigned short *, int, short) throw (); void bar2 (R *) throw (); void bar3 (R **, const unsigned short *, int) throw (); void bar4 (R **, const char *) throw (); void bar5 (R **, R *) throw (); void bar6 (R **, R *, R *) throw (); } struct S { R *s; struct T { }; S (R *x, T *) { s = x; } ~S () { bar2 (s); } S &operator= (const S &x); S &operator+= (const S &x); S sfn1 (const S &x) const; friend S operator+ (const S &x1, const S &x2); static S sfn2 (int i) { unsigned short q[33]; R *p = 0; bar3 (&p, q, bar1 (q, i, 10)); return S (p, (T *) 0); } static S sfn3 (const char *x) { R *p = 0; bar4 (&p, x); return S (p, (T *) 0); } }; struct U { }; template <class C> inline unsigned char operator >>= (const U &, C &); struct V; struct W { V *w; unsigned char is () const; }; template <class T> struct X : public W { static T *xfn1 (V *p); inline ~X (); X (); X (const W &rRef); T *operator -> () const; }; struct E { E (); E (const S &, const X <V> &); E (E const &); ~E (); E &operator = (E const &); S e; X <V> f; }; struct V { virtual void release () throw () = 0; }; template <class T> X <T>::~X () { if (w) w->release (); } struct Y { virtual U yfn1 (const S &aName) = 0; }; struct Z; X <V> baz1 (const S &) throw (E); X <Z> baz2 (const X <Z> &xStream) throw (E); X <Z> foo () throw () { X <Z> xResult; X <Y> xNameContainer; try { xNameContainer = X <Y> (baz1 (S::sfn3 ("com.sun.star.embed.OLESimpleStorage"))); } catch (E &) { } if (xNameContainer.is ()) { for (int nInd = 0; nInd < 10; nInd++) { S aStreamName = S::sfn3 ("abcd"); aStreamName += S::sfn2 ((int) nInd); X <Z> xCachedCopyStream; try { { fprintf (stderr, "trying %d\n", nInd); } if ((xNameContainer-> yfn1 (aStreamName) >>= xCachedCopyStream)) if (xCachedCopyStream.is ()) { { fprintf (stderr, "still trying %d\n", nInd); } xResult = baz2 (xCachedCopyStream); if (xResult.is ()) break; } { fprintf (stderr, "success on %d\n", nInd); } } catch (...) { fprintf (stderr, "failure on %d\n", nInd); } } } return xResult; }
Here is a testcase that can be run, unfortunately during the simplification the incorrect behavior changed from the stack pointer increasing in each iteration by 16 bytes into decreasing in each iteration by 4 bytes. That's wrong too of course. // { dg-options "-Os -fasynchronous-unwind-tables -fpic -fno-inline" } extern "C" { struct FILE; extern FILE *stderr; extern int fprintf (FILE *, const char *, ...); struct R { int r1; unsigned short r2[1]; }; int bar1 (unsigned short *, int, short) throw (); void bar2 (R *) throw (); void bar3 (R **, const unsigned short *, int) throw (); void bar4 (R **, const char *) throw (); } struct S { R *s; struct T { }; S (R *x, T *) { s = x; } ~S () { bar2 (s); } S &operator= (const S &x); S &operator+= (const S &x); S sfn1 (const S &x) const; friend S operator+ (const S &x1, const S &x2); static S sfn2 (int i) { unsigned short q[33]; R *p = 0; bar3 (&p, q, bar1 (q, i, 10)); return S (p, (T *) 0); } static S sfn3 (const char *x) { R *p = 0; bar4 (&p, x); return S (p, (T *) 0); } }; struct U { }; template <class C> unsigned char operator >>= (const U &, C &); struct V; struct W { V *w; unsigned char is () const; }; template <class T> struct X : public W { inline ~X (); X (); X (const W &); T *operator -> () const; }; struct E { E (); E (const S &, const X <V> &); E (E const &); ~E (); E &operator = (E const &); }; struct V { virtual void release () throw (); }; template <class T> X <T>::~X () { if (w) w->release (); } struct Y { virtual U yfn1 (const S &); }; struct Z; X <V> baz1 (const S &) throw (E); X <Z> baz2 (const X <Z> &) throw (E); template <typename T> X<T>::X () { w = __null; } template <typename T> X<T>::X (W const &) { w = __null; } U Y::yfn1 (const S &) { throw 12; } Y y; template <typename T> T *X<T>::operator -> () const { return &y; } X <V> baz1 (const S &) throw (E) { return X<V> (); } E::E () { } E::~E () { } X <Z> baz2 (const X <Z> &) throw (E) { throw E (); } int bar1 (unsigned short *, int, short) throw () { asm volatile ("" : : : "memory"); return 0; } void bar2 (R *) throw () { asm volatile ("" : : : "memory"); } void bar3 (R **, const unsigned short *, int) throw () { asm volatile ("" : : : "memory"); } void bar4 (R **, const char *) throw () { asm volatile ("" : : : "memory"); } unsigned char W::is () const { return 1; } S &S::operator += (const S &) { return *this; } template <class C> unsigned char operator >>= (const U &, C &) { throw 1; } template X<Y>::X (); template X<Z>::X (); template unsigned char operator >>= (const U &, X<Z> &); template X<Y>::X (W const &); template Y *X<Y>::operator-> () const; X <Z> foo () throw () { X <Z> a; X <Y> b; try { b = X <Y> (baz1 (S::sfn3 ("defg"))); } catch (E &) { } if (b.is ()) { for (int n = 0; n < 10; n++) { S c = S::sfn3 ("abcd"); c += S::sfn2 (n); X <Z> d; try { fprintf (stderr, "trying %d\n", n); if ((b->yfn1 (c) >>= d)) if (d.is ()) { fprintf (stderr, "failure1 on %d\n", n); a = baz2 (d); if (a.is ()) break; } fprintf (stderr, "failure2 on %d\n", n); } catch (...) { void *p; asm volatile ("movl %%esp, %0" : "=r" (p)); fprintf (stderr, "caught %d %p\n", n, p); } } } return a; } int main () { foo (); return 0; }
The short testcase reproduces also on the trunk. I see at least 3 problems with -fasynchronous-unwind-tables -mno-accumulate-outgoing-args -fno-omit-frame-pointer: 1) expand_resx_expr doesn't do pending stack adjust: --- except.c.jj 2008-04-28 11:35:55.000000000 +0200 +++ except.c 2008-06-03 10:27:04.000000000 +0200 @@ -540,6 +540,7 @@ expand_resx_expr (tree exp) cfun->eh->region_array, region_nr); gcc_assert (!reg->resume); + do_pending_stack_adjust (); reg->resume = emit_jump_insn (gen_rtx_RESX (VOIDmode, region_nr)); emit_barrier (); } The problem with this change is that it is probably completely unnecessary if we end up calling Dwarf2 _Unwind_Resume, it is certainly needed when it is being changed into a jump. 2) as shown on the short testcases, CSA merges stack adjustments between a prologue stack adjustment and post-prologue stack adjustment. (insn/f 547 546 548 2 rhp.C:171 (parallel [ (set (reg/f:SI 7 sp) (plus:SI (reg/f:SI 7 sp) (const_int -76 [0xffffffffffffffb4]))) (clobber (reg:CC 17 flags)) (clobber (mem:BLK (scratch) [0 A8])) ]) -1 (nil)) ... (insn:HI 9 5 10 2 rhp.C:173 (parallel [ (set (reg/f:SI 7 sp) (plus:SI (reg/f:SI 7 sp) (const_int -12 [0xfffffffffffffff4]))) (clobber (reg:CC 17 flags)) ]) 283 {*addsi_1} (nil)) becomes after CSA: (insn/f 547 546 548 2 rhp.C:171 (parallel [ (set (reg/f:SI 7 sp) (plus:SI (reg/f:SI 7 sp) (const_int -88 [0xffffffffffffffa8]))) (clobber (reg:CC 17 flags)) (clobber (mem:BLK (scratch) [0 A8])) ]) 868 {pro_epilogue_adjust_stack_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) As dwarf2out_stack_adjust ignores prologue insns, without CSA it would properly set DW_CFA_GNU_args_size to 12 after that post-prologue adjustment but as it has been merged into a prologue stack adjustment, it is ignored and so DW_CFA_GNU_args_size is off-by-12 until first barrier. 3) As can be also seen on the short testcases, the assumption that after BARRIER we should reset DW_CFA_GNU_args_size back to 0 is wrong. While it holds (resp. should hold) during expansion and perhaps shortly after that, e.g. outof_cfglayout happily moves things around such that we have unconditional jumps and BARRIERs too, with different level of stack pushing than level DW_CFA_GNU_args_size 0. Any ideas about this? Do you think it is ok to just apply 1) or should we e.g. try to remove unnecessary stack adjustments before _Unwind_Resume (though, _Unwind_Resume is a call, so we probably need to guarantee correct stack alignment at least). For 2) and 3), we might just walk the whole function, noting the level of stack pushing for all labels, verify at CALLs that it matches their second operand and verify that all jumps to labels have the same level of stack pushing.
> Any ideas about this? Do you think it is ok to just apply 1) or should we > e.g. try to remove unnecessary stack adjustments before _Unwind_Resume > (though, _Unwind_Resume is a call, so we probably need to guarantee correct > stack alignment at least). Exception propagation is already slow (with DWARF-2 EH) so I don't think we should bother about performance here; applying 1) as-is seems fine to me. > For 2) and 3), we might just walk the whole function, noting the level of > stack pushing for all labels, verify at CALLs that it matches their second > operand and verify that all jumps to labels have the same level of stack > pushing. Why is 2) problematic on principle? I mean, prologue insns are translated into CFI like non-prologue stack adjustment insns so why is it a problem that one of the latters "become" one of the formers?
Ok, I'll bootstrap/regtest patch 1). To answer your question about 2), consider e.g.: __attribute__((noinline)) void foo (int, int) { static int n = 0; void *esp; asm volatile ("movl %%esp, %0" : "=r" (esp)); __builtin_printf ("foo %p\n", esp); if (n++ == 0) throw 1; } __attribute__((noinline)) void baz (int, int, int, int, int, int, int) { static int n = 1; void *esp; asm volatile ("movl %%esp, %0" : "=r" (esp)); __builtin_printf ("baz %p\n", esp); if (n++ == 0) throw 1; } struct A { A () { }; ~A (); char c[24]; }; __attribute__((noinline)) A::~A () { asm volatile ("" : : : "memory"); } __attribute__((noinline)) void bar () { A a; try { foo (1, 2); baz (3, 4, 5, 6, 7, 8, 9); } catch (...) { } foo (1, 2); baz (3, 4, 5, 6, 7, 8, 9); } int main () { bar (); return 0; } with -m32 -Os -fasynchronous-unwind-tables -mpreferred-stack-boundary=4 vs. -m32 -Os -mpreferred-stack-boundary=4. The generated .text is identical between both, yet without -fa-u-t it shows correct stack alignment (both foo calls have the same %esp, baz 16 bytes below it), but with -fa-u-t the second foo call has %esp 8 bytes below the first foo call's %esp (so clearly one of them is not aligned to the preferred stack boundary), and baz is 16 bytes below it. The problem is that at the first call _Z3fooii in bar -fa-u-t computed args_size 8, while without -fa-u-t it is 16. The first call _Z3baziiiiiii has wrong args_size too (with -fa-u-t 28, without -fa-u-t 32). The second call _Z3fooii already has the same args_size between the two (16) and second call _Z3baziiiiiii too (in both cases 32).
Subject: Bug 36419 Author: jakub Date: Fri Jun 6 13:24:45 2008 New Revision: 136435 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136435 Log: PR rtl-optimization/36419 * except.c (expand_resx_expr): Call do_pending_stack_adjust () before the emitting jump insn. Modified: trunk/gcc/ChangeLog trunk/gcc/except.c
4.3.1 is being released, adjusting target milestone.
Subject: Bug 36419 Author: jakub Date: Fri Jun 6 19:29:28 2008 New Revision: 136496 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136496 Log: PR rtl-optimization/36419 * except.c (expand_resx_expr): Call do_pending_stack_adjust () before the emitting jump insn. Modified: branches/gcc-4_3-branch/gcc/ChangeLog branches/gcc-4_3-branch/gcc/except.c
Jakub -- This is still marked as a 4.3/4.4 regression. But, there are patches in the audit trail. If that's correct would you please adjust the Summary field? Thanks, -- Mark
See comment #c5. There are 3 issues, 1) has been fixed, 2) has a patch posted, but so far not reviewed - http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00677.html and 3) has no fix yet at all. All 3 are needed to fix the miscompiled OOo.
Closing 4.1 branch.
Subject: Bug 36419 Author: jakub Date: Thu Jul 10 07:39:54 2008 New Revision: 137689 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=137689 Log: PR rtl-optimization/36419 * combine-stack-adj.c (adjust_frame_related_expr): New function. (combine_stack_adjustments_for_block): Call it if needed. Delete correct insn. * dwarf2out.c (dwarf2out_frame_debug_expr): Adjust DW_CFA_GNU_args_size if CSA pass merged some adjustments into prologue sp adjustment. * g++.dg/eh/async-unwind1.C: New test. Added: trunk/gcc/testsuite/g++.dg/eh/async-unwind1.C Modified: trunk/gcc/ChangeLog trunk/gcc/combine-stack-adj.c trunk/gcc/dwarf2out.c trunk/gcc/testsuite/ChangeLog
Subject: Bug 36419 Author: jakub Date: Thu Jul 10 07:52:36 2008 New Revision: 137690 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=137690 Log: PR rtl-optimization/36419 * combine-stack-adj.c (adjust_frame_related_expr): New function. (combine_stack_adjustments_for_block): Call it if needed. Delete correct insn. * dwarf2out.c (dwarf2out_frame_debug_expr): Adjust DW_CFA_GNU_args_size if CSA pass merged some adjustments into prologue sp adjustment. * g++.dg/eh/async-unwind1.C: New test. Added: branches/gcc-4_3-branch/gcc/testsuite/g++.dg/eh/async-unwind1.C Modified: branches/gcc-4_3-branch/gcc/ChangeLog branches/gcc-4_3-branch/gcc/combine-stack-adj.c branches/gcc-4_3-branch/gcc/dwarf2out.c branches/gcc-4_3-branch/gcc/testsuite/ChangeLog
1) and 2) fixed now, 3) still unfixed.
> 1) and 2) fixed now, 3) still unfixed. I presume that 3) is not just a problem with the CFIs generated in final.c, rather a problem in the code itself, right?
Created attachment 15906 [details] gcc44-pr36419.patch Patch I'm going to bootstrap/regtest now. It fixes the testcase. And to answer the last question, no, there is no code generation problem, only the unwind info (args_size in it) has been wrong.
Subject: Bug 36419 Author: jakub Date: Thu Jul 31 18:08:36 2008 New Revision: 138427 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=138427 Log: PR rtl-optimization/36419 * dwarf2out.c (barrier_args_size): New variable. (compute_barrier_args_size, compute_barrier_args_size_1): New functions. (dwarf2out_stack_adjust): For BARRIERs call compute_barrier_args_size if not called yet in the current function, use barrier_args_size array to find the new args_size value. (dwarf2out_frame_debug): Free and clear barrier_args_size. * g++.dg/eh/async-unwind2.C: New test. Added: trunk/gcc/testsuite/g++.dg/eh/async-unwind2.C Modified: trunk/gcc/ChangeLog trunk/gcc/dwarf2out.c trunk/gcc/testsuite/ChangeLog
Subject: Bug 36419 Author: jakub Date: Thu Jul 31 19:07:51 2008 New Revision: 138431 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=138431 Log: PR rtl-optimization/36419 * dwarf2out.c (barrier_args_size): New variable. (compute_barrier_args_size, compute_barrier_args_size_1): New functions. (dwarf2out_stack_adjust): For BARRIERs call compute_barrier_args_size if not called yet in the current function, use barrier_args_size array to find the new args_size value. (dwarf2out_frame_debug): Free and clear barrier_args_size. * g++.dg/eh/async-unwind2.C: New test. Added: branches/gcc-4_3-branch/gcc/testsuite/g++.dg/eh/async-unwind2.C Modified: branches/gcc-4_3-branch/gcc/ChangeLog branches/gcc-4_3-branch/gcc/dwarf2out.c branches/gcc-4_3-branch/gcc/testsuite/ChangeLog
Fixed.