The tests contain asm-listing like this: __asm ( testl %0, %0 jnz 1f .subsection 1 .type _L_mutex_lock_%=, @function _L_mutex_lock_%=: 1: leaq %1, %%rdi 2: subq $128, %%rsp 3: call bar 4: addq $128, %%rsp 5: jmp 21f ... As _L_mutex_lock is a function, GCC generates a prologue and epilogue for it - in prologue stack alignment is performed (according to ABI64, stack should be aligned to 128-bit). Before a call, SP is assumed to be a multiple of 16, at function entry, when return address is pushed to stack, (SP+8) becomes multiple of 16 - GCC uses these assumptions when generating prologue for stack alignment. But if JUMP is used instead of CALL, 8-byte displacement doesn't take place, and stack becomes unaligned - that's violation of ABI. To fix this error, JUMP should be replaced with CALL, or L_mutex_lock shouldn't be declared as a function.
(In reply to comment #0) > > As _L_mutex_lock is a function, GCC generates a prologue and epilogue for it - > in prologue stack alignment is performed (according to ABI64, stack should be > aligned to 128-bit). I didn't see any prologue and epilogue for _L_mutex_lock. Do you have a run-time testcase to show the problem?
(In reply to comment #1) > (In reply to comment #0) > > > > As _L_mutex_lock is a function, GCC generates a prologue and epilogue for it - > > in prologue stack alignment is performed (according to ABI64, stack should be > > aligned to 128-bit). > > I didn't see any prologue and epilogue for _L_mutex_lock. Do you have > a run-time testcase to show the problem? I don't have run-time test for this fail, but here is a way to see the problem: Firstly, you need to compile the test (it is in gcc/testsuite/gcc.target/i386): gcc cleanup-1.c -fexceptions -fnon-call-exceptions -fasynchronous-unwind-tables -O0 -lm -g -fno-inline Then, in debugger one can see that before each callq before foo (here 'before' means 'upper in call stack') RSP contains aligned value, after foo (deeper in call stack) - unaligned. Here is corresponding a assembler dump: -Dump of assembler code for function foo: 0x0000000000400886 <+0>: push %rbp 0x0000000000400887 <+1>: mov %rsp,%rbp 0x000000000040088a <+4>: push %r12 0x000000000040088c <+6>: push %rbx 0x000000000040088d <+7>: sub $0x98,%rsp # It is the prologue, 0x0000000000400894 <+14>: mov %edi,-0x114(%rbp) # that I've spoken 0x000000000040089a <+20>: mov -0x114(%rbp),%ebx # about. 0x00000000004008a0 <+26>: lea -0x110(%rbp),%r12 # 0x00000000004008a7 <+33>: test %ebx,%ebx 0x00000000004008a9 <+35>: jne 0x400941 <_L_mutex_lock_216> 0x00000000004008af <+41>: add $0x98,%rsp 0x00000000004008b6 <+48>: pop %rbx 0x00000000004008b7 <+49>: pop %r12 0x00000000004008b9 <+51>: leaveq 0x00000000004008ba <+52>: retq -End of assembler dump. RTL dumps showed that these instructions appear phase "197r.pro_and_epilogue". I'm not sure, if prologue generation is incorrect - in my opinion the problem is with such untrivial call of _L_mutex_lock.
(In reply to comment #2) > (In reply to comment #1) > > (In reply to comment #0) > > > > > > As _L_mutex_lock is a function, GCC generates a prologue and epilogue for it - > > > in prologue stack alignment is performed (according to ABI64, stack should be > > > aligned to 128-bit). > > > > I didn't see any prologue and epilogue for _L_mutex_lock. Do you have > > a run-time testcase to show the problem? > > I don't have run-time test for this fail, but here is a way to see the problem: > It is very easy to check if stack alignment is correct at run-time. Please see how it is done in testcases under gcc.dg/torture/stackalign.
Created attachment 24584 [details] Test showing that cleanup-* tests are not quite correct.
(In reply to comment #3) > (In reply to comment #2) > > (In reply to comment #1) > > > (In reply to comment #0) > > > > > > > > As _L_mutex_lock is a function, GCC generates a prologue and epilogue for it - > > > > in prologue stack alignment is performed (according to ABI64, stack should be > > > > aligned to 128-bit). > > > > > > I didn't see any prologue and epilogue for _L_mutex_lock. Do you have > > > a run-time testcase to show the problem? > > > > I don't have run-time test for this fail, but here is a way to see the problem: > > > > It is very easy to check if stack alignment is correct at run-time. > Please see how it is done in testcases under gcc.dg/torture/stackalign. Please find the testcase attached. It passes on 32 bits and fails on 64 bits (4.7.0 and 4.5.1(RedHat) GCC was checked).
(In reply to comment #4) > Created attachment 24584 [details] > Test showing that cleanup-* tests are not quite correct. Wrong test. You don't know the alignment of buf. You should put an alignment attribute on buf.
Created attachment 24587 [details] Fixed cleanup-regression.c
(In reply to comment #6) > (In reply to comment #4) > > Created attachment 24584 [details] > > Test showing that cleanup-* tests are not quite correct. > > Wrong test. You don't know the alignment of buf. > You should put an alignment attribute on buf. I added new test, with align attribute. Compiler sometimes could make stack aligned (accidentally), and the bug wouldn't show up. To watch the fail, try use following command line: gcc cleanup-regression-2.c -O0 -lm -fno-inline -fexceptions -fnon-call-exceptions -fasynchronous-unwind-tables && ./a.out
(In reply to comment #0) > The tests contain asm-listing like this: > __asm ( > testl %0, %0 > jnz 1f > .subsection 1 > .type _L_mutex_lock_%=, @function > _L_mutex_lock_%=: > 1: leaq %1, %%rdi > 2: subq $128, %%rsp > 3: call bar > 4: addq $128, %%rsp > 5: jmp 21f > ... > The bug is in testcase. When _L_mutex_lock_ calls bar, it didn't align stack to 16byte. But I don't think it matters here. Jakub, do we need to fix it?
The testcase tests what glibc does, so IMHO it should stay as is. None of the functions that are called with misaligned stack actually really require it to be aligned.
(In reply to comment #10) > The testcase tests what glibc does, so IMHO it should stay as is. None of the > functions that are called with misaligned stack actually really require it to > be aligned. But gcc may generate SSE codes and assume stack is 16byte aligned, which leads to segfault.
Dup *** This bug has been marked as a duplicate of bug 70364 ***