This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



How to preserve 16-byte stack alignment for x86_64-apple-darwin12


Hi.  This is a re-post of my previous question, with corrections and
test code.  I am trying to create a dynamic library for
x86_64-apple-darwin12 with gcc 4.8.1.  My understanding is that the
x86-64 ABI requires 16-byte stack alignment.  However, the following
program loses 16-byte alignment between function calls.  This results
in a memory fault when sub2 calls sub3 and the dynamic loader stub
function for sub3 is invoked.

The same test runs correctly when compiled for static linking.
Tracing with GDB shows that the stack misalignment is still present,
but there is no fault because there is no call to the loader or other
ABI entry.

From the assembly code below, it seems that the problem is at the last
instruction of the prologue of sub2.  After pushq %rbp the stack
pointer is 16-byte aligned again, so subq $8, %rsp preserves only
8-byte alignment and breaks 16-byte alignment.  The same compilation
on our Linux system emits subq $16, %rsp, which I believe is correct.

Is there a straightforward way to preserve 16-byte stack alignment on
this target?  I tried some of the obvious controls such as
-mpreferred-stack-boundary=4, but they made no difference.  There are
several kludges that either hide the alignment problem or add excess
code.  Thanks for any insights.

----------------------------------
main.c:
#include <stdio.h>
extern void sub1( void );

int main() {
    (void) fprintf (stderr, "Main: Call sub1.\n");
    sub1();
    (void) fprintf (stderr, "Main: sub1 returned.\n");
    return 0;
}

----------------------------------
subs.c:
int sub3( void )
    {
    return 99;
    }

void sub2( int a )
    {
    if (a == 2)
        sub3();
    }

void sub1( void )
    {
    sub2( 1 );
    sub2( 2 );
    }

----------------------------------
Compile commands for Mac:
gcc -g -O0 -dynamiclib -fPIC -fno-common -flat_namespace \
  -Wall -save-temps subs.c -o subs.dylib
gcc -g -Wall main.c subs.dylib

There is additional info on "hidden" compile options found in the
debug section of the assembly temp file:
.ascii "GNU C 4.8.1 -fpreprocessed -feliminate-unused-debug-symbols
-mmacosx-version-min=10.8.4 -mtune=core2 -g -O0 -fPIC -fno-common\0"

----------------------------------
Compiler:
mac56:~/bugs/gcc/stack-align 241> gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/local/libexec/gcc/x86_64-apple-darwin12/4.8.1/lto-wrapper
Target: x86_64-apple-darwin12
Configured with:
/opt/local/var/macports/build/_opt_mports_dports_lang_gcc48/gcc48/work/gcc-4.8.1/configure
--prefix=/opt/local --build=x86_64-apple-darwin12
--enable-languages=c,c++,objc,obj-c++,lto,fortran,java
--libdir=/opt/local/lib/gcc48 --includedir=/opt/local/include/gcc48
--infodir=/opt/local/share/info --mandir=/opt/local/share/man
--datarootdir=/opt/local/share/gcc-4.8 --with-local-prefix=/opt/local
--with-system-zlib --disable-nls --program-suffix=-mp-4.8
--with-gxx-include-dir=/opt/local/include/gcc48/c++/
--with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local
--with-cloog=/opt/local --enable-cloog-backend=isl
--disable-cloog-version-check --enable-stage1-checking
--disable-multilib --enable-lto --enable-libstdcxx-time
--with-as=/opt/local/bin/as --with-ld=/opt/local/bin/ld
--with-ar=/opt/local/bin/ar
--with-bugurl=https://trac.macports.org/newticket
--with-pkgversion='MacPorts gcc48 4.8.1_3'
Thread model: posix
gcc version 4.8.1 (MacPorts gcc48 4.8.1_3)

----------------------------------
Platform:
uname -a:
Darwin mac56 12.4.0 Darwin Kernel Version 12.4.0: Wed May  1 17:57:12
PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64

Mac OS version 10.8.4

Hardware config:
  Model Name: Mac Pro
  Model Identifier: MacPro4,1
  Processor Name: Quad-Core Intel Xeon
  Processor Speed: 2.66 GHz
  Number of Processors: 1
  Total Number of Cores: 4
  L2 Cache (per Core): 256 KB
  L3 Cache: 8 MB
  Memory: 8 GB
  Processor Interconnect Speed: 4.8 GT/s
  Boot ROM Version: MP41.0081.B07
  SMC Version (system): 1.39f5
  SMC Version (processor tray): 1.39f5
  Serial Number (system): xxxxxxxxxx
  Serial Number (processor tray): xxxxxxxxxx
  Hardware UUID: xxxxxxxxxx

----------------------------------
Assembly code emitted by gcc (subs.s):
        .text
Ltext0:
        .globl _sub3
_sub3:
LFB0:
LM1:
        pushq   %rbp
LCFI0:
        movq    %rsp, %rbp
LCFI1:
LM2:
        movl    $99, %eax
LM3:
        popq    %rbp
LCFI2:
        ret
LFE0:
        .globl _sub2
_sub2:
LFB1:
LM4:
        pushq   %rbp
LCFI3:
        movq    %rsp, %rbp
LCFI4:
        subq    $8, %rsp
        movl    %edi, -4(%rbp)
LM5:
        cmpl    $2, -4(%rbp)
        jne     L3
LM6:
        call    _sub3
L3:
LM7:
        leave
LCFI5:
        ret
LFE1:
        .globl _sub1
_sub1:
LFB2:
LM8:
        pushq   %rbp
LCFI6:
        movq    %rsp, %rbp
LCFI7:
LM9:
        movl    $1, %edi
        call    _sub2
LM10:
        movl    $2, %edi
        call    _sub2
LM11:
        popq    %rbp
LCFI8:
        ret
LFE2:

        .section __DWARF,__debug_frame,regular,debug
(Omitted the rest of this info section, here to EOF)

----------------------------------
Normal output:
./a.out
Main: Call sub1.
Segmentation fault

----------------------------------
Selected trace output from GDB:
Full trace available on request.
GNU gdb 6.3.50-20050815 (Apple version gdb-1824) (Thu Nov 15 10:42:43 UTC 2012)

Call steps from main to sub1, then sub1 to sub2.  Note that the
dynamic loader is invoked on each of these calls, and the full (very
long) trace is hidden by GDB in "step" mode.  Note that 16-byte stack
alignment is good within sub1, broken within sub2.
(gdb) step
Main: Call sub1.
6    sub1();
3: $rsp = (void *) 0x7fff5fbfe730
2: $rbp = (void *) 0x7fff5fbfe730
1: x/i $pc  0x100000f29 <main+39>: callq  0x100000f58 <dyld_stub_sub1>
(gdb)
sub1 () at subs.c:14
14    sub2( 1 );
3: $rsp = (void *) 0x7fff5fbfe720
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003efa <sub1+4>: mov    $0x1,%edi
(gdb)
sub2 (a=1) at subs.c:8
8    if (a == 2)
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003ee9 <sub2+11>: cmpl   $0x2,-0x4(%rbp)
(gdb)

Single-step the SECOND call from sub1 to sub2.
This avoids single-stepping through the tedious loader process,
because the stub for sub2 was already resolved on the first call.
(gdb)
sub1 () at subs.c:15
15    sub2( 2 );
3: $rsp = (void *) 0x7fff5fbfe720
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003f04 <sub1+14>: mov    $0x2,%edi
(gdb) stepi
0x0000000100003f09 15    sub2( 2 );
3: $rsp = (void *) 0x7fff5fbfe720
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003f09 <sub1+19>: callq  0x100003f10 <dyld_stub_sub2>
(gdb)
0x0000000100003f10 in dyld_stub_sub2 ()
3: $rsp = (void *) 0x7fff5fbfe718
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003f10 <dyld_stub_sub2>: jmpq   *0xfa(%rip)        # 0x100004010
(gdb)
sub2 (a=1) at subs.c:7
7    {
3: $rsp = (void *) 0x7fff5fbfe718
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003ede <sub2>: push   %rbp
(gdb)
0x0000000100003edf 7    {
3: $rsp = (void *) 0x7fff5fbfe710
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003edf <sub2+1>: mov    %rsp,%rbp
(gdb)
0x0000000100003ee2 7    {
3: $rsp = (void *) 0x7fff5fbfe710
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003ee2 <sub2+4>: sub    $0x8,%rsp
(gdb)
0x0000000100003ee6 7    {
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003ee6 <sub2+8>: mov    %edi,-0x4(%rbp)
(gdb)
8    if (a == 2)
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003ee9 <sub2+11>: cmpl   $0x2,-0x4(%rbp)
(gdb)
0x0000000100003eed 8    if (a == 2)
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003eed <sub2+15>: jne    0x100003ef4 <sub2+22>
(gdb)
9        sub3();
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003eef <sub2+17>: callq  0x100003f16 <dyld_stub_sub3>
(gdb)
0x0000000100003f16 in dyld_stub_sub3 ()
3: $rsp = (void *) 0x7fff5fbfe700
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003f16 <dyld_stub_sub3>: jmpq   *0xfc(%rip)        # 0x100004018
(gdb)

Now we are going deeper into the dynamic loader stub function for
sub3.  This is ABI territory, I think.  The ABI-required alignment was
already violated at callq 0x100003f16 just above.  A dozen or so
instructions later, this is the conclusion:
(gdb)
0x00007fff85ac68a0 in dyld_stub_binder ()
3: $rsp = (void *) 0x7fff5fbfe628
2: $rbp = (void *) 0x7fff5fbfe6e8
1: x/i $pc  0x7fff85ac68a0 <dyld_stub_binder+40>: mov    %rax,0x30(%rsp)
(gdb)
0x00007fff85ac68a5 in misaligned_stack_error_entering_dyld_stub_binder ()
3: $rsp = (void *) 0x7fff5fbfe628
2: $rbp = (void *) 0x7fff5fbfe6e8
1: x/i $pc  0x7fff85ac68a5 <misaligned_stack_error_entering_dyld_stub_binder>: movdqa %xmm0,0x40(%rsp)
(gdb)

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000

--Dave

