In the Linux kernel, we do not use SSE or floating-point of any kind (except in very particular highly controlled places); furthermore, stack space is at an extreme premium. As such, it makes absolutely no sense to align the stack to 16 bytes in x86-64. gcc, however, refuses to allow setting the stack alignment to 8 bytes: : anacreon 105 ; gcc -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 -O2 -mno-sse -c /tmp/foo.c /tmp/foo.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 /tmp/foo.c:1:0: error: -mincoming-stack-boundary=3 is not between 4 and 12 This is particularly ironic since we have found out that, in fact, ALL kernel stacks are actually 16-byte misaligned on x86-64 as the entry code pushes a total of 88 bytes onto the stack. As such, gcc trying to keep a 16-byte alignment actually makes the code do worse, not better.
I think we can relax it under -mcmodel=kernel.
Why would -mcmodel=kernel matter for this? (For the record, there is C code in the Linux kernel which doesn't use -mcmodel=kernel, too, and other embedded programs may very well have the same issue.)
Following the x86-64 psABI, GCC assumes incoming stack is 16byte aligned and generates 16-byte aligned vector move on stack. If it isn't true, you will get segfault at run-time.
Only if the program in question is actually using SSE. If SSE is disallowed (because it is kernel code, or some other embedded piece of code) it is irrelevant.
The point is that the key is -mno-sse, not -mcmodel=kernel.
The reason why -mpreferred-stack-boundary=3 is not allowed is that the va-args sequences generated are not safe in the case frame is misaligned and values with higher intended alignment are passed. In that case the caller would be required to do dynamic stack alignment. Otherwise the value might end up misaligned on the stack and va-arg sequence will do alignment using & operation resulting in fetching wrong value from memory. Since we have dynamic stack alignment now, we could allow that and explicitely document that as ABI breaking (like we have the double alignment and other kludges on x86) Honza
We can't use the SSE parts of the ABI anyway in the kernel, and I sure hope that -mno-sse (or perhaps -mcmodel=kernel, but that would be ugly) prevents those from being generated. Yes, it's an ABI violation, but it's a very necessary one...
__int128 do not require SSE and yet it is 128bit aligned.... I am not against allowing smaller alignments, we just need to document it breaks ABI and it would be nice to explain how (and probably warn on 128bit aligned varargs or force the dynamc alignment)
With this patch: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index eca542c..3e4e768 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -3660,9 +3660,14 @@ ix86_option_override_internal (bool main_args_p) ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT; if (global_options_set.x_ix86_preferred_stack_boundary_arg) { - int min = (TARGET_64BIT ? 4 : 2); + int min; int max = (TARGET_SEH ? 4 : 12); + if (TARGET_64BIT) + min = TARGET_SSE ? 4 : 3; + else + min = 2; + if (ix86_preferred_stack_boundary_arg < min || ix86_preferred_stack_boundary_arg > max) { diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index ddb3645..f7f13d2 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -708,7 +708,7 @@ enum target_cpu_default #define MAIN_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32) /* Minimum stack boundary. */ -#define MIN_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32) +#define MIN_STACK_BOUNDARY (TARGET_64BIT ? (TARGET_SSE ? 128 : 64) : 32) /* Boundary (in *bits*) on which the stack pointer prefers to be aligned; the compiler cannot rely on having this alignment. */ I got [hjl@gnu-mic-2 gcc]$ cat /tmp/x.c extern __int128 x; extern void bar (int, int, int, int, int, __int128); void foo (void) { bar (1, 2, 3, 4, 5, x); } [hjl@gnu-mic-2 gcc]$ ./xgcc -B./ -S -O2 /tmp/x.c -mpreferred-stack-boundary=3 -mno-sse [hjl@gnu-mic-2 gcc]$ cat x.s .file "x.c" .text .p2align 4,,15 .globl foo .type foo, @function foo: .LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movl $5, %r8d movl $4, %ecx movl $2, %esi movl $1, %edi movq %rsp, %rbp .cfi_def_cfa_register 6 andq $-16, %rsp subq $16, %rsp movq x+8(%rip), %rdx movq x(%rip), %rax movq %rdx, 8(%rsp) movq %rax, (%rsp) movl $3, %edx call bar leave .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE0: .size foo, .-foo .ident "GCC: (GNU) 4.8.0 20120519 (experimental)" .section .note.GNU-stack,"",@progbits [hjl@gnu-mic-2 gcc]$ It looks OK to me.
The problem is va_args doing alignment based on stack pointer, i.e. in: int test (int a, ...) { va_list p; va_start (p, a); va_arg (p, int); va_arg (p, int); va_arg (p, int); va_arg (p, int); va_arg (p, int); va_arg (p, int); va_arg (p, int); va_arg (p, int); va_arg (p, int); va_arg (p, int); if (a) va_arg (p, int); return va_arg (p, __int128); } test: .LFB0: .cfi_startproc leaq 8(%rsp), %rax movq %rdx, -40(%rsp) movq %rsi, -48(%rsp) movq %rcx, -32(%rsp) movq %r8, -24(%rsp) movq %rax, -72(%rsp) leaq -56(%rsp), %rax addq $8, -72(%rsp) testl %edi, %edi movq %r9, -16(%rsp) movl $48, -80(%rsp) movq %rax, -64(%rsp) movq -72(%rsp), %rax leaq 32(%rax), %rdx movq %rdx, -72(%rsp) je .L14 addq $40, %rax movq %rax, -72(%rsp) .L14: movq -72(%rsp), %rax addq $15, %rax andq $-16, %rax leaq 16(%rax), %rdx movq %rdx, -72(%rsp) movq (%rax), %rax ret .cfi_endproc .LFE0: this will get out of sync with hard coded offsets if rsp hapepns to be misaligned.
(In reply to comment #10) > The problem is va_args doing alignment based on stack pointer, i.e. in: > int > > return va_arg (p, __int128); > } > > addq $15, %rax > andq $-16, %rax This isn't necessary. If __int128 is put on stack by caller, the stack must be aligned at 16 bytes. > leaq 16(%rax), %rdx > movq %rdx, -72(%rsp) > movq (%rax), %rax > ret > .cfi_endproc > .LFE0: > > this will get out of sync with hard coded offsets if rsp hapepns to be > misaligned. I don't think it will happen.
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 > > --- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> 2012-05-20 02:04:40 UTC --- > (In reply to comment #10) > > The problem is va_args doing alignment based on stack pointer, i.e. in: > > int > > > > return va_arg (p, __int128); > > } > > > > > addq $15, %rax > > andq $-16, %rax > > This isn't necessary. If __int128 is put on stack by caller, > the stack must be aligned at 16 bytes. Not when you call function with -fpreferred-stack-boundary=3 and it is itself compiled with -fpreferred-stack-boudnary=4 and calls another functions passing __int128. Thus the ABI incompatibility. Honza
> > This isn't necessary. If __int128 is put on stack by caller, > > the stack must be aligned at 16 bytes. > > Not when you call function with -fpreferred-stack-boundary=3 > and it is itself compiled with -fpreferred-stack-boudnary=4 > and calls another functions passing __int128. > Thus the ABI incompatibility. ... and the alignment code is there not to align stack frame, but to align the __int128bit position within the argument area. The va_arg calls beforehand are of unknon count. > Honza
(In reply to comment #12) > > Not when you call function with -fpreferred-stack-boundary=3 > and it is itself compiled with -fpreferred-stack-boudnary=4 > and calls another functions passing __int128. > Thus the ABI incompatibility. When everything is compiled with -fpreferred-stack-boundary=3, it works fine.
A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01334.html
Author: hjl Date: Fri Jun 22 17:10:58 2012 New Revision: 188893 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188893 Log: Allow -mpreferred-stack-boundary=3 on x86-64 PR target/53383 * doc/invoke.texi: Add a warning for -mpreferred-stack-boundary=3. * config/i386/i386.c (ix86_option_override_internal): Allow -mpreferred-stack-boundary=3 for 64-bit if SSE is disabled. * config/i386/i386.h (MIN_STACK_BOUNDARY): Set to 64 for 64-bit if SSE is disabled. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.h trunk/gcc/doc/invoke.texi
Fixed for 4.8.0.
*** Bug 56464 has been marked as a duplicate of this bug. ***
I don't think the fix is correct. This works: gcc -mno-sse -mpreferred-stack-boundary=3 ... This does not: gcc -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 ... This makes no sense, since they should be equivalent. Also, I find the docs to be unclear as to what different values of the incoming and preferred stack boundaries mean. Finally, why is -mno-sse required in order to set a low stack boundary? Couldn't gcc figure out that the existence of a stack variable (SSE, alignas, __attribute__((aligned(32))), etc) should force dynamic stack alignment? In fact, that should be necessary, and already appears to work, for __attribute__((aligned(32))), as gcc generates correct code for this: typedef int __attribute__((aligned(32))) int_aligned_32; extern void func2(int_aligned_32 *x); void func(void) { int_aligned_32 x; func2(&x); }
(In reply to Andy Lutomirski from comment #19) > I don't think the fix is correct. > > This works: > > gcc -mno-sse -mpreferred-stack-boundary=3 ... > > This does not: > > gcc -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 ... > Please provide a testcase. > This makes no sense, since they should be equivalent. > > Also, I find the docs to be unclear as to what different values of the > incoming and preferred stack boundaries mean. > > Finally, why is -mno-sse required in order to set a low stack boundary? > Couldn't gcc figure out that the existence of a stack variable (SSE, > alignas, __attribute__((aligned(32))), etc) should force dynamic stack > alignment? Since the x86-86 psABI says that stack must be 16 byte aligned, if the stack isn't 16-byte aligned, the code with SSE insn, which follows the psABI, will crash when called with 8-byte aligned stack.
(In reply to H.J. Lu from comment #20) > (In reply to Andy Lutomirski from comment #19) > > I don't think the fix is correct. > > > > This works: > > > > gcc -mno-sse -mpreferred-stack-boundary=3 ... > > > > This does not: > > > > gcc -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 ... > > > > Please provide a testcase. No code needed: $ touch foo.c $ gcc -c -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 foo.c foo.c:1:0: error: -mincoming-stack-boundary=3 is not between 4 and 12 $ gcc -c -mno-sse -mpreferred-stack-boundary=3 foo.c > > > This makes no sense, since they should be equivalent. > > > > Also, I find the docs to be unclear as to what different values of the > > incoming and preferred stack boundaries mean. > > > > Finally, why is -mno-sse required in order to set a low stack boundary? > > Couldn't gcc figure out that the existence of a stack variable (SSE, > > alignas, __attribute__((aligned(32))), etc) should force dynamic stack > > alignment? > > Since the x86-86 psABI says that stack must be 16 byte aligned, if the stack > isn't 16-byte aligned, the code with SSE insn, which follows the psABI, > will crash when called with 8-byte aligned stack. I'm confused here. I agree in principle, but I don't actually think that gcc works this way, or, if it does, it shouldn't. If I compile with -mpreferred-stack-boundary=3 and create an aligned(32) local variable, then gcc will dynamically align the stack and the variable will have correct alignment even if the incoming stack was not 16-byte aligned. Shouldn't an SSE variable work exactly the same way? That is, if gcc is generating an SSE instruction with a memory reference to an on-stack variable that requires 16-byte alignment (movdqa, for example), wouldn't that variable be effectively aligned(16) or greater and thus trigger dynamic stack alignment. Sure, the generated SSE code will be less efficient with -mpreferred-stack-boundary=3 (because neither "and $-16,%rsp" nor the required frame pointer is free), but it should still work, right?
(In reply to Andy Lutomirski from comment #21) > $ touch foo.c > $ gcc -c -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3 > foo.c > foo.c:1:0: error: -mincoming-stack-boundary=3 is not between 4 and 12 > $ gcc -c -mno-sse -mpreferred-stack-boundary=3 foo.c I will fix it. > > > > > This makes no sense, since they should be equivalent. > > > > > > Also, I find the docs to be unclear as to what different values of the > > > incoming and preferred stack boundaries mean. > > > > > > Finally, why is -mno-sse required in order to set a low stack boundary? > > > Couldn't gcc figure out that the existence of a stack variable (SSE, > > > alignas, __attribute__((aligned(32))), etc) should force dynamic stack > > > alignment? > > > > Since the x86-86 psABI says that stack must be 16 byte aligned, if the stack > > isn't 16-byte aligned, the code with SSE insn, which follows the psABI, > > will crash when called with 8-byte aligned stack. > > I'm confused here. I agree in principle, but I don't actually think that > gcc works this way, or, if it does, it shouldn't. > > If I compile with -mpreferred-stack-boundary=3 and create an aligned(32) > local variable, then gcc will dynamically align the stack and the variable > will have correct alignment even if the incoming stack was not 16-byte > aligned. That is correct. > Sure, the generated SSE code will be less efficient with > -mpreferred-stack-boundary=3 (because neither "and $-16,%rsp" nor the > required frame pointer is free), but it should still work, right? It works only if ALL codes are compiled with -mpreferred-stack-boundary=3.
Author: hjl Date: Mon Jul 6 11:50:47 2015 New Revision: 225452 URL: https://gcc.gnu.org/viewcvs?rev=225452&root=gcc&view=rev Log: Allow -mincoming-stack-boundary=3 with -mno-sse Similar to -mpreferred-stack-boundary=3, -mincoming-stack-boundary=3 is allowed with -mno-sse in 64-bit mode. gcc/ PR target/53383 * config/i386/i386.c (ix86_option_override_internal): Allow -mincoming-stack-boundary=3 for 64-bit if SSE is disabled. gcc/testsuite/ PR target/53383 * gcc.target/i386/pr53383-1.c: New file. * gcc.target/i386/pr53383-2.c: Likewise. * gcc.target/i386/pr53383-3.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/i386/pr53383-1.c trunk/gcc/testsuite/gcc.target/i386/pr53383-2.c trunk/gcc/testsuite/gcc.target/i386/pr53383-3.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
I think this bug should be reopened because the current behavior is restricting. Andy is right. The compiler should be a tool and the developer using it should be allowed to shoot himself in the foot if that's what he wants. In this particular case it restricts freedom and the compiler should be the developer's tool not the other way around. Simply remove the TARGET_SSE check and let the actual developer decide if he wants the stack realigned instead of the compiler? There's even an attribute for that (to force stack realignment). I don't understand why such a check is even there, it servers NO purpose except to imply that the compiler knows better than the person using it, which is just wrong -- if someone wants to use SSE *and* 8-byte aligned stack, then let him do it. This will simply force GCC to realign the stack if and *only if* it needs to use an aligned SSE variable on the stack, just as it does with AVX already. So if 99% of your functions *do not* use SSE, they won't use 16-byte alignment. For that odd function that does use aligned SSE (or AVX), it will simply realign the stack to 16 or 32-bytes in that function's prolog. Nothing special and it won't crash. (as long as you don't call into library functions obviously) Just let the developer decide please, without having to recompile a custom modified GCC to remove that check that screams "I'm the compiler and I know better than you do what your code needs"...
(In reply to Katsunori Kumatani from comment #24) > I think this bug should be reopened because the current behavior is > restricting. [...] I agree. Let's remove this artificial dependency on -msse. Patch in testing: --cut here-- Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 246531) +++ config/i386/i386.c (working copy) @@ -5927,9 +5927,8 @@ ix86_option_override_internal (bool main_args_p, ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT; if (opts_set->x_ix86_preferred_stack_boundary_arg) { - int min = (TARGET_64BIT_P (opts->x_ix86_isa_flags) - ? (TARGET_SSE_P (opts->x_ix86_isa_flags) ? 4 : 3) : 2); - int max = (TARGET_SEH ? 4 : 12); + int min = TARGET_64BIT_P (opts->x_ix86_isa_flags)? 3 : 2; + int max = TARGET_SEH ? 4 : 12; if (opts->x_ix86_preferred_stack_boundary_arg < min || opts->x_ix86_preferred_stack_boundary_arg > max) --cut here--
Author: uros Date: Tue Mar 28 16:51:00 2017 New Revision: 246543 URL: https://gcc.gnu.org/viewcvs?rev=246543&root=gcc&view=rev Log: PR target/53383 * config/i386/i386.c (ix86_option_override_internal): Always allow -mincoming-stack-boundary=3 for 64-bit targets. testsuite/ChangeLog: PR target/53383 * gcc.target/i386/pr53383-1.c (dg-options): Remove -mno-sse. * gcc.target/i386/pr53383-2.c (dg-options): Ditto. * gcc.target/i386/pr53383-3.c (dg-options): Ditto. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/pr53383-1.c trunk/gcc/testsuite/gcc.target/i386/pr53383-2.c trunk/gcc/testsuite/gcc.target/i386/pr53383-3.c