Bug 81646 - i386 SSE2 compilation mode which preserves psABI stack alignment without requiring it
Status: RESOLVED DUPLICATE of bug 40838
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 8.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-01 13:36 UTC by Florian Weimer
Modified: 2017-08-01 18:11 UTC

See Also:
Host:
Target: i386
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Florian Weimer 2017-08-01 13:36:17 UTC
It would be helpful to have an i386 compilation mode which preserves the 16-byte stack alignment (if it is aligned), supports SSE2, but does not require stack alignment.

Currently, it is almost impossible to enable SSE2 for system libraries because too much code is compiled with four-byte stack alignment (as suggested in the GCC manual, among other places).  Having a stack-conservative SSE2 mode would change that.
Comment 1 Jakub Jelinek 2017-08-01 13:54:20 UTC
The Linux ABI says the stack should be 16-byte aligned; anything else is a bug.

That said, one can use -mpreferred-stack-boundary=, -mincoming-stack-boundary= and/or -mstackrealign options to tune stuff as desired.
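
For reference, the argument to -mpreferred-stack-boundary= and -mincoming-stack-boundary= is an exponent: a value of N requests alignment to 2^N bytes, so 2 means 4 bytes and 4 (the documented default) means 16 bytes. The following is only a minimal sketch of that arithmetic; the helper names boundary_bytes and is_aligned_to_boundary are made up for illustration and are not part of GCC.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes implied by -mpreferred-stack-boundary=n,
   i.e. 2 raised to the power n. */
size_t boundary_bytes(unsigned n)
{
    assert(n < sizeof(size_t) * CHAR_BIT);
    return (size_t)1 << n;
}

/* Hypothetical helper: nonzero if p is a multiple of 2^n bytes. */
int is_aligned_to_boundary(const void *p, unsigned n)
{
    return ((uintptr_t)p & (boundary_bytes(n) - 1)) == 0;
}
```

An address that is only 4-byte aligned satisfies boundary 2 but not boundary 4, which is the mismatch discussed in this report.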
Comment 2 Richard Biener 2017-08-01 14:02:50 UTC
Yes, I think everything asked for is already present via those options (just no way to configure a different default).

Thus either INVALID or WORKSFORME.  Pick ;)
Comment 3 Florian Weimer 2017-08-01 15:39:15 UTC
(In reply to Jakub Jelinek from comment #1)
> The Linux ABI says the stack should be 16-byte aligned; anything else is a
> bug.

The GCC manual recommends this (under -mincoming-stack-boundary):

     This extra alignment does consume extra stack space, and generally
     increases code size.  Code that is sensitive to stack space usage,
     such as embedded systems and operating system kernels, may want to
     reduce the preferred alignment to '-mpreferred-stack-boundary=2'.

It doesn't note the ABI impact, so the sorry situation is partly our fault.

> That said, one can use -mpreferred-stack-boundary=,
> -mincoming-stack-boundary= and/or -mstackrealign options to tune stuff as
> desired.

Based on the gcc-help discussion,

  https://gcc.gnu.org/ml/gcc-help/2017-07/msg00087.html

no combination of these options works.  Realigning the stack on every function entry is much too expensive.
Comment 4 H.J. Lu 2017-08-01 17:16:15 UTC
You can use -mstackrealign.

*** This bug has been marked as a duplicate of bug 40838 ***
Comment 5 Florian Weimer 2017-08-01 17:45:05 UTC
(In reply to H.J. Lu from comment #4)
> You can use -mstackrealign.

I don't want to realign the stack unconditionally for performance reasons.  I want to preserve alignment for callback functions, and give GCC the option to use SSE2 where beneficial.  If that's not possible, so be it, considering that it's only i386.
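
For comparison, the GCC manual also documents a per-function counterpart to -mstackrealign, the x86 attribute force_align_arg_pointer. The sketch below shows how it might be applied to a single callback entry point instead of to every function. It is an illustration under assumptions, not a fix for this request: the macro REALIGN_ON_ENTRY and the functions sum_callback and call_from_legacy_code are hypothetical names, and the plain C loop stands in for code that GCC could vectorize with SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: expands to the attribute only on x86 GCC-compatible
   compilers, and to nothing elsewhere. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# define REALIGN_ON_ENTRY __attribute__((force_align_arg_pointer))
#else
# define REALIGN_ON_ENTRY
#endif

typedef float (*sum_cb)(const float *v, size_t n);

/* Callback that may be entered from code built with a 4-byte stack
   boundary; the attribute asks GCC to realign the stack in this
   function's prologue if necessary. */
REALIGN_ON_ENTRY
float sum_callback(const float *v, size_t n)
{
    _Alignas(16) float acc[4] = {0, 0, 0, 0};

    /* The 16-byte-aligned local must really be aligned here. */
    assert(((uintptr_t)acc % 16) == 0);

    for (size_t i = 0; i < n; i++)
        acc[i % 4] += v[i];
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}

/* Stand-in for a caller that does not maintain 16-byte alignment. */
float call_from_legacy_code(sum_cb cb, const float *v, size_t n)
{
    return cb(v, n);
}
```

This only moves the realignment cost to annotated entry points; it does not make GCC preserve an already aligned stack without realigning, which is what the report asks for.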
Comment 6 H.J. Lu 2017-08-01 18:11:06 UTC
(In reply to Florian Weimer from comment #5)
> (In reply to H.J. Lu from comment #4)
> > You can use -mstackrealign.
> 
> I don't want to realign the stack unconditionally for performance reasons. 
> I want to preserve alignment for callback functions, and give GCC the option
> to use SSE2 where beneficial.  If that's not possible, so be it, considering
> that it's only i386.

Have you tried -mstackrealign on your code? I got:

[hjl@gnu-6 gcc]$ cat x.c
#include <x86intrin.h>

extern void foo1 (__m128, __m128, __m128);
extern void foo2 (__m128, __m128, __m128, __m128);

extern __m128 x;

void
bar1 (void)
{
  foo1 (x, x, x);
}

void
bar2 (void)
{
  foo2 (x, x, x, x);
}
[hjl@gnu-6 gcc]$ gcc -S -O2 -m32 x.c -mstackrealign  -msse2
[hjl@gnu-6 gcc]$ cat x.s
	.file	"x.c"
	.text
	.p2align 4,,15
	.globl	bar1
	.type	bar1, @function
bar1:
.LFB4910:
	.cfi_startproc
	movaps	x, %xmm0
	movaps	%xmm0, %xmm2
	movaps	%xmm0, %xmm1
	jmp	foo1
	.cfi_endproc
.LFE4910:
	.size	bar1, .-bar1
	.p2align 4,,15
	.globl	bar2
	.type	bar2, @function
bar2:
.LFB4911:
	.cfi_startproc
	leal	4(%esp), %ecx
	.cfi_def_cfa 1, 0
	andl	$-16, %esp
	pushl	-4(%ecx)
	pushl	%ebp
	.cfi_escape 0x10,0x5,0x2,0x75,0
	movl	%esp, %ebp
	pushl	%ecx
	.cfi_escape 0xf,0x3,0x75,0x7c,0x6
	subl	$20, %esp
	movaps	x, %xmm0
	movaps	%xmm0, %xmm2
	movaps	%xmm0, %xmm1
	movaps	%xmm0, (%esp)
	call	foo2
	addl	$16, %esp
	movl	-4(%ebp), %ecx
	.cfi_def_cfa 1, 0
	leave
	.cfi_restore 5
	leal	-4(%ecx), %esp
	.cfi_def_cfa 4, 4
	ret
	.cfi_endproc
.LFE4911:
	.size	bar2, .-bar2
	.ident	"GCC: (GNU) 7.1.1 20170709 (Red Hat 7.1.1-4)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-6 gcc]$ 

GCC realigns the stack only in bar2, which passes the fourth __m128 argument to foo2 on the stack, and not in bar1, where there is no need for it.
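
The realignment step in the bar2 prologue above, `andl $-16, %esp`, rounds the stack pointer down to a multiple of 16, because -16 in two's complement has the same bit pattern as ~15. A small C sketch of that arithmetic follows; the function name realign_down is hypothetical and the addresses in the usage note are made-up values.

```c
#include <assert.h>
#include <stdint.h>

/* Round a 32-bit stack pointer value down to a multiple of `alignment`,
   which must be a power of two. Mirrors what `andl $-16, %esp` does
   when alignment is 16. */
uint32_t realign_down(uint32_t sp, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1u)) == 0);
    return sp & ~(alignment - 1u);
}
```

For example, realign_down(0x1004, 16) yields 0x1000, while an already aligned value such as 0x1000 is left unchanged.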