Bug 49001

Summary: GCC uses VMOVAPS/PD AVX instructions to access stack variables that are not 32-byte aligned
Product: gcc Reporter: Norbert Pozar <npozar>
Component: target Assignee: Not yet assigned to anyone <unassigned>
Status: UNCONFIRMED ---    
Severity: normal CC: arthur200126, CoelacanthusHex, ktietz, roland, sjames, thiago, xjkp2283572185
Priority: P3 Keywords: wrong-code
Version: 4.6.1   
Target Milestone: ---   
See Also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412
https://github.com/google/highway/issues/332
https://osdn.net/projects/mingw/ticket/39565
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113989
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
Host: Target: i?86-*-mingw x86_64-*-mingw
Build: Known to work:
Known to fail: Last reconfirmed:

Description Norbert Pozar 2011-05-14 20:28:48 UTC
I'm using a custom mingw64 build of GCC 4.6.1. My target is Windows 64-bit. I compile with g++ -O3 -march=corei7-avx -mtune=corei7-avx -mavx.

GCC uses the aligned moves VMOVAPS/PD from the new AVX instruction set to access local variables of type __m256/__m256d on the stack. But the stack pointer is only 16-byte aligned on Win64, so this causes a segmentation fault whenever the accessed stack slot is not 32-byte aligned, as in:

__m256 dummy_ps256;
void test_stackalign32() {
	__m256 x = dummy_ps256;
	dummy_ps256 = sin256_ps_avx(x);
}

which compiles to 

	vmovaps	dummy_ps256(%rip), %ymm0
	leaq	32(%rsp), %rdx
	vmovaps	%ymm0, 32(%rsp)  // possible SEGFAULT
	leaq	64(%rsp), %rcx
	vzeroupper
	call	_Z13sin256_ps_avxDv8_f
	vmovaps	64(%rsp), %ymm0  // possible SEGFAULT
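
The failure mode can be illustrated with plain address arithmetic. The sketch below is not part of the original report; `slot_at_rsp_plus_32` and `misalignment32` are hypothetical helper names, and the only assumption is that the stack pointer is 16-byte aligned, as described for Win64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how far addr lies above the previous 32-byte boundary. */
static unsigned misalignment32(uintptr_t addr)
{
    return (unsigned)(addr & 31u);
}

/* Hypothetical helper: address of the slot addressed as 32(%rsp).
   Assumes rsp is 16-byte aligned, which is all the listing relies on. */
static uintptr_t slot_at_rsp_plus_32(uintptr_t rsp)
{
    assert((rsp & 15u) == 0);
    return rsp + 32;
}
```

For a stack pointer that happens to be a multiple of 32, the slot is 32-byte aligned and the aligned move works; for a stack pointer that is only a multiple of 16, the slot is 16 bytes off and the aligned 256-bit move faults.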

I couldn't figure out how to realign the stack with -mstackrealign.
Comment 1 Uroš Bizjak 2011-05-15 18:49:44 UTC
(In reply to comment #0)
> I'm using a custom mingw64 build of GCC 4.6.1. My target is Windows 64-bit. I
> compile with g++ -O3 -march=corei7-avx -mtune=corei7-avx -mavx.

Please provide a testcase that can be compiled without changes. See [1].

FWIW, I have tested the following testcase on x86_64-pc-linux-gnu:

--cut here--
#include <x86intrin.h>

__m256 sin256_ps_avx (__m256);

__m256 dummy_ps256;
void test_stackalign32() {
    volatile __m256 x = dummy_ps256;
    dummy_ps256 = sin256_ps_avx(x);
}
--cut here--

And got the expected code (gcc-4.6.1):

test_stackalign32:
.LFB828:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	andq	$-32, %rsp
	subq	$32, %rsp
	vmovaps	dummy_ps256(%rip), %ymm0
	vmovaps	%ymm0, (%rsp)
	vmovaps	(%rsp), %ymm0
	call	sin256_ps_avx
	vmovaps	%ymm0, dummy_ps256(%rip)
	leave
	.cfi_def_cfa 7, 8
	vzeroupper
	ret

Probably mingw64 specific problem... CC added.

[1] http://gcc.gnu.org/bugs/#report
Comment 2 H.J. Lu 2011-05-15 22:10:00 UTC
Stack alignment isn't supported on Windows.
Comment 3 Norbert Pozar 2011-05-16 06:05:37 UTC
(In reply to comment #1)
> Please provide a testcase that can be compiled without changes. See [1].

I'm sorry about this.

> Probably mingw64 specific problem... CC added.

Thank you for taking the time to test the code on Linux. I was worried that this might be mingw64-specific.

(In reply to comment #2)
> Stack alignment isn't supported on Windows.

Since this bug effectively prevents using 256-bit AVX instructions when compiling for Windows with GCC, I was wondering if there are any plans to support stack alignment. It seems that simply adding

andq    $-32, %rsp

to the function prologue would fix this. Or would it be feasible to replace VMOVAPS with the unaligned VMOVUPS when accessing the stack?
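
As a rough illustration of what the proposed `andq $-32, %rsp` computes, the mask can be written out in C. This is only a sketch: `align_down` is a hypothetical helper name, not anything in GCC, and it models the arithmetic alone, not the prologue changes a real fix would need.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of align.
   align must be a power of two; for align == 32 the mask ~(align - 1)
   has the same bit pattern as the immediate -32 in andq $-32, %rsp. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}
```

Because the stack grows downward, rounding the stack pointer down never overlaps the caller's frame. The original value is lost, though, which is why the Linux listing in comment 1 first saves %rsp in %rbp and restores it with leave.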
Comment 4 Roland Schulz 2014-09-03 21:18:51 UTC
*** Bug 61730 has been marked as a duplicate of this bug. ***
Comment 5 Mingye Wang 2021-08-22 18:35:32 UTC
I think I am bumping into the same bug with GCC 10.3.0, MinGW64 environment, in an SIMD library at [1].
  [1]: https://github.com/google/highway/issues/332

There was a related bug at [2] showing another small (not quite minimal) test case.
  [2]: https://osdn.net/projects/mingw/ticket/39565

The VMOVUPS idea seems cool -- can we do it?
Comment 6 Mingye Wang 2021-08-22 18:39:08 UTC
FWIW, the ticket about aligning the stack in the prologue is bug 54412. Apologies for the noisy emails, but I can't add the See Also link here myself.
Comment 7 Thiago Macieira 2021-12-21 12:35:53 UTC
Hack to work around this:

/* Assembler macros that redefine the aligned AVX moves as their unaligned counterparts. */
asm(
    ".macro vmovapd args:vararg\n"
    "    vmovupd \\args\n"
    ".endm\n"
    ".macro vmovaps args:vararg\n"
    "    vmovups \\args\n"
    ".endm\n"
    ".macro vmovdqa args:vararg\n"
    "    vmovdqu \\args\n"
    ".endm\n"
    ".macro vmovdqa32 args:vararg\n"
    "    vmovdqu32 \\args\n"
    ".endm\n"
    ".macro vmovdqa64 args:vararg\n"
    "    vmovdqu64 \\args\n"
    ".endm\n"
);

See: https://github.com/opendcdiag/opendcdiag/blob/main/framework/sysdeps/windows/win32_stdlib.h#L11-L34
Comment 8 Andrew Pinski 2024-02-19 17:00:27 UTC
*** Bug 113989 has been marked as a duplicate of this bug. ***