I'm using a custom mingw64 build of GCC 4.6.1. My target is Windows 64bit. I compile with g++ -03 -march=corei7-avx -mtune=corei7-avx -mavx. GCC uses aligned moves VMOVAPS/PD from the new AVX instruction set to access local variables of type __m256/__m256d on the stack. But the stack pointer is only 16byte aligned on Win64, so this causes a segmentation fault error when the stack pointer is not 32byte aligned, as in: __m256 dummy_ps256; void test_stackalign32() { __m256 x = dummy_ps256; dummy_ps256 = sin256_ps_avx(x); } which compiles to vmovaps dummy_ps256(%rip), %ymm0 leaq 32(%rsp), %rdx vmovaps %ymm0, 32(%rsp) // possible SEGFAULT leaq 64(%rsp), %rcx vzeroupper call _Z13sin256_ps_avxDv8_f vmovaps 64(%rsp), %ymm0 // possible SEGFAULT I couldn't figure out how to realign a stack with -mstackrealign.
(In reply to comment #0) > I'm using a custom mingw64 build of GCC 4.6.1. My target is Windows 64bit. I > compile with g++ -03 -march=corei7-avx -mtune=corei7-avx -mavx. Please provide testcase that can be compiled without changes. See [1]. FWIW, I have tested following testcase on x86_64-pc-linux-gnu: --cut here-- #include <x86intrin.h> __m256 sin256_ps_avx (__m256); __m256 dummy_ps256; void test_stackalign32() { volatile __m256 x = dummy_ps256; dummy_ps256 = sin256_ps_avx(x); } --cut here-- And got expected code (gcc-4.6.1): test_stackalign32: .LFB828: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 andq $-32, %rsp subq $32, %rsp vmovaps dummy_ps256(%rip), %ymm0 vmovaps %ymm0, (%rsp) vmovaps (%rsp), %ymm0 call sin256_ps_avx vmovaps %ymm0, dummy_ps256(%rip) leave .cfi_def_cfa 7, 8 vzeroupper ret Probably mingw64 specific problem... CC added. [1] http://gcc.gnu.org/bugs/#report
Stack alignment isn't supported on Windows.
(In reply to comment #1) > Please provide testcase that can be compiled without changes. See [1]. I'm sorry about this. > Probably mingw64 specific problem... CC added. Thank you for your time to test the code on linux. I was worried that this might be mingw64 specific. (In reply to comment #2) > Stack alignment isn't supported on Windows. Since this bug effectively prevents using 256bit AVX instructions when compiling for Windows using GCC, I was wondering if there are any plans to support the stack alignment. It seems that simply adding andq $-32, %rsp to the function prologue would fix this. Or would it be feasible to replace VMOVAPS by unaligned VMOVUPS when accessing the stack?
*** Bug 61730 has been marked as a duplicate of this bug. ***
I think I am bumping into the same bug with GCC 10.3.0, MinGW64 environment, in an SIMD library at [1]. [1]: https://github.com/google/highway/issues/332 There was a related bug at [2] showing another small (not quite minimal) test case. [2]: https://osdn.net/projects/mingw/ticket/39565 The VMOVUPS idea seems cool -- can we do it?
FWIW, the ticket about doing stuff to align the stack in the prologue is bug 54412. Apologies for the noisy emails, but thing is I can't do the see-also thing here.
Hack to workaround: asm( ".macro vmovapd args:vararg\n" " vmovupd \\args\n" ".endm\n" ".macro vmovaps args:vararg\n" " vmovups \\args\n" ".endm\n" ".macro vmovdqa args:vararg\n" " vmovdqu \\args\n" ".endm\n" ".macro vmovdqa32 args:vararg\n" " vmovdqu32 \\args\n" ".endm\n" ".macro vmovdqa64 args:vararg\n" " vmovdqu64 \\args\n" ".endm\n" ); See: https://github.com/opendcdiag/opendcdiag/blob/main/framework/sysdeps/windows/win32_stdlib.h#L11-L34