Bug 49001 - GCC uses VMOVAPS/PD AVX instructions to access stack variables that are not 32-byte aligned
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.6.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Duplicates: 61730
Depends on:
Blocks:
 
Reported: 2011-05-14 20:28 UTC by Norbert Pozar
Modified: 2014-09-03 21:18 UTC (History)
CC List: 2 users

See Also:
Host:
Target: i?86-*-mingw x86_64-*-mingw
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Norbert Pozar 2011-05-14 20:28:48 UTC
I'm using a custom mingw64 build of GCC 4.6.1. My target is 64-bit Windows. I compile with g++ -O3 -march=corei7-avx -mtune=corei7-avx -mavx.

GCC uses the aligned moves VMOVAPS/PD from the AVX instruction set to access local variables of type __m256/__m256d on the stack. However, the stack pointer is only 16-byte aligned on Win64, so these accesses cause a segmentation fault whenever the stack pointer is not 32-byte aligned, as in:

__m256 dummy_ps256;
void test_stackalign32() {
	__m256 x = dummy_ps256;
	dummy_ps256 = sin256_ps_avx(x);
}

which compiles to 

	vmovaps	dummy_ps256(%rip), %ymm0
	leaq	32(%rsp), %rdx
	vmovaps	%ymm0, 32(%rsp)  // possible SEGFAULT
	leaq	64(%rsp), %rcx
	vzeroupper
	call	_Z13sin256_ps_avxDv8_f
	vmovaps	64(%rsp), %ymm0  // possible SEGFAULT

I couldn't figure out how to realign a stack with -mstackrealign.
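
The failure condition can be illustrated with plain address arithmetic. The following is only a minimal sketch, not GCC code: the helper names is_aligned and ymm_slot_is_safe are hypothetical, and any stack pointer values passed in are made-up examples. It shows that a 16-byte-aligned stack pointer does not guarantee that a slot such as 32(%rsp) is 32-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr is a multiple of align.
   align must be a nonzero power of two. */
static int is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Models an aligned 256-bit access such as `vmovaps %ymm0, 32(%rsp)`:
   the access is safe only if rsp + offset is 32-byte aligned. */
static int ymm_slot_is_safe(uintptr_t rsp, uintptr_t offset)
{
    return is_aligned(rsp + offset, 32);
}
```

With offsets that are multiples of 32, as in the code above, the access is safe only when the stack pointer itself happens to be a multiple of 32, which a 16-byte guarantee satisfies for only half of the possible values.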
Comment 1 Uroš Bizjak 2011-05-15 18:49:44 UTC
(In reply to comment #0)
> I'm using a custom mingw64 build of GCC 4.6.1. My target is 64-bit Windows. I
> compile with g++ -O3 -march=corei7-avx -mtune=corei7-avx -mavx.

Please provide a testcase that can be compiled without changes. See [1].

FWIW, I have tested the following testcase on x86_64-pc-linux-gnu:

--cut here--
#include <x86intrin.h>

__m256 sin256_ps_avx (__m256);

__m256 dummy_ps256;
void test_stackalign32() {
    volatile __m256 x = dummy_ps256;
    dummy_ps256 = sin256_ps_avx(x);
}
--cut here--

And got the expected code (gcc-4.6.1):

test_stackalign32:
.LFB828:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	andq	$-32, %rsp
	subq	$32, %rsp
	vmovaps	dummy_ps256(%rip), %ymm0
	vmovaps	%ymm0, (%rsp)
	vmovaps	(%rsp), %ymm0
	call	sin256_ps_avx
	vmovaps	%ymm0, dummy_ps256(%rip)
	leave
	.cfi_def_cfa 7, 8
	vzeroupper
	ret

Probably a mingw64-specific problem... CC added.

[1] http://gcc.gnu.org/bugs/#report
Comment 2 H.J. Lu 2011-05-15 22:10:00 UTC
Stack alignment isn't supported on Windows.
Comment 3 Norbert Pozar 2011-05-16 06:05:37 UTC
(In reply to comment #1)
> Please provide a testcase that can be compiled without changes. See [1].

I'm sorry about this.

> Probably a mingw64-specific problem... CC added.

Thank you for taking the time to test the code on Linux. I was worried that this might be mingw64-specific.

(In reply to comment #2)
> Stack alignment isn't supported on Windows.

Since this bug effectively prevents the use of 256-bit AVX instructions when compiling for Windows with GCC, I was wondering whether there are any plans to support stack alignment. It seems that simply adding

andq    $-32, %rsp

to the function prologue would fix this. Or would it be feasible to replace VMOVAPS by unaligned VMOVUPS when accessing the stack?
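
For reference, the effect of that instruction can be modelled in C. This is a minimal sketch under the assumption that only the address arithmetic matters; the helper name align_down_32 is hypothetical and this is not how GCC implements realignment. Since -32 is ~31 in two's complement, the AND clears the low five bits and rounds the stack pointer down to a multiple of 32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling `andq $-32, %rsp`: clear the low five
   bits, rounding the address down to the nearest multiple of 32. */
static uintptr_t align_down_32(uintptr_t rsp)
{
    uintptr_t aligned = rsp & ~(uintptr_t)31;

    /* The result is 32-byte aligned and never above the input, so on a
       downward-growing stack it cannot overlap the caller's frame. */
    assert((aligned & 31) == 0 && aligned <= rsp);
    return aligned;
}
```

Note that the rounded-down value loses the original stack pointer, so the prologue also has to save it. The Linux code in comment #1 does this with pushq %rbp / movq %rsp, %rbp before the andq and restores it with leave.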
Comment 4 Roland Schulz 2014-09-03 21:18:51 UTC
*** Bug 61730 has been marked as a duplicate of this bug. ***