Summary: | GCC uses VMOVAPS/PD AVX instructions to access stack variables that are not 32-byte aligned | ||
---|---|---|---|
Product: | gcc | Reporter: | Norbert Pozar <npozar> |
Component: | target | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | UNCONFIRMED --- | ||
Severity: | normal | CC: | arthur200126, CoelacanthusHex, ktietz, roland, sjames, thiago, xjkp2283572185 |
Priority: | P3 | Keywords: | wrong-code |
Version: | 4.6.1 | ||
Target Milestone: | --- | ||
See Also: |
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 https://github.com/google/highway/issues/332 https://osdn.net/projects/mingw/ticket/39565 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113989 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978 |
||
Host: | Target: | i?86-*-mingw x86_64-*-mingw | |
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: |
Description
Norbert Pozar
2011-05-14 20:28:48 UTC
(In reply to comment #0) > I'm using a custom mingw64 build of GCC 4.6.1. My target is Windows 64bit. I > compile with g++ -03 -march=corei7-avx -mtune=corei7-avx -mavx. Please provide testcase that can be compiled without changes. See [1]. FWIW, I have tested following testcase on x86_64-pc-linux-gnu: --cut here-- #include <x86intrin.h> __m256 sin256_ps_avx (__m256); __m256 dummy_ps256; void test_stackalign32() { volatile __m256 x = dummy_ps256; dummy_ps256 = sin256_ps_avx(x); } --cut here-- And got expected code (gcc-4.6.1): test_stackalign32: .LFB828: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 andq $-32, %rsp subq $32, %rsp vmovaps dummy_ps256(%rip), %ymm0 vmovaps %ymm0, (%rsp) vmovaps (%rsp), %ymm0 call sin256_ps_avx vmovaps %ymm0, dummy_ps256(%rip) leave .cfi_def_cfa 7, 8 vzeroupper ret Probably mingw64 specific problem... CC added. [1] http://gcc.gnu.org/bugs/#report Stack alignment isn't supported on Windows. (In reply to comment #1) > Please provide testcase that can be compiled without changes. See [1]. I'm sorry about this. > Probably mingw64 specific problem... CC added. Thank you for your time to test the code on linux. I was worried that this might be mingw64 specific. (In reply to comment #2) > Stack alignment isn't supported on Windows. Since this bug effectively prevents using 256bit AVX instructions when compiling for Windows using GCC, I was wondering if there are any plans to support the stack alignment. It seems that simply adding andq $-32, %rsp to the function prologue would fix this. Or would it be feasible to replace VMOVAPS by unaligned VMOVUPS when accessing the stack? *** Bug 61730 has been marked as a duplicate of this bug. *** I think I am bumping into the same bug with GCC 10.3.0, MinGW64 environment, in an SIMD library at [1]. [1]: https://github.com/google/highway/issues/332 There was a related bug at [2] showing another small (not quite minimal) test case. [2]: https://osdn.net/projects/mingw/ticket/39565 The VMOVUPS idea seems cool -- can we do it? FWIW, the ticket about doing stuff to align the stack in the prologue is bug 54412. Apologies for the noisy emails, but thing is I can't do the see-also thing here. Hack to workaround: asm( ".macro vmovapd args:vararg\n" " vmovupd \\args\n" ".endm\n" ".macro vmovaps args:vararg\n" " vmovups \\args\n" ".endm\n" ".macro vmovdqa args:vararg\n" " vmovdqu \\args\n" ".endm\n" ".macro vmovdqa32 args:vararg\n" " vmovdqu32 \\args\n" ".endm\n" ".macro vmovdqa64 args:vararg\n" " vmovdqu64 \\args\n" ".endm\n" ); See: https://github.com/opendcdiag/opendcdiag/blob/main/framework/sysdeps/windows/win32_stdlib.h#L11-L34 *** Bug 113989 has been marked as a duplicate of this bug. *** |