Created attachment 35328: Assembly output

https://raw.githubusercontent.com/foo86/dcadec/4dac90072f1a0ad368430dbbb568ac71def0241f/libdcadec/idct_float.c

GCC 5.1.0 RC and mingw-w64 v4.0.1, cross-compiler. Can also be reproduced with GCC 4.9.

[jamrial@archVM dcadec]$ x86_64-w64-mingw32-gcc -O3 -mavx512f -c -o libdcadec/idct_float.o libdcadec/idct_float.c
/tmp/ccGUgVPR.s: Assembler messages:
/tmp/ccGUgVPR.s:557: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:559: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:561: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:563: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:565: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:567: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:569: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:571: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:573: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:575: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:577: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:579: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:581: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:583: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:585: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:587: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1482: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1484: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1486: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1488: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1490: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1492: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1494: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1496: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1498: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1500: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1502: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1504: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1506: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1508: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1510: Error: invalid register for .seh_savexmm
/tmp/ccGUgVPR.s:1512: Error: invalid register for .seh_savexmm

[jamrial@archVM dcadec]$ x86_64-w64-mingw32-gcc -v
Using built-in specs.
COLLECT_GCC=x86_64-w64-mingw32-gcc
COLLECT_LTO_WRAPPER=/opt/mingw64/lib/gcc/x86_64-w64-mingw32/5.0.1/lto-wrapper
Target: x86_64-w64-mingw32
Configured with: /home/jamrial/gcc-5.1.0-RC-20150412/configure --host=x86_64-unknown-linux-gnu --build=x86_64-unknown-linux-gnu --target=x86_64-w64-mingw32 --disable-multilib --enable-static --disable-shared --enable-64bit --prefix=/opt/mingw64 --with-sysroot=/opt/mingw64 --enable-version-specific-runtime-libs --with-dwarf --enable-fully-dynamic-string --enable-languages=c,c++ --enable-libssp --with-host-libstdcxx='-lstdc++ -lsupc++' --enable-lto --disable-win32-registry --libexecdir=/opt/mingw64/lib --disable-nls
Thread model: win32
gcc version 5.0.1 20150412 (prerelease) (GCC)

Attached is the resulting assembly file.
This issue is related to GCC's output of SEH prologue pseudo-ops: it tries to emit registers other than the supported 8-byte SSE ones. Generally, AVX512 can't be supported in a 32-byte aligned way on the x64 target anyway.
For the xmm16 to xmm31 registers, a possible workaround could be to make those registers fixed on mingw (thus unavailable for register allocation and not call-saved). See PR79127.
Cygwin (x86_64-pc-cygwin) is also affected; I have encountered this bug with gcc 7.4.0. Could you add a new option that removes the XMM16+ registers from the pool of available registers? It could serve as an easy workaround until this is fixed properly.
I have found that I can use the -ffixed-reg option for this. Each use eliminates only one register, so I have to pass it 16 times to eliminate all of xmm16..31. It would be handy to have another option that disables all registers in this group at once.
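Spelling out the sixteen flags by hand is error-prone. As a minimal sketch, assuming GCC accepts the register names xmm16 through xmm31 in the -ffixed-REG spelling (-ffixed-xmm16 and so on), the hypothetical helper build_fixed_xmm_flags below generates the whole list as one string that can be pasted into CFLAGS:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "-ffixed-xmm16 -ffixed-xmm17 ... -ffixed-xmm31"
   into buf. Returns 0 on success, or -1 if buf is too small (in which case
   the buffer holds a truncated prefix). */
static int build_fixed_xmm_flags(char *buf, size_t size)
{
    assert(buf != NULL);
    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (int reg = 16; reg <= 31; reg++) {
        size_t used = strlen(buf);
        /* Separate flags with a single space, but not before the first one. */
        int n = snprintf(buf + used, size - used, "%s-ffixed-xmm%d",
                         reg == 16 ? "" : " ", reg);
        if (n < 0 || (size_t)n >= size - used)
            return -1;  /* output would not fit */
    }
    return 0;
}
```

With a 512-byte buffer it produces the 16 flags separated by single spaces; a buffer that is too small makes it return -1 rather than silently emitting a partial flag list.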
I found the following link: https://stackoverflow.com/questions/53733624/is-xmm8-register-value-preserved-across-calls/53733767#53733767 Quote from it: "Any additional registers for newer instruction sets are volatile by default. This includes the upper parts of YMM0-15 and ZMM0-15 as well as ?MM16-31 if present." So it looks like gcc should not generate .seh_savexmm for xmm16..31 at all.
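The rule quoted above can be written down as a small predicate. This is only a sketch of the Microsoft x64 convention as that answer (and the MSDN calling-convention page cited later in this thread) describes it, not GCC's actual implementation; the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the Microsoft x64 rule for vector registers 0..31: only the
   low 128 bits of xmm6..xmm15 are callee-saved (nonvolatile). Returns how
   many low bytes of the register a callee must preserve if it modifies it. */
static unsigned ms_abi_preserved_bytes(int xmm_index)
{
    assert(xmm_index >= 0 && xmm_index <= 31);
    if (xmm_index >= 6 && xmm_index <= 15)
        return 16;  /* low 128 bits only; upper YMM/ZMM bits are volatile */
    return 0;       /* xmm0-5 and xmm16-31 are entirely volatile */
}

/* A callee that modifies a register needs to save it (and describe the save
   with .seh_savexmm) only when this returns true. */
static bool ms_abi_needs_seh_save(int xmm_index)
{
    return ms_abi_preserved_bytes(xmm_index) != 0;
}
```

Under this rule a function that modifies xmm14 owes its caller a 16-byte save, while xmm18 needs none, so no .seh_savexmm should ever be required for xmm16..31.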
I get the same error with G++ 7.4.0 on Cygwin when compiling with -mavx512vl -m64. A workaround is to use -fno-asynchronous-unwind-tables. Registers xmm16-31 should be considered clobbered in Win64. See https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:79ab8c4321b2dc940bb706a7432a530e26f0df1a

commit r10-6522-g79ab8c4321b2dc940bb706a7432a530e26f0df1a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Feb 8 10:59:40 2020 +0100

    i386: Make xmm16-xmm31 call used even in ms ABI [PR65782]

    On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
    > I guess that Comment #9 patch form the PR should be trivially correct,
    > but althouhg it looks obvious, I don't want to propose the patch since
    > I have no means of testing it.

    I don't have means of testing it either.
    https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
    is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
    128-bits only) are call preserved.

    We are talking e.g. about
    /* { dg-options "-O2 -mabi=ms -mavx512vl" } */

    typedef double V __attribute__((vector_size (16)));
    void foo (void);
    V bar (void);
    void baz (V);
    void
    qux (void)
    {
      V c;
      {
        register V a __asm ("xmm18");
        V b = bar ();
        asm ("" : "=x" (a) : "0" (b));
        c = a;
      }
      foo ();
      {
        register V d __asm ("xmm18");
        V e;
        d = c;
        asm ("" : "=x" (e) : "0" (d));
        baz (e);
      }
    }

    where according to the MSDN doc gcc incorrectly holds the c value
    in xmm18 register across the foo call; if foo is compiled
    by some Microsoft compiler (or LLVM), then it could clobber %xmm18.
    If all xmm18 occurrences are changed to say xmm15, then it is valid
    to hold the 128-bit value across the foo call (though, surprisingly,
    LLVM saves it into stack anyway).

    The other parts are I guess mainly about SEH.  Consider e.g.
    void
    foo (void)
    {
      register double x __asm ("xmm14");
      register double y __asm ("xmm18");
      asm ("" : "=x" (x));
      asm ("" : "=v" (y));
      x += y;
      y += x;
      asm ("" : : "x" (x));
      asm ("" : : "v" (y));
    }
    looking at cross-compiler output, with -O2 -mavx512f this emits
            .file   "abcdeq.c"
            .text
            .align 16
            .globl  foo
            .def    foo;    .scl    2;      .type   32;     .endef
            .seh_proc       foo
    foo:
            subq    $40, %rsp
            .seh_stackalloc 40
            vmovaps %xmm14, (%rsp)
            .seh_savexmm    %xmm14, 0
            vmovaps %xmm18, 16(%rsp)
            .seh_savexmm    %xmm18, 16
            .seh_endprologue
            vaddsd  %xmm18, %xmm14, %xmm14
            vaddsd  %xmm18, %xmm14, %xmm18
            vmovaps (%rsp), %xmm14
            vmovaps 16(%rsp), %xmm18
            addq    $40, %rsp
            ret
            .seh_endproc
            .ident  "GCC: (GNU) 10.0.1 20200207 (experimental)"
    Does whatever assembler mingw64 uses even assemble this (I mean the
    .seh_savexmm %xmm16, 16 could be problematic)?
    I can find e.g. https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527
    which then links to https://gcc.gnu.org/PR65782

    2020-02-08  Uroš Bizjak  <ubizjak@gmail.com>
                Jakub Jelinek  <jakub@redhat.com>

            PR target/65782
            * config/i386/i386.h (CALL_USED_REGISTERS): Make xmm16-xmm31
            call-used even in 64-bit ms-abi.

            * gcc.target/i386/pr65782.c: New test.

    Co-authored-by: Uroš Bizjak <ubizjak@gmail.com>
Hmm, that behavior of gcc does indeed seem pretty bad. The SEH commands for xmm registers above index 15 (i.e. outside 0..15) are undefined, and even worse, they cannot be encoded correctly into the SEH table. Anything beyond the low 16 bytes of the ?mm registers, and any register with an index above 15, has to be treated as call-clobbered. In any case, the unwind information need not contain that information.
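The encoding limitation follows from the UNWIND_CODE layout used by Windows x64 unwind data: each slot has an 8-bit prologue offset, a 4-bit operation code and a 4-bit operation-info field, and for UWOP_SAVE_XMM128 that 4-bit field holds the register number. The sketch below assumes little-endian packing of the slot into 16 bits; the helper name is hypothetical, and the stack offset (which goes in the following slot, scaled by 16) is omitted. It shows why xmm16 and above simply do not fit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Operation code for saving a 128-bit XMM register in the Windows x64
   UNWIND_CODE format. */
enum { UWOP_SAVE_XMM128 = 8 };

/* Hypothetical helper packing one UNWIND_CODE slot: CodeOffset in bits 0-7,
   UnwindOp in bits 8-11, OpInfo in bits 12-15. For UWOP_SAVE_XMM128, OpInfo
   holds the register number, so only xmm0..xmm15 can be described.
   Returns 0 on success, -1 if the register cannot be encoded. */
static int encode_save_xmm128(uint8_t prolog_offset, unsigned xmm_index,
                              uint16_t *slot)
{
    assert(slot != NULL);
    if (xmm_index > 15)
        return -1;  /* xmm16-31 do not fit in the 4-bit OpInfo field */
    *slot = (uint16_t)(prolog_offset
                       | (UWOP_SAVE_XMM128 << 8)
                       | (xmm_index << 12));
    return 0;
}
```

For the foo example in the commit message above, a slot for %xmm14 can be built, while %xmm18 is rejected, which matches the assembler's "invalid register for .seh_savexmm" error.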
The releases/gcc-9 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a91e5d88970c8d865a49f2a4ed4e17ee2c58b73f

commit r9-8222-ga91e5d88970c8d865a49f2a4ed4e17ee2c58b73f
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sat Feb 8 10:59:40 2020 +0100

    i386: Make xmm16-xmm31 call used even in ms ABI [PR65782]
The releases/gcc-8 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:baef3efdc4992e4dcb7f4de62ff5bcb13bf05f60

commit r8-10016-gbaef3efdc4992e4dcb7f4de62ff5bcb13bf05f60
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Feb 14 15:47:55 2020 +0100

    i386: Make xmm16-xmm31 call used even in ms ABI [PR65782]
Fixed on the trunk and all release branches.