*movoi_internal_avx and *movti_internal have

     (set (attr "mode")
          (cond [(ior (match_operand 0 "ext_sse_reg_operand")
                      (match_operand 1 "ext_sse_reg_operand"))
                   (const_string "XI")
                 (and (eq_attr "alternative" "1")
                      (match_test "TARGET_AVX512VL"))
                   (const_string "XI")
                 (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
                      (and (eq_attr "alternative" "3")
                           (match_test "TARGET_SSE_TYPELESS_STORES")))
                   (const_string "V8SF")
                ]
                (const_string "OI")))])

But the

   (and (eq_attr "alternative" "1")
        (match_test "TARGET_AVX512VL"))
     (const_string "XI")

arm is unnecessary.  As a result, we are generating

   vpternlogd	$0xFF, %zmm0, %zmm0, %zmm0

which is only needed for %xmm16-%xmm31/%ymm16-%ymm31, when

   vpcmpeqd	%ymm0, %ymm0, %ymm0

or

   vpcmpeqd	%xmm0, %xmm0, %xmm0

is sufficient.
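The kind of source affected can be sketched with GCC's generic vector extension (a hypothetical reproducer, not the PR's own testcase; the misbehavior only shows when compiled with something like -O2 -mavx512vl -mtune=skylake-avx512):

```c
#include <assert.h>

/* Hypothetical reproducer: materializing an all-ones 256-bit constant.
   In ymm0-ymm15 this should be vpcmpeqd %ymm0, %ymm0, %ymm0; the
   EVEX-only vpternlogd $0xFF form is needed only when the value lives
   in ymm16-ymm31.  */
typedef int v8si __attribute__ ((vector_size (32)));

void
set_all_ones (v8si *p)
{
  *p = (v8si) { -1, -1, -1, -1, -1, -1, -1, -1 };
}
```

Regardless of which instruction the backend picks, the observable result is the same all-ones vector; only the encoding (and hence the AVX512-frequency impact) differs.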
sse.md has

(define_insn "mov<mode>_internal"
  [(set (match_operand:VMOVE 0 "nonimmediate_operand"
	 "=v,v ,v ,m")
	(match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
	 " C,BC,vm,v"))]
  "TARGET_SSE
   && (register_operand (operands[0], <MODE>mode)
       || register_operand (operands[1], <MODE>mode))"
...
   (set (attr "mode")
	(cond [(and (eq_attr "alternative" "1")
		    (match_test "TARGET_AVX512VL"))
		 (const_string "<sseinsnmode>")
	       (and (match_test "<MODE_SIZE> == 16")
		    (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
			 (and (eq_attr "alternative" "3")
			      (match_test "TARGET_SSE_TYPELESS_STORES"))))
		 (const_string "<ssePSmode>")
	       (match_test "TARGET_AVX")
		 (const_string "<sseinsnmode>")
	       (ior (not (match_test "TARGET_SSE2"))
		    (match_test "optimize_function_for_size_p (cfun)"))
		 (const_string "V4SF")
	       (and (eq_attr "alternative" "0")
		    (match_test "TARGET_SSE_LOAD0_BY_PXOR"))
		 (const_string "TI")
	      ]
	      (const_string "<sseinsnmode>")))

Here the

   (and (eq_attr "alternative" "1")
	(match_test "TARGET_AVX512VL"))
     (const_string "<sseinsnmode>")

arm is OK.
Another problem:

   (cond [(ior (match_operand 0 "ext_sse_reg_operand")
	       (match_operand 1 "ext_sse_reg_operand"))
	    (const_string "XI")

We shouldn't use XI for TARGET_AVX512VL.  OI/TI is OK for the upper 16 vector registers with TARGET_AVX512VL.
Author: hjl
Date: Thu Feb  7 17:58:19 2019
New Revision: 268657

URL: https://gcc.gnu.org/viewcvs?rev=268657&root=gcc&view=rev
Log:
i386: Fix typo in *movoi_internal_avx/movti_internal

	PR target/89229
	* config/i386/i386.md (*movoi_internal_avx): Set mode to OI
	for TARGET_AVX512VL.
	(*movti_internal): Set mode to TI for TARGET_AVX512VL.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
Is this fixed on trunk now?
Author: hjl
Date: Fri Feb  8 11:30:53 2019
New Revision: 268678

URL: https://gcc.gnu.org/viewcvs?rev=268678&root=gcc&view=rev
Log:
i386: Use OI/TImode in *mov[ot]i_internal_avx with AVX512VL

OImode and TImode moves must be done in XImode to access upper 16
vector registers without AVX512VL.  With AVX512VL, we can access
upper 16 vector registers in OImode and TImode.

	PR target/89229
	* config/i386/i386.md (*movoi_internal_avx): Set mode to XI for
	upper 16 vector registers without TARGET_AVX512VL.
	(*movti_internal): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
(In reply to Richard Biener from comment #4)
> Is this fixed on trunk now?

Yes.
[hjl@gnu-cfl-1 gcc]$ cat /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.target/i386/pr89029-1.c
/* { dg-do assemble { target { avx512bw && avx512vl } } } */
/* { dg-options "-O1 -mavx512bw -mavx512vl -mtune=skylake-avx512" } */

extern void abort (void);
extern void exit (int);

struct s { unsigned char a[256]; };

union u
{
  struct { struct s b; int c; } d;
  struct { int c; struct s b; } e;
};

static union u v;
static union u v0;
static struct s *p = &v.d.b;
static struct s *q = &v.e.b;

static inline struct s rp (void) { return *p; }
static inline struct s rq (void) { return *q; }
static void pq (void) { *p = rq (); }
static void qp (void) { *q = rp (); }

static void
init (struct s *sp)
{
  int i;
  for (i = 0; i < 256; i++)
    sp->a[i] = i;
}

static void
check (struct s *sp)
{
  int i;
  for (i = 0; i < 256; i++)
    if (sp->a[i] != i)
      abort ();
}

void
main_test (void)
{
  v = v0;
  init (p);
  qp ();
  check (q);
  v = v0;
  init (q);
  pq ();
  check (p);
  exit (0);
}
[hjl@gnu-cfl-1 gcc]$ ./xgcc -B./ -c -O1 -mavx512bw -mavx512vl /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.target/i386/pr89029-1.c -march=skylake-avx512
/tmp/ccqZUBNW.s: Assembler messages:
/tmp/ccqZUBNW.s:34: Error: unsupported instruction `vmovdqa'
/tmp/ccqZUBNW.s:35: Error: unsupported instruction `vmovdqa'
/tmp/ccqZUBNW.s:36: Error: unsupported instruction `vmovdqa'
[hjl@gnu-cfl-1 gcc]$
Author: hjl
Date: Tue Feb 12 19:00:35 2019
New Revision: 268811

URL: https://gcc.gnu.org/viewcvs?rev=268811&root=gcc&view=rev
Log:
i386: Revert revision 268678 and revision 268657

The i386 backend has

INT_MODE (OI, 32);
INT_MODE (XI, 64);

So XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
in case of const_1, all 512 bits set.

We can load zeros with a narrower instruction (e.g. 256 bit by inherent
zeroing of highpart in case of 128 bit xor), so TImode in this case.
Some targets prefer V4SF mode, so they will emit float xorps for zeroing.

Then the introduction of AVX512F fubared everything by overloading the
meaning of insn mode.  How should we use insn mode MODE_XI in
standard_sse_constant_opcode and in patterns which use
standard_sse_constant_opcode?  There are 2 options:

1. MODE_XI should only be used to check if EXT_REX_SSE_REG_P is true
   for any register operand.  The operand size must be determined by the
   operand itself, not by MODE_XI.  The operand encoding size should be
   determined by the operand size, EXT_REX_SSE_REG_P and AVX512VL.
2. MODE_XI should be used to determine the operand encoding size.
   EXT_REX_SSE_REG_P and AVX512VL should be checked for encoding
   instructions.

gcc/

	PR target/89229
	* config/i386/i386.md (*movoi_internal_avx): Revert revision
	268678 and revision 268657.
	(*movti_internal): Likewise.

gcc/testsuite/

	PR target/89229
	* gcc.target/i386/pr89229-1.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr89229-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
[hjl@gnu-4 i386]$ cat pr89229-2.c
/* { dg-do compile } */
/* { dg-options "-O2 -march=skylake-avx512" } */

typedef __int128 __m128t __attribute__ ((__vector_size__ (16),
					 __may_alias__));

__m128t
foo (void)
{
  register __int128 xmm16 __asm ("xmm16") = (__int128) -1;
  asm volatile ("" : "+v" (xmm16));
  return (__m128t) xmm16;
}

/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 i386]$ gcc -O2 -march=skylake-avx512 -S pr89229-2.c -o /tmp/x.s
[hjl@gnu-4 i386]$ cat /tmp/x.s
	.file	"pr89229-2.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	vpternlogd	$0xFF, %zmm16, %zmm16, %zmm16	<<<<<<< Should be xmm16
	vmovdqa64	%xmm16, %xmm0
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 i386]$
Though, is this really a regression? I mean, have we ever emitted better code?
(In reply to Jakub Jelinek from comment #10)
> Though, is this really a regression?  I mean, have we ever emitted better
> code?

It isn't a regression.
[hjl@gnu-4 tmp]$ cat x.c
/* { dg-do compile } */
/* { dg-options "-O2 -march=skylake-avx512" } */

extern int i;

int
foo1 (void)
{
  register int xmm16 __asm ("xmm16") = i;
  asm volatile ("" : "+v" (xmm16));
  register int xmm17 __asm ("xmm17") = xmm16;
  asm volatile ("" : "+v" (xmm17));
  return xmm17;
}

int
foo2 (void)
{
  register int xmm1 __asm ("xmm1") = i;
  asm volatile ("" : "+v" (xmm1));
  register int xmm17 __asm ("xmm17") = xmm1;
  asm volatile ("" : "+v" (xmm17));
  return xmm1;
}

/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 tmp]$ gcc -S -O2 -march=skylake-avx512 x.c
[hjl@gnu-4 tmp]$ cat x.s
	.file	"x.c"
	.text
	.p2align 4,,15
	.globl	foo1
	.type	foo1, @function
foo1:
.LFB0:
	.cfi_startproc
	vmovd	i(%rip), %xmm16
	vmovdqa32	%zmm16, %zmm17
	vmovd	%xmm17, %eax
	ret
	.cfi_endproc
.LFE0:
	.size	foo1, .-foo1
	.p2align 4,,15
	.globl	foo2
	.type	foo2, @function
foo2:
.LFB1:
	.cfi_startproc
	vmovd	i(%rip), %xmm1
	vmovdqa32	%zmm1, %zmm17
	vmovd	%xmm1, %eax
	ret
	.cfi_endproc
.LFE1:
	.size	foo2, .-foo2
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 tmp]$
Created attachment 45685
I am testing this
Comment on attachment 45685
I am testing this

The movsi change doesn't look entirely right to me.  While OImode or
TImode is not allowed in ext sse regs unless AVX512VL, that is not the
case for SImode; so for SImode, if one or both operands are ext sse regs
and !TARGET_AVX512VL, we need to use MODE_XI and use the pattern with
%g1, %g0 in there.
[hjl@gnu-4 gcc]$ cat /tmp/x.c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */

extern double d;

void
foo1 (double x)
{
  register double xmm16 __asm ("xmm16") = x;
  asm volatile ("" : "+v" (xmm16));
  d = xmm16;
}

/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 gcc]$ gcc -S -O2 -march=skylake-avx512 /tmp/x.c -mprefer-vector-width=512
[hjl@gnu-4 gcc]$ cat x.s
	.file	"x.c"
	.text
	.p2align 4,,15
	.globl	foo1
	.type	foo1, @function
foo1:
.LFB0:
	.cfi_startproc
	vmovapd	%zmm0, %zmm16
	vmovsd	%xmm16, d(%rip)
	ret
	.cfi_endproc
.LFE0:
	.size	foo1, .-foo1
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 gcc]$
[hjl@gnu-4 gcc]$ cat /tmp/y.c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */

extern float d;

void
foo1 (float x)
{
  register float xmm16 __asm ("xmm16") = x;
  asm volatile ("" : "+v" (xmm16));
  d = xmm16;
}

/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 gcc]$ gcc -S -O2 -march=skylake-avx512 /tmp/y.c -mprefer-vector-width=512
[hjl@gnu-4 gcc]$ cat y.s
	.file	"y.c"
	.text
	.p2align 4,,15
	.globl	foo1
	.type	foo1, @function
foo1:
.LFB0:
	.cfi_startproc
	vmovaps	%zmm0, %zmm16
	vmovss	%xmm16, d(%rip)
	ret
	.cfi_endproc
.LFE0:
	.size	foo1, .-foo1
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 gcc]$
[hjl@gnu-4 gcc]$ cat /tmp/z.c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -march=skylake-avx512" } */

extern long long i;

long long
foo1 (void)
{
  register long long xmm16 __asm ("xmm16") = i;
  asm volatile ("" : "+v" (xmm16));
  register long long xmm17 __asm ("xmm17") = xmm16;
  asm volatile ("" : "+v" (xmm17));
  return xmm17;
}

/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 gcc]$ gcc -S -O2 -march=skylake-avx512 /tmp/z.c -mno-avx512vl
[hjl@gnu-4 gcc]$ cat z.s
	.file	"z.c"
	.text
	.p2align 4,,15
	.globl	foo1
	.type	foo1, @function
foo1:
.LFB0:
	.cfi_startproc
	vmovq	i(%rip), %xmm16
	vmovdqa64	%xmm16, %xmm17	<<< This is an AVX512VL instruction.
	vmovq	%xmm17, %rax
	ret
	.cfi_endproc
.LFE0:
	.size	foo1, .-foo1
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 gcc]$
(In reply to Jakub Jelinek from comment #14)
> Comment on attachment 45685
> I am testing this
>
> The movsi change doesn't look entirely right to me.  While OImode or TImode
> is not allowed in ext sse regs unless AVX512VL, that is not the case for
> SImode, so for SImode if one or both operands are ext sse regs and
> !TARGET_AVX512VL, we need to use MODE_XI and use the pattern with %g1, %g0
> in there.

No need to set MODE_XI:

      if (EXT_REX_SSE_REG_P (operands[0])
	  || EXT_REX_SSE_REG_P (operands[1]))
	{
	  if (TARGET_AVX512VL)
	    return "vmovdqa32\t{%1, %0|%0, %1}";
	  else
	    return "vmovdqa32\t{%g1, %0|%0, %g1}";
	}
      else
	return "%vmovdqa\t{%1, %0|%0, %1}";
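The template choice being discussed can be restated as a plain function (a simplified sketch, not GCC source; `ext_reg_p` stands in for EXT_REX_SSE_REG_P being true on either operand, and the strings mirror the quoted snippet):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sketch of the SImode SSE-move template selection described above:
   with AVX512VL the xmm16-xmm31 registers can be encoded directly in
   a 128-bit EVEX instruction; without it, the %g modifier widens the
   operands so the move is done as a zmm register move.  */
static const char *
simode_ssemov (bool ext_reg_p, bool avx512vl_p)
{
  if (ext_reg_p)
    {
      if (avx512vl_p)
	return "vmovdqa32\t{%1, %0|%0, %1}";   /* 128-bit EVEX form */
      else
	return "vmovdqa32\t{%g1, %0|%0, %g1}"; /* widen via zmm */
    }
  /* Neither operand in xmm16-xmm31: plain VEX/SSE encoding works.  */
  return "%vmovdqa\t{%1, %0|%0, %1}";
}
```

This is only a model of the control flow; the committed fix folds this logic (generalized over all sizes and scalar modes) into ix86_get_ssemov in the attached patch.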
sse.md has

(define_insn "mov<mode>_internal"
  [(set (match_operand:VMOVE 0 "nonimmediate_operand"
	 "=v,v ,v ,m")
	(match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
	 " C,BC,vm,v"))]
....
  /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
     in avx512f, so we need to use workarounds, to access sse registers
     16-31, which are evex-only.  In avx512vl we don't need workarounds.  */
  if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
      && (EXT_REX_SSE_REG_P (operands[0])
	  || EXT_REX_SSE_REG_P (operands[1])))
    {
      if (memory_operand (operands[0], <MODE>mode))
	{
	  if (<MODE_SIZE> == 32)
	    return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
	  else if (<MODE_SIZE> == 16)
	    return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
	  else
	    gcc_unreachable ();
	}
...

However, ix86_hard_regno_mode_ok has

      /* TODO check for QI/HI scalars.  */
      /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
      if (TARGET_AVX512VL
	  && (mode == OImode
	      || mode == TImode
	      || VALID_AVX256_REG_MODE (mode)
	      || VALID_AVX512VL_128_REG_MODE (mode)))
	return true;

      /* xmm16-xmm31 are only available for AVX-512.  */
      if (EXT_REX_SSE_REGNO_P (regno))
	return false;

so

  if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
      && (EXT_REX_SSE_REG_P (operands[0])
	  || EXT_REX_SSE_REG_P (operands[1])))

is dead code:

[hjl@gnu-4 gcc]$ cat /tmp/z.c
#include <immintrin.h>

extern __m128 i;

__m128
foo1 (void)
{
  register __m128 xmm16 __asm ("xmm16") = i;
  asm volatile ("" : "+v" (xmm16));
  register __m128 xmm17 __asm ("xmm17") = xmm16;
  asm volatile ("" : "+v" (xmm17));
  return xmm17;
}
[hjl@gnu-4 gcc]$ /usr/gcc-5.4.1-x32/bin/gcc -S -O2 -march=knl /tmp/z.c
/tmp/z.c: In function ‘foo1’:
/tmp/z.c:8:19: error: register specified for ‘xmm16’ isn’t suitable for data type
   register __m128 xmm16 __asm ("xmm16") = i;
                   ^
/tmp/z.c:10:19: error: register specified for ‘xmm17’ isn’t suitable for data type
   register __m128 xmm17 __asm ("xmm17") = xmm16;
                   ^
[hjl@gnu-4 gcc]$
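The dead-code argument above can be checked with a small model (a sketch under assumed simplifications, not GCC code): any operand/mode combination the <64-byte !AVX512VL workaround would fire on is already rejected by ix86_hard_regno_mode_ok, so the register allocator never forms it.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the two quoted checks.  mode_size is in bytes;
   ext_regno models EXT_REX_SSE_REGNO_P (xmm16-xmm31).  */
static bool
hard_regno_mode_ok (bool ext_regno, int mode_size, bool avx512vl_p)
{
  /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
  if (avx512vl_p && (mode_size == 16 || mode_size == 32))
    return true;
  /* Otherwise xmm16-xmm31 only take 512-bit modes.  */
  if (ext_regno)
    return mode_size == 64;
  return true;
}

/* The mov<mode>_internal workaround condition quoted above.  */
static bool
workaround_fires (bool ext_regno, int mode_size, bool avx512vl_p)
{
  return mode_size < 64 && !avx512vl_p && ext_regno;
}
```

Under this model, whenever workaround_fires is true, hard_regno_mode_ok is false for the same register/mode pair, which is exactly why the branch can never be reached.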
Created attachment 45705
An updated patch
Created attachment 45707
A new patch
*** Bug 86896 has been marked as a duplicate of this bug. ***
A patch is posted at https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01841.html
Comment on attachment 45707 [details] A new patch >From fd7220a7551ee774614ca89574241813aae153b7 Mon Sep 17 00:00:00 2001 >From: "H.J. Lu" <hjl.tools@gmail.com> >Date: Tue, 12 Feb 2019 13:25:41 -0800 >Subject: [PATCH] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move > >i386 backend has > >INT_MODE (OI, 32); >INT_MODE (XI, 64); > >So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation, >in case of const_1, all 512 bits set. > >We can load zeros with narrower instruction, (e.g. 256 bit by inherent >zeroing of highpart in case of 128 bit xor), so TImode in this case. > >Some targets prefer V4SF mode, so they will emit float xorps for zeroing. > >sse.md has > >(define_insn "mov<mode>_internal" > [(set (match_operand:VMOVE 0 "nonimmediate_operand" > "=v,v ,v ,m") > (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand" > " C,BC,vm,v"))] >.... > /* There is no evex-encoded vmov* for sizes smaller than 64-bytes > in avx512f, so we need to use workarounds, to access sse registers > 16-31, which are evex-only. In avx512vl we don't need workarounds. */ > if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL > && (EXT_REX_SSE_REG_P (operands[0]) > || EXT_REX_SSE_REG_P (operands[1]))) > { > if (memory_operand (operands[0], <MODE>mode)) > { > if (<MODE_SIZE> == 32) > return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}"; > else if (<MODE_SIZE> == 16) > return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}"; > else > gcc_unreachable (); > } >... > >However, since ix86_hard_regno_mode_ok has > > /* TODO check for QI/HI scalars. */ > /* AVX512VL allows sse regs16+ for 128/256 bit modes. */ > if (TARGET_AVX512VL > && (mode == OImode > || mode == TImode > || VALID_AVX256_REG_MODE (mode) > || VALID_AVX512VL_128_REG_MODE (mode))) > return true; > > /* xmm16-xmm31 are only available for AVX-512. 
*/ > if (EXT_REX_SSE_REGNO_P (regno)) > return false; > > if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL > && (EXT_REX_SSE_REG_P (operands[0]) > || EXT_REX_SSE_REG_P (operands[1]))) > >is a dead code. > >All TYPE_SSEMOV vector moves are consolidated to ix86_output_ssemov: > >1. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE/AVX vector >moves will be generated. >2. If xmm16-xmm31/ymm16-ymm31 registers are used: > a. With AVX512VL, AVX512VL vector moves will be generated. > b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register > move will be done with zmm register move. > >ext_sse_reg_operand is removed since it is no longer needed. > >gcc/ > > PR target/89229 > * config/i386/i386-protos.h (ix86_output_ssemov): New prototype. > * config/i386/i386.c (ix86_get_ssemov): New function. > (ix86_output_ssemov): Likewise. > * config/i386/i386.md (*movxi_internal_avx512f): Call > ix86_output_ssemov for TYPE_SSEMOV. > (*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV. > Remove ext_sse_reg_operand and TARGET_AVX512VL check. > (*movti_internal): Likewise. > (*movdi_internal): Call ix86_output_ssemov for TYPE_SSEMOV. > Remove ext_sse_reg_operand check. > (*movsi_internal): Likewise. > (*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV. > (*movdf_internal): Call ix86_output_ssemov for TYPE_SSEMOV. > Remove TARGET_AVX512F, TARGET_PREFER_AVX256, TARGET_AVX512VL > and ext_sse_reg_operand check. > (*movsf_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV. > Remove TARGET_PREFER_AVX256, TARGET_AVX512VL and > ext_sse_reg_operand check. > * config/i386/mmx.md (MMXMODE:*mov<mode>_internal): Call > ix86_output_ssemov for TYPE_SSEMOV. Remove ext_sse_reg_operand > check. > * config/i386/sse.md (VMOVE:mov<mode>_internal): Call > ix86_output_ssemov for TYPE_SSEMOV. Remove TARGET_AVX512VL > check. > * config/i386/predicates.md (ext_sse_reg_operand): Removed. > >gcc/testsuite/ > > PR target/89229 > * gcc.target/i386/pr89229-2a.c: New test. 
> * gcc.target/i386/pr89229-2b.c: Likewise. > * gcc.target/i386/pr89229-2c.c: Likewise. > * gcc.target/i386/pr89229-3a.c: Likewise. > * gcc.target/i386/pr89229-3b.c: Likewise. > * gcc.target/i386/pr89229-3c.c: Likewise. > * gcc.target/i386/pr89229-4a.c: Likewise. > * gcc.target/i386/pr89229-4b.c: Likewise. > * gcc.target/i386/pr89229-4c.c: Likewise. > * gcc.target/i386/pr89229-5a.c: Likewise. > * gcc.target/i386/pr89229-5b.c: Likewise. > * gcc.target/i386/pr89229-5c.c: Likewise. > * gcc.target/i386/pr89229-6a.c: Likewise. > * gcc.target/i386/pr89229-6b.c: Likewise. > * gcc.target/i386/pr89229-6c.c: Likewise. > * gcc.target/i386/pr89229-7a.c: Likewise. > * gcc.target/i386/pr89229-7b.c: Likewise. > * gcc.target/i386/pr89229-7c.c: Likewise. >--- > gcc/config/i386/i386-protos.h | 2 + > gcc/config/i386/i386.c | 250 +++++++++++++++++++++ > gcc/config/i386/i386.md | 212 ++--------------- > gcc/config/i386/mmx.md | 29 +-- > gcc/config/i386/predicates.md | 5 - > gcc/config/i386/sse.md | 98 +------- > gcc/testsuite/gcc.target/i386/pr89229-2a.c | 15 ++ > gcc/testsuite/gcc.target/i386/pr89229-2b.c | 13 ++ > gcc/testsuite/gcc.target/i386/pr89229-2c.c | 6 + > gcc/testsuite/gcc.target/i386/pr89229-3a.c | 17 ++ > gcc/testsuite/gcc.target/i386/pr89229-3b.c | 6 + > gcc/testsuite/gcc.target/i386/pr89229-3c.c | 7 + > gcc/testsuite/gcc.target/i386/pr89229-4a.c | 17 ++ > gcc/testsuite/gcc.target/i386/pr89229-4b.c | 6 + > gcc/testsuite/gcc.target/i386/pr89229-4c.c | 7 + > gcc/testsuite/gcc.target/i386/pr89229-5a.c | 16 ++ > gcc/testsuite/gcc.target/i386/pr89229-5b.c | 6 + > gcc/testsuite/gcc.target/i386/pr89229-5c.c | 6 + > gcc/testsuite/gcc.target/i386/pr89229-6a.c | 16 ++ > gcc/testsuite/gcc.target/i386/pr89229-6b.c | 6 + > gcc/testsuite/gcc.target/i386/pr89229-6c.c | 6 + > gcc/testsuite/gcc.target/i386/pr89229-7a.c | 16 ++ > gcc/testsuite/gcc.target/i386/pr89229-7b.c | 12 + > gcc/testsuite/gcc.target/i386/pr89229-7c.c | 6 + > 24 files changed, 453 insertions(+), 327 deletions(-) > 
create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2c.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3c.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4c.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5c.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6c.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7c.c > >diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h >index 2d600173917..27f5cc13abf 100644 >--- a/gcc/config/i386/i386-protos.h >+++ b/gcc/config/i386/i386-protos.h >@@ -38,6 +38,8 @@ extern void ix86_expand_split_stack_prologue (void); > extern void ix86_output_addr_vec_elt (FILE *, int); > extern void ix86_output_addr_diff_elt (FILE *, int, int); > >+extern const char *ix86_output_ssemov (rtx_insn *, rtx *); >+ > extern enum calling_abi ix86_cfun_abi (void); > extern enum calling_abi ix86_function_type_abi (const_tree); > >diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >index fd05873ba39..97d1ea4229e 100644 >--- a/gcc/config/i386/i386.c >+++ b/gcc/config/i386/i386.c >@@ -10281,6 +10281,256 @@ ix86_standard_x87sse_constant_load_p (const rtx_insn *insn, rtx dst) > return true; > } > >+/* Return the 
opcode of the TYPE_SSEMOV instruction. To move from >+ or to xmm16-xmm31/ymm16-ymm31 registers, we either require >+ TARGET_AVX512VL or it is a register to register move which can >+ be done with zmm register move. */ >+ >+static const char * >+ix86_get_ssemov (rtx *operands, unsigned size, machine_mode mode) >+{ >+ static char buf[128]; >+ bool misaligned_p = (misaligned_operand (operands[0], mode) >+ || misaligned_operand (operands[1], mode)); >+ bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0]) >+ || EXT_REX_SSE_REG_P (operands[1])); >+ machine_mode scalar_mode = GET_MODE_INNER (mode); >+ >+ const char *opcode = NULL; >+ enum >+ { >+ opcode_int, >+ opcode_float, >+ opcode_double >+ } type = opcode_int; >+ if (SCALAR_FLOAT_MODE_P (scalar_mode)) >+ { >+ switch (scalar_mode) >+ { >+ case E_SFmode: >+ if (size == 64 || !evex_reg_p || TARGET_AVX512VL) >+ opcode = misaligned_p ? "%vmovups" : "%vmovaps"; >+ else >+ type = opcode_float; >+ break; >+ case E_DFmode: >+ if (size == 64 || !evex_reg_p || TARGET_AVX512VL) >+ opcode = misaligned_p ? "%vmovupd" : "%vmovapd"; >+ else >+ type = opcode_double; >+ break; >+ case E_TFmode: >+ if (size == 64) >+ opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; >+ else if (evex_reg_p) >+ { >+ if (TARGET_AVX512VL) >+ opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; >+ } >+ else >+ opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa"; >+ break; >+ default: >+ gcc_unreachable (); >+ } >+ } >+ else if (SCALAR_INT_MODE_P (scalar_mode)) >+ { >+ switch (scalar_mode) >+ { >+ case E_QImode: >+ if (size == 64) >+ opcode = (misaligned_p >+ ? (TARGET_AVX512BW >+ ? "vmovdqu8" >+ : "vmovdqu64") >+ : "vmovdqa64"); >+ else if (evex_reg_p) >+ { >+ if (TARGET_AVX512VL) >+ opcode = (misaligned_p >+ ? (TARGET_AVX512BW >+ ? "vmovdqu8" >+ : "vmovdqu64") >+ : "vmovdqa64"); >+ } >+ else >+ opcode = (misaligned_p >+ ? (TARGET_AVX512BW >+ ? 
"vmovdqu8" >+ : "%vmovdqu") >+ : "%vmovdqa"); >+ break; >+ case E_HImode: >+ if (size == 64) >+ opcode = (misaligned_p >+ ? (TARGET_AVX512BW >+ ? "vmovdqu16" >+ : "vmovdqu64") >+ : "vmovdqa64"); >+ else if (evex_reg_p) >+ { >+ if (TARGET_AVX512VL) >+ opcode = (misaligned_p >+ ? (TARGET_AVX512BW >+ ? "vmovdqu16" >+ : "vmovdqu64") >+ : "vmovdqa64"); >+ } >+ else >+ opcode = (misaligned_p >+ ? (TARGET_AVX512BW >+ ? "vmovdqu16" >+ : "%vmovdqu") >+ : "%vmovdqa"); >+ break; >+ case E_SImode: >+ if (size == 64) >+ opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32"; >+ else if (evex_reg_p) >+ { >+ if (TARGET_AVX512VL) >+ opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32"; >+ } >+ else >+ opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa"; >+ break; >+ case E_DImode: >+ case E_TImode: >+ case E_OImode: >+ if (size == 64) >+ opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; >+ else if (evex_reg_p) >+ { >+ if (TARGET_AVX512VL) >+ opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; >+ } >+ else >+ opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa"; >+ break; >+ case E_XImode: >+ opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; >+ break; >+ default: >+ gcc_unreachable (); >+ } >+ } >+ else >+ gcc_unreachable (); >+ >+ if (!opcode) >+ { >+ /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31 >+ registers without AVX512VL by using zmm register move. */ >+ if (!evex_reg_p >+ || TARGET_AVX512VL >+ || memory_operand (operands[0], mode) >+ || memory_operand (operands[1], mode)) >+ gcc_unreachable (); >+ size = 64; >+ switch (type) >+ { >+ case opcode_int: >+ opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32"; >+ break; >+ case opcode_float: >+ opcode = misaligned_p ? "%vmovups" : "%vmovaps"; >+ break; >+ case opcode_double: >+ opcode = misaligned_p ? 
"%vmovupd" : "%vmovapd"; >+ break; >+ } >+ } >+ >+ switch (size) >+ { >+ case 64: >+ snprintf (buf, sizeof (buf), "%s\t{%%g1, %%g0|%%g0, %%g1}", >+ opcode); >+ break; >+ case 32: >+ snprintf (buf, sizeof (buf), "%s\t{%%t1, %%t0|%%t0, %%t1}", >+ opcode); >+ break; >+ case 16: >+ snprintf (buf, sizeof (buf), "%s\t{%%x1, %%x0|%%x0, %%x1}", >+ opcode); >+ break; >+ default: >+ gcc_unreachable (); >+ } >+ return buf; >+} >+ >+/* Return the template of the TYPE_SSEMOV instruction to move >+ operands[1] into operands[0]. */ >+ >+const char * >+ix86_output_ssemov (rtx_insn *insn, rtx *operands) >+{ >+ machine_mode mode = GET_MODE (operands[0]); >+ if (get_attr_type (insn) != TYPE_SSEMOV >+ || mode != GET_MODE (operands[1])) >+ gcc_unreachable (); >+ >+ enum attr_mode insn_mode = get_attr_mode (insn); >+ >+ switch (insn_mode) >+ { >+ case MODE_XI: >+ case MODE_V8DF: >+ case MODE_V16SF: >+ return ix86_get_ssemov (operands, 64, mode); >+ >+ case MODE_OI: >+ case MODE_V4DF: >+ case MODE_V8SF: >+ return ix86_get_ssemov (operands, 32, mode); >+ >+ case MODE_TI: >+ case MODE_V2DF: >+ case MODE_V4SF: >+ return ix86_get_ssemov (operands, 16, mode); >+ >+ case MODE_DI: >+ /* Handle broken assemblers that require movd instead of movq. 
*/ >+ if (!HAVE_AS_IX86_INTERUNIT_MOVQ >+ && (GENERAL_REG_P (operands[0]) >+ || GENERAL_REG_P (operands[1]))) >+ return "%vmovd\t{%1, %0|%0, %1}"; >+ else >+ return "%vmovq\t{%1, %0|%0, %1}"; >+ >+ case MODE_V2SF: >+ if (TARGET_AVX && REG_P (operands[0])) >+ return "vmovlps\t{%1, %d0|%d0, %1}"; >+ else >+ return "%vmovlps\t{%1, %0|%0, %1}"; >+ >+ case MODE_DF: >+ if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1])) >+ return "vmovsd\t{%d1, %0|%0, %d1}"; >+ else >+ return "%vmovsd\t{%1, %0|%0, %1}"; >+ >+ case MODE_V1DF: >+ gcc_assert (!TARGET_AVX); >+ return "movlpd\t{%1, %0|%0, %1}"; >+ >+ case MODE_SI: >+ return "%vmovd\t{%1, %0|%0, %1}"; >+ >+ case MODE_SF: >+ if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1])) >+ return "vmovss\t{%d1, %0|%0, %d1}"; >+ else >+ return "%vmovss\t{%1, %0|%0, %1}"; >+ >+ default: >+ gcc_unreachable (); >+ } >+} >+ > /* Returns true if OP contains a symbol reference */ > > bool >diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md >index 9948f77fca5..40ed93dc804 100644 >--- a/gcc/config/i386/i386.md >+++ b/gcc/config/i386/i386.md >@@ -1878,11 +1878,7 @@ > return standard_sse_constant_opcode (insn, operands); > > case TYPE_SSEMOV: >- if (misaligned_operand (operands[0], XImode) >- || misaligned_operand (operands[1], XImode)) >- return "vmovdqu32\t{%1, %0|%0, %1}"; >- else >- return "vmovdqa32\t{%1, %0|%0, %1}"; >+ return ix86_output_ssemov (insn, operands); > > default: > gcc_unreachable (); >@@ -1905,25 +1901,7 @@ > return standard_sse_constant_opcode (insn, operands); > > case TYPE_SSEMOV: >- if (misaligned_operand (operands[0], OImode) >- || misaligned_operand (operands[1], OImode)) >- { >- if (get_attr_mode (insn) == MODE_V8SF) >- return "vmovups\t{%1, %0|%0, %1}"; >- else if (get_attr_mode (insn) == MODE_XI) >- return "vmovdqu32\t{%1, %0|%0, %1}"; >- else >- return "vmovdqu\t{%1, %0|%0, %1}"; >- } >- else >- { >- if (get_attr_mode (insn) == MODE_V8SF) >- return "vmovaps\t{%1, %0|%0, %1}"; >- else if 
(get_attr_mode (insn) == MODE_XI) >- return "vmovdqa32\t{%1, %0|%0, %1}"; >- else >- return "vmovdqa\t{%1, %0|%0, %1}"; >- } >+ return ix86_output_ssemov (insn, operands); > > default: > gcc_unreachable (); >@@ -1933,13 +1911,7 @@ > (set_attr "type" "sselog1,sselog1,ssemov,ssemov") > (set_attr "prefix" "vex") > (set (attr "mode") >- (cond [(ior (match_operand 0 "ext_sse_reg_operand") >- (match_operand 1 "ext_sse_reg_operand")) >- (const_string "XI") >- (and (eq_attr "alternative" "1") >- (match_test "TARGET_AVX512VL")) >- (const_string "XI") >- (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") >+ (cond [(ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") > (and (eq_attr "alternative" "3") > (match_test "TARGET_SSE_TYPELESS_STORES"))) > (const_string "V8SF") >@@ -1965,27 +1937,7 @@ > return standard_sse_constant_opcode (insn, operands); > > case TYPE_SSEMOV: >- /* TDmode values are passed as TImode on the stack. Moving them >- to stack may result in unaligned memory access. 
*/
>-	  if (misaligned_operand (operands[0], TImode)
>-	      || misaligned_operand (operands[1], TImode))
>-	    {
>-	      if (get_attr_mode (insn) == MODE_V4SF)
>-		return "%vmovups\t{%1, %0|%0, %1}";
>-	      else if (get_attr_mode (insn) == MODE_XI)
>-		return "vmovdqu32\t{%1, %0|%0, %1}";
>-	      else
>-		return "%vmovdqu\t{%1, %0|%0, %1}";
>-	    }
>-	  else
>-	    {
>-	      if (get_attr_mode (insn) == MODE_V4SF)
>-		return "%vmovaps\t{%1, %0|%0, %1}";
>-	      else if (get_attr_mode (insn) == MODE_XI)
>-		return "vmovdqa32\t{%1, %0|%0, %1}";
>-	      else
>-		return "%vmovdqa\t{%1, %0|%0, %1}";
>-	    }
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -2012,12 +1964,6 @@
>    (set (attr "mode")
>      (cond [(eq_attr "alternative" "0,1")
> 	      (const_string "DI")
>-	    (ior (match_operand 0 "ext_sse_reg_operand")
>-		 (match_operand 1 "ext_sse_reg_operand"))
>-	      (const_string "XI")
>-	    (and (eq_attr "alternative" "3")
>-		 (match_test "TARGET_AVX512VL"))
>-	      (const_string "XI")
> 	    (ior (not (match_test "TARGET_SSE2"))
> 		 (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
> 		      (and (eq_attr "alternative" "5")
>@@ -2091,31 +2037,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_DI:
>-	  /* Handle broken assemblers that require movd instead of movq.  */
>-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
>-	      && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
>-	    return "%vmovd\t{%1, %0|%0, %1}";
>-	  return "%vmovq\t{%1, %0|%0, %1}";
>-
>-	case MODE_TI:
>-	  /* Handle AVX512 registers set.  */
>-	  if (EXT_REX_SSE_REG_P (operands[0])
>-	      || EXT_REX_SSE_REG_P (operands[1]))
>-	    return "vmovdqa64\t{%1, %0|%0, %1}";
>-	  return "%vmovdqa\t{%1, %0|%0, %1}";
>-
>-	case MODE_V2SF:
>-	  gcc_assert (!TARGET_AVX);
>-	  return "movlps\t{%1, %0|%0, %1}";
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     case TYPE_SSECVT:
>       if (SSE_REG_P (operands[0]))
>@@ -2201,10 +2123,7 @@
>      (cond [(eq_attr "alternative" "2")
> 	      (const_string "SI")
> 	    (eq_attr "alternative" "12,13")
>-	      (cond [(ior (match_operand 0 "ext_sse_reg_operand")
>-			  (match_operand 1 "ext_sse_reg_operand"))
>-		       (const_string "TI")
>-		     (ior (not (match_test "TARGET_SSE2"))
>+	      (cond [(ior (not (match_test "TARGET_SSE2"))
> 			  (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
> 		       (const_string "V4SF")
> 		     (match_test "TARGET_AVX")
>@@ -2327,25 +2246,7 @@
>       gcc_unreachable ();
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_SI:
>-	  return "%vmovd\t{%1, %0|%0, %1}";
>-	case MODE_TI:
>-	  return "%vmovdqa\t{%1, %0|%0, %1}";
>-	case MODE_XI:
>-	  return "vmovdqa32\t{%g1, %g0|%g0, %g1}";
>-
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	case MODE_SF:
>-	  gcc_assert (!TARGET_AVX);
>-	  return "movss\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     case TYPE_MMX:
>       return "pxor\t%0, %0";
>@@ -2411,10 +2312,7 @@
>      (cond [(eq_attr "alternative" "2,3")
> 	      (const_string "DI")
> 	    (eq_attr "alternative" "8,9")
>-	      (cond [(ior (match_operand 0 "ext_sse_reg_operand")
>-			  (match_operand 1 "ext_sse_reg_operand"))
>-		       (const_string "XI")
>-		     (ior (not (match_test "TARGET_SSE2"))
>+	      (cond [(ior (not (match_test "TARGET_SSE2"))
> 			  (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
> 		       (const_string "V4SF")
> 		     (match_test "TARGET_AVX")
>@@ -3234,31 +3132,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      /* Handle misaligned load/store since we
>-	 don't have movmisaligntf pattern.  */
>-      if (misaligned_operand (operands[0], TFmode)
>-	  || misaligned_operand (operands[1], TFmode))
>-	{
>-	  if (get_attr_mode (insn) == MODE_V4SF)
>-	    return "%vmovups\t{%1, %0|%0, %1}";
>-	  else if (TARGET_AVX512VL
>-		   && (EXT_REX_SSE_REG_P (operands[0])
>-		       || EXT_REX_SSE_REG_P (operands[1])))
>-	    return "vmovdqu64\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovdqu\t{%1, %0|%0, %1}";
>-	}
>-      else
>-	{
>-	  if (get_attr_mode (insn) == MODE_V4SF)
>-	    return "%vmovaps\t{%1, %0|%0, %1}";
>-	  else if (TARGET_AVX512VL
>-		   && (EXT_REX_SSE_REG_P (operands[0])
>-		       || EXT_REX_SSE_REG_P (operands[1])))
>-	    return "vmovdqa64\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovdqa\t{%1, %0|%0, %1}";
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     case TYPE_MULTI:
>       return "#";
>@@ -3411,37 +3285,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_DF:
>-	  if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>-	    return "vmovsd\t{%d1, %0|%0, %d1}";
>-	  return "%vmovsd\t{%1, %0|%0, %1}";
>-
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-	case MODE_V8DF:
>-	  return "vmovapd\t{%g1, %g0|%g0, %g1}";
>-	case MODE_V2DF:
>-	  return "%vmovapd\t{%1, %0|%0, %1}";
>-
>-	case MODE_V2SF:
>-	  gcc_assert (!TARGET_AVX);
>-	  return "movlps\t{%1, %0|%0, %1}";
>-	case MODE_V1DF:
>-	  gcc_assert (!TARGET_AVX);
>-	  return "movlpd\t{%1, %0|%0, %1}";
>-
>-	case MODE_DI:
>-	  /* Handle broken assemblers that require movd instead of movq.  */
>-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
>-	      && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
>-	    return "%vmovd\t{%1, %0|%0, %1}";
>-	  return "%vmovq\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -3497,9 +3341,6 @@
> 	    (eq_attr "alternative" "12,16")
> 	      (cond [(not (match_test "TARGET_SSE2"))
> 		       (const_string "V4SF")
>-		     (and (match_test "TARGET_AVX512F")
>-			  (not (match_test "TARGET_PREFER_AVX256")))
>-		       (const_string "XI")
> 		     (match_test "TARGET_AVX")
> 		       (const_string "V2DF")
> 		     (match_test "optimize_function_for_size_p (cfun)")
>@@ -3515,12 +3356,7 @@
> 
> 	    /* movaps is one byte shorter for non-AVX targets.  */
> 	    (eq_attr "alternative" "13,17")
>-	      (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
>-			       (not (match_test "TARGET_AVX512VL")))
>-			  (ior (match_operand 0 "ext_sse_reg_operand")
>-			       (match_operand 1 "ext_sse_reg_operand")))
>-		       (const_string "V8DF")
>-		     (ior (not (match_test "TARGET_SSE2"))
>+	      (cond [(ior (not (match_test "TARGET_SSE2"))
> 			  (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
> 		       (const_string "V4SF")
> 		     (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
>@@ -3612,24 +3448,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_SF:
>-	  if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>-	    return "vmovss\t{%d1, %0|%0, %d1}";
>-	  return "%vmovss\t{%1, %0|%0, %1}";
>-
>-	case MODE_V16SF:
>-	  return "vmovaps\t{%g1, %g0|%g0, %g1}";
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	case MODE_SI:
>-	  return "%vmovd\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     case TYPE_MMXMOV:
>       switch (get_attr_mode (insn))
>@@ -3702,12 +3521,7 @@
> 	       better to maintain the whole registers in single format
> 	       to avoid problems on using packed logical operations.  */
> 	    (eq_attr "alternative" "6")
>-	      (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
>-			       (not (match_test "TARGET_AVX512VL")))
>-			  (ior (match_operand 0 "ext_sse_reg_operand")
>-			       (match_operand 1 "ext_sse_reg_operand")))
>-		       (const_string "V16SF")
>-		     (ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
>+	      (cond [(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> 			  (match_test "TARGET_SSE_SPLIT_REGS"))
> 		       (const_string "V4SF")
> 		     ]
>diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
>index c1e0f2c411e..9c3808338d3 100644
>--- a/gcc/config/i386/mmx.md
>+++ b/gcc/config/i386/mmx.md
>@@ -115,29 +115,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_DI:
>-	  /* Handle broken assemblers that require movd instead of movq.  */
>-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
>-	      && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
>-	    return "%vmovd\t{%1, %0|%0, %1}";
>-	  return "%vmovq\t{%1, %0|%0, %1}";
>-	case MODE_TI:
>-	  return "%vmovdqa\t{%1, %0|%0, %1}";
>-	case MODE_XI:
>-	  return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
>-
>-	case MODE_V2SF:
>-	  if (TARGET_AVX && REG_P (operands[0]))
>-	    return "vmovlps\t{%1, %0, %0|%0, %0, %1}";
>-	  return "%vmovlps\t{%1, %0|%0, %1}";
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -186,10 +164,7 @@
>      (cond [(eq_attr "alternative" "2")
> 	      (const_string "SI")
> 	    (eq_attr "alternative" "11,12")
>-	      (cond [(ior (match_operand 0 "ext_sse_reg_operand")
>-			  (match_operand 1 "ext_sse_reg_operand"))
>-		       (const_string "XI")
>-		     (match_test "<MODE>mode == V2SFmode")
>+	      (cond [(match_test "<MODE>mode == V2SFmode")
> 		       (const_string "V4SF")
> 		     (ior (not (match_test "TARGET_SSE2"))
> 			  (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
>diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
>index 865947debcc..99226e86436 100644
>--- a/gcc/config/i386/predicates.md
>+++ b/gcc/config/i386/predicates.md
>@@ -54,11 +54,6 @@
>   (and (match_code "reg")
>        (match_test "SSE_REGNO_P (REGNO (op))")))
> 
>-;; True if the operand is an AVX-512 new register.
>-(define_predicate "ext_sse_reg_operand"
>-  (and (match_code "reg")
>-       (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
>-
> ;; Return true if op is a QImode register.
> (define_predicate "any_QIreg_operand"
>   (and (match_code "reg")
>diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>index 5dc0930ac1f..2014f0a7832 100644
>--- a/gcc/config/i386/sse.md
>+++ b/gcc/config/i386/sse.md
>@@ -982,98 +982,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
>-	 in avx512f, so we need to use workarounds, to access sse registers
>-	 16-31, which are evex-only. In avx512vl we don't need workarounds.  */
>-      if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
>-	  && (EXT_REX_SSE_REG_P (operands[0])
>-	      || EXT_REX_SSE_REG_P (operands[1])))
>-	{
>-	  if (memory_operand (operands[0], <MODE>mode))
>-	    {
>-	      if (<MODE_SIZE> == 32)
>-		return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
>-	      else if (<MODE_SIZE> == 16)
>-		return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
>-	      else
>-		gcc_unreachable ();
>-	    }
>-	  else if (memory_operand (operands[1], <MODE>mode))
>-	    {
>-	      if (<MODE_SIZE> == 32)
>-		return "vbroadcast<shuffletype>64x4\t{%1, %g0|%g0, %1}";
>-	      else if (<MODE_SIZE> == 16)
>-		return "vbroadcast<shuffletype>32x4\t{%1, %g0|%g0, %1}";
>-	      else
>-		gcc_unreachable ();
>-	    }
>-	  else
>-	    /* Reg -> reg move is always aligned.  Just use wider move.  */
>-	    switch (get_attr_mode (insn))
>-	      {
>-	      case MODE_V8SF:
>-	      case MODE_V4SF:
>-		return "vmovaps\t{%g1, %g0|%g0, %g1}";
>-	      case MODE_V4DF:
>-	      case MODE_V2DF:
>-		return "vmovapd\t{%g1, %g0|%g0, %g1}";
>-	      case MODE_OI:
>-	      case MODE_TI:
>-		return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
>-	      default:
>-		gcc_unreachable ();
>-	      }
>-	}
>-
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_V16SF:
>-	case MODE_V8SF:
>-	case MODE_V4SF:
>-	  if (misaligned_operand (operands[0], <MODE>mode)
>-	      || misaligned_operand (operands[1], <MODE>mode))
>-	    return "%vmovups\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	case MODE_V8DF:
>-	case MODE_V4DF:
>-	case MODE_V2DF:
>-	  if (misaligned_operand (operands[0], <MODE>mode)
>-	      || misaligned_operand (operands[1], <MODE>mode))
>-	    return "%vmovupd\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovapd\t{%1, %0|%0, %1}";
>-
>-	case MODE_OI:
>-	case MODE_TI:
>-	  if (misaligned_operand (operands[0], <MODE>mode)
>-	      || misaligned_operand (operands[1], <MODE>mode))
>-	    return TARGET_AVX512VL
>-		   && (<MODE>mode == V4SImode
>-		       || <MODE>mode == V2DImode
>-		       || <MODE>mode == V8SImode
>-		       || <MODE>mode == V4DImode
>-		       || TARGET_AVX512BW)
>-		   ? "vmovdqu<ssescalarsize>\t{%1, %0|%0, %1}"
>-		   : "%vmovdqu\t{%1, %0|%0, %1}";
>-	  else
>-	    return TARGET_AVX512VL ? "vmovdqa64\t{%1, %0|%0, %1}"
>-				   : "%vmovdqa\t{%1, %0|%0, %1}";
>-	case MODE_XI:
>-	  if (misaligned_operand (operands[0], <MODE>mode)
>-	      || misaligned_operand (operands[1], <MODE>mode))
>-	    return (<MODE>mode == V16SImode
>-		    || <MODE>mode == V8DImode
>-		    || TARGET_AVX512BW)
>-		   ? "vmovdqu<ssescalarsize>\t{%1, %0|%0, %1}"
>-		   : "vmovdqu64\t{%1, %0|%0, %1}";
>-	  else
>-	    return "vmovdqa64\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -1082,10 +991,7 @@
>   [(set_attr "type" "sselog1,sselog1,ssemov,ssemov")
>    (set_attr "prefix" "maybe_vex")
>    (set (attr "mode")
>-	(cond [(and (eq_attr "alternative" "1")
>-		    (match_test "TARGET_AVX512VL"))
>-		 (const_string "<sseinsnmode>")
>-	       (and (match_test "<MODE_SIZE> == 16")
>+	(cond [(and (match_test "<MODE_SIZE> == 16")
> 		    (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
> 			 (and (eq_attr "alternative" "3")
> 			      (match_test "TARGET_SSE_TYPELESS_STORES"))))
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-2a.c b/gcc/testsuite/gcc.target/i386/pr89229-2a.c
>new file mode 100644
>index 00000000000..0cf78039481
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-2a.c
>@@ -0,0 +1,15 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+typedef __int128 __m128t __attribute__ ((__vector_size__ (16),
>+					 __may_alias__));
>+
>+__m128t
>+foo1 (void)
>+{
>+  register __int128 xmm16 __asm ("xmm16") = (__int128) -1;
>+  asm volatile ("" : "+v" (xmm16));
>+  return (__m128t) xmm16;
>+}
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-2b.c b/gcc/testsuite/gcc.target/i386/pr89229-2b.c
>new file mode 100644
>index 00000000000..8d5d6c41d30
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-2b.c
>@@ -0,0 +1,13 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+typedef __int128 __m128t __attribute__ ((__vector_size__ (16),
>+					 __may_alias__));
>+
>+__m128t
>+foo1 (void)
>+{
>+  register __int128 xmm16 __asm ("xmm16") = (__int128) -1; /* { dg-error "register specified for 'xmm16'" } */
>+  asm volatile ("" : "+v" (xmm16));
>+  return (__m128t) xmm16;
>+}
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-2c.c b/gcc/testsuite/gcc.target/i386/pr89229-2c.c
>new file mode 100644
>index 00000000000..218da46dcd0
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-2c.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-2a.c"
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-3a.c b/gcc/testsuite/gcc.target/i386/pr89229-3a.c
>new file mode 100644
>index 00000000000..fd56f447016
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-3a.c
>@@ -0,0 +1,17 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+extern int i;
>+
>+int
>+foo1 (void)
>+{
>+  register int xmm16 __asm ("xmm16") = i;
>+  asm volatile ("" : "+v" (xmm16));
>+  register int xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  return xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-3b.c b/gcc/testsuite/gcc.target/i386/pr89229-3b.c
>new file mode 100644
>index 00000000000..9265fc0354b
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-3b.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+#include "pr89229-3a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-3c.c b/gcc/testsuite/gcc.target/i386/pr89229-3c.c
>new file mode 100644
>index 00000000000..d3fdf1ee273
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-3c.c
>@@ -0,0 +1,7 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-3a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-4a.c b/gcc/testsuite/gcc.target/i386/pr89229-4a.c
>new file mode 100644
>index 00000000000..cb9b071e873
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-4a.c
>@@ -0,0 +1,17 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+extern long long i;
>+
>+long long
>+foo1 (void)
>+{
>+  register long long xmm16 __asm ("xmm16") = i;
>+  asm volatile ("" : "+v" (xmm16));
>+  register long long xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  return xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-times "vmovdqa64\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-4b.c b/gcc/testsuite/gcc.target/i386/pr89229-4b.c
>new file mode 100644
>index 00000000000..023e81253a0
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-4b.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+#include "pr89229-4a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-4c.c b/gcc/testsuite/gcc.target/i386/pr89229-4c.c
>new file mode 100644
>index 00000000000..e02eb37c16d
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-4c.c
>@@ -0,0 +1,7 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-4a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovdqa64\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5a.c b/gcc/testsuite/gcc.target/i386/pr89229-5a.c
>new file mode 100644
>index 00000000000..856115b2f5a
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-5a.c
>@@ -0,0 +1,16 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+extern float d;
>+
>+void
>+foo1 (float x)
>+{
>+  register float xmm16 __asm ("xmm16") = x;
>+  asm volatile ("" : "+v" (xmm16));
>+  register float xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  d = xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5b.c b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
>new file mode 100644
>index 00000000000..cb0f3b55ccc
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+#include "pr89229-5a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovaps\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5c.c b/gcc/testsuite/gcc.target/i386/pr89229-5c.c
>new file mode 100644
>index 00000000000..529a520133c
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-5c.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-5a.c"
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6a.c b/gcc/testsuite/gcc.target/i386/pr89229-6a.c
>new file mode 100644
>index 00000000000..f88d7c8d74c
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-6a.c
>@@ -0,0 +1,16 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+extern double d;
>+
>+void
>+foo1 (double x)
>+{
>+  register double xmm16 __asm ("xmm16") = x;
>+  asm volatile ("" : "+v" (xmm16));
>+  register double xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  d = xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6b.c b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
>new file mode 100644
>index 00000000000..316d85d921e
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+#include "pr89229-6a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovapd\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6c.c b/gcc/testsuite/gcc.target/i386/pr89229-6c.c
>new file mode 100644
>index 00000000000..7a4d254670c
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-6c.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-6a.c"
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7a.c b/gcc/testsuite/gcc.target/i386/pr89229-7a.c
>new file mode 100644
>index 00000000000..fcb85c366b6
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-7a.c
>@@ -0,0 +1,16 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+extern __float128 d;
>+
>+void
>+foo1 (__float128 x)
>+{
>+  register __float128 xmm16 __asm ("xmm16") = x;
>+  asm volatile ("" : "+v" (xmm16));
>+  register __float128 xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  d = xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7b.c b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
>new file mode 100644
>index 00000000000..37eb83c783b
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
>@@ -0,0 +1,12 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+extern __float128 d;
>+
>+void
>+foo1 (__float128 x)
>+{
>+  register __float128 xmm16 __asm ("xmm16") = x; /* { dg-error "register specified for 'xmm16'" } */
>+  asm volatile ("" : "+v" (xmm16));
>+  d = xmm16;
>+}
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7c.c b/gcc/testsuite/gcc.target/i386/pr89229-7c.c
>new file mode 100644
>index 00000000000..e37ff2bf5bd
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-7c.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-7a.c"
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>-- 
>2.20.1
GCC 9.1 has been released.
GCC 9.2 has been released.
*** Bug 89346 has been marked as a duplicate of this bug. ***
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:5358e8f5800daa0012fc9d06705d64bbb21fa07b

commit r10-7054-g5358e8f5800daa0012fc9d06705d64bbb21fa07b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 5 16:45:05 2020 -0800

    i386: Properly encode vector registers in vector move

    On x86, when AVX and AVX512 are enabled, vector move instructions can
    be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):

       0:	c5 f9 6f d1          	vmovdqa %xmm1,%xmm2
       4:	62 f1 fd 08 6f d1    	vmovdqa64 %xmm1,%xmm2

    We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
    only supports 512-bit vector moves.  AVX512F + AVX512VL supports
    128-bit and 256-bit vector moves.  xmm16-xmm31 and ymm16-ymm31 are
    disallowed in 128-bit and 256-bit modes when AVX512VL is disabled.
    Mode attributes on x86 vector move patterns indicate target
    preferences of vector move encoding.  For scalar register to register
    move, we can use 512-bit vector move instructions to move
    32-bit/64-bit scalar if AVX512VL isn't available.  With AVX512F and
    AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
    if upper 16 vector registers aren't used.

    This patch adds a function, ix86_output_ssemov, to generate vector
    moves:

    1. If zmm registers are used, use EVEX encoding.
    2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX
       encoding will be generated.
    3. If xmm16-xmm31/ymm16-ymm31 registers are used:
       a. With AVX512VL, AVX512VL vector moves will be generated.
       b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
          move will be done with zmm register move.

    There is no need to set mode attribute to XImode explicitly since
    ix86_output_ssemov can properly encode xmm16-xmm31/ymm16-ymm31
    registers with and without AVX512VL.

    Tested on AVX2 and AVX512 with and without --with-arch=native.

    gcc/

    	PR target/89229
    	PR target/89346
    	* config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
    	* config/i386/i386.c (ix86_get_ssemov): New function.
    	(ix86_output_ssemov): Likewise.
    	* config/i386/sse.md (VMOVE:mov<mode>_internal): Call
    	ix86_output_ssemov for TYPE_SSEMOV.  Remove TARGET_AVX512VL
    	check.
    	(*movxi_internal_avx512f): Call ix86_output_ssemov for
    	TYPE_SSEMOV.
    	(*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
    	Remove ext_sse_reg_operand and TARGET_AVX512VL check.
    	(*movti_internal): Likewise.
    	(*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.

    gcc/testsuite/

    	PR target/89229
    	PR target/89346
    	* gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated.
    	* gcc.target/i386/pr89229-2a.c: New test.
    	* gcc.target/i386/pr89229-2b.c: Likewise.
    	* gcc.target/i386/pr89229-2c.c: Likewise.
    	* gcc.target/i386/pr89229-3a.c: Likewise.
    	* gcc.target/i386/pr89229-3b.c: Likewise.
    	* gcc.target/i386/pr89229-3c.c: Likewise.
    	* gcc.target/i386/pr89346.c: Likewise.
commit r10-7078-g6733ecaf3fe77871d86bfb36bcda5497ae2aaba7
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Mar 8 05:01:03 2020 -0700

    gcc.target/i386/pr89229-3c.c: Include "pr89229-3a.c"

    	PR target/89229
    	PR target/89346
    	* gcc.target/i386/pr89229-3c.c: Include "pr89229-3a.c",
    	instead of "pr89229-5a.c".
commit r10-7143-g54f46d82f54ba7a4110cef102b7c18eaf8b4b6bd
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 12 03:47:45 2020 -0700

    i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV

    There is no need to set mode attribute to XImode since
    ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
    without AVX512VL.

    	PR target/89229
    	* config/i386/i386.c (ix86_output_ssemov): Handle MODE_DI,
    	MODE_V1DF and MODE_V2SF.
    	* config/i386/mmx.md (MMXMODE:*mov<mode>_internal): Call
    	ix86_output_ssemov for TYPE_SSEMOV.  Remove
    	ext_sse_reg_operand check.
commit r10-7154-gfd8679974b2ded884ffd7d912efef7fe13e4ff4f
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 13 02:48:59 2020 -0700

    i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV

    There is no need to set mode attribute to XImode nor V8DFmode since
    ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
    without AVX512VL.

    gcc/

    	PR target/89229
    	* config/i386/i386.c (ix86_output_ssemov): Handle MODE_DF.
    	* config/i386/i386.md (*movdf_internal): Call ix86_output_ssemov
    	for TYPE_SSEMOV.  Remove TARGET_AVX512F, TARGET_PREFER_AVX256,
    	TARGET_AVX512VL and ext_sse_reg_operand check.

    gcc/testsuite/

    	PR target/89229
    	* gcc.target/i386/pr89229-4a.c: New test.
    	* gcc.target/i386/pr89229-4b.c: Likewise.
    	* gcc.target/i386/pr89229-4c.c: Likewise.
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:824722e45f80b22e2f035a61300f494b2a10d6f4

commit r10-7177-g824722e45f80b22e2f035a61300f494b2a10d6f4
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Mar 14 16:06:55 2020 -0700

    i386: Use ix86_output_ssemov for DImode TYPE_SSEMOV

    There is no need to set mode attribute to XImode since
    ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
    without AVX512VL.

    gcc/

    	PR target/89229
    	* config/i386/i386.md (*movdi_internal): Call ix86_output_ssemov
    	for TYPE_SSEMOV.  Remove ext_sse_reg_operand and TARGET_AVX512VL
    	check.

    gcc/testsuite/

    	PR target/89229
    	* gcc.target/i386/pr89229-5a.c: New test.
    	* gcc.target/i386/pr89229-5b.c: Likewise.
    	* gcc.target/i386/pr89229-5c.c: Likewise.
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:9d74caf21be7025db8fef997e87ebf3b85acaf4a

commit r10-7182-g9d74caf21be7025db8fef997e87ebf3b85acaf4a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Mar 15 10:21:08 2020 -0700

    i386: Use ix86_output_ssemov for SFmode TYPE_SSEMOV

    There is no need to set mode attribute to V16SFmode since
    ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
    without AVX512VL.

    gcc/

    	PR target/89229
    	* config/i386/i386.c (ix86_output_ssemov): Handle MODE_SI and
    	MODE_SF.
    	* config/i386/i386.md (*movsf_internal): Call ix86_output_ssemov
    	for TYPE_SSEMOV.  Remove TARGET_PREFER_AVX256, TARGET_AVX512VL
    	and ext_sse_reg_operand check.

    gcc/testsuite/

    	PR target/89229
    	* gcc.target/i386/pr89229-6a.c: New test.
    	* gcc.target/i386/pr89229-6b.c: Likewise.
    	* gcc.target/i386/pr89229-6c.c: Likewise.
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:5a3c42b227bbe9e7acb5335088d2255262311bd8

commit r10-7189-g5a3c42b227bbe9e7acb5335088d2255262311bd8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 16 03:48:55 2020 -0700

    i386: Use ix86_output_ssemov for SImode TYPE_SSEMOV

    There is no need to set mode attribute to XImode since
    ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
    without AVX512VL.  Remove ext_sse_reg_operand since it is no longer
    needed.

    gcc/

    	PR target/89229
    	* config/i386/i386.md (*movsi_internal): Call ix86_output_ssemov
    	for TYPE_SSEMOV.  Remove ext_sse_reg_operand and TARGET_AVX512VL
    	check.
    	* config/i386/predicates.md (ext_sse_reg_operand): Removed.

    gcc/testsuite/

    	PR target/89229
    	* gcc.target/i386/pr89229-7a.c: New test.
    	* gcc.target/i386/pr89229-7b.c: Likewise.
    	* gcc.target/i386/pr89229-7c.c: Likewise.
Fixed for GCC 10.