Bug 89229 - Incorrect xmm16-xmm31/ymm16-ymm31 in vector move
Summary: Incorrect xmm16-xmm31/ymm16-ymm31 in vector move
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 9.0
Importance: P2 normal
Target Milestone: 10.0
Assignee: Not yet assigned to anyone
URL: https://gcc.gnu.org/ml/gcc-patches/20...
Keywords: patch, wrong-code
Duplicates: 86896 89346
Depends on:
Blocks:
 
Reported: 2019-02-07 04:32 UTC by H.J. Lu
Modified: 2020-03-16 10:54 UTC
CC: 5 users

See Also:
Host:
Target: i386,x86-64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-02-07 00:00:00


Attachments
I am testing this (1.94 KB, patch)
2019-02-12 23:11 UTC, H.J. Lu
Details | Diff
An updated patch (6.46 KB, patch)
2019-02-13 17:46 UTC, H.J. Lu
Details | Diff
A new patch (6.53 KB, patch)
2019-02-13 21:43 UTC, H.J. Lu
Details | Diff

Description H.J. Lu 2019-02-07 04:32:22 UTC
movoi_internal_avx and movti_internal have

   (set (attr "mode")
        (cond [(ior (match_operand 0 "ext_sse_reg_operand")
                    (match_operand 1 "ext_sse_reg_operand"))
                 (const_string "XI")
               (and (eq_attr "alternative" "1")
                    (match_test "TARGET_AVX512VL"))
                 (const_string "XI")
               (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
                    (and (eq_attr "alternative" "3")
                         (match_test "TARGET_SSE_TYPELESS_STORES")))
                 (const_string "V8SF")
              ]
              (const_string "OI")))])

But

              (and (eq_attr "alternative" "1")
                    (match_test "TARGET_AVX512VL"))
                 (const_string "XI")

is unnecessary.  As a result, we generate

	vpternlogd	$0xFF, %zmm0, %zmm0, %zmm0

which is only needed for %xmm16-%xmm31/%ymm16-%ymm31.  Otherwise,

        vpcmpeqd	%ymm0, %ymm0, %ymm0

or

        vpcmpeqd	%xmm0, %xmm0, %xmm0

is sufficient.
Comment 1 H.J. Lu 2019-02-07 04:38:23 UTC
sse.md has

(define_insn "mov<mode>_internal"
  [(set (match_operand:VMOVE 0 "nonimmediate_operand"
         "=v,v ,v ,m")
        (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
         " C,BC,vm,v"))]
  "TARGET_SSE
   && (register_operand (operands[0], <MODE>mode)
       || register_operand (operands[1], <MODE>mode))"
...
   (set (attr "mode")
        (cond [(and (eq_attr "alternative" "1")
                    (match_test "TARGET_AVX512VL"))
                 (const_string "<sseinsnmode>")
               (and (match_test "<MODE_SIZE> == 16")
                    (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
                         (and (eq_attr "alternative" "3") 
                              (match_test "TARGET_SSE_TYPELESS_STORES"))))
                 (const_string "<ssePSmode>")
               (match_test "TARGET_AVX")
                 (const_string "<sseinsnmode>")
               (ior (not (match_test "TARGET_SSE2"))
                    (match_test "optimize_function_for_size_p (cfun)"))
                 (const_string "V4SF")
               (and (eq_attr "alternative" "0")
                    (match_test "TARGET_SSE_LOAD0_BY_PXOR"))
                 (const_string "TI")
              ]
              (const_string "<sseinsnmode>")))

            (and (eq_attr "alternative" "1")
                    (match_test "TARGET_AVX512VL"))
                 (const_string "<sseinsnmode>")

is OK.
Comment 2 H.J. Lu 2019-02-07 15:47:20 UTC
Another problem:

       (cond [(ior (match_operand 0 "ext_sse_reg_operand")
                    (match_operand 1 "ext_sse_reg_operand"))
                 (const_string "XI") 

We shouldn't use XI for TARGET_AVX512VL.  OI/TI is OK for upper
16 vector registers with TARGET_AVX512VL.
Comment 3 hjl@gcc.gnu.org 2019-02-07 17:58:52 UTC
Author: hjl
Date: Thu Feb  7 17:58:19 2019
New Revision: 268657

URL: https://gcc.gnu.org/viewcvs?rev=268657&root=gcc&view=rev
Log:
i386: Fix typo in *movoi_internal_avx/movti_internal

	PR target/89229
	* config/i386/i386.md (*movoi_internal_avx): Set mode to OI
	for TARGET_AVX512VL.
	(*movti_internal): Set mode to TI for TARGET_AVX512VL.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
Comment 4 Richard Biener 2019-02-08 08:34:56 UTC
Is this fixed on trunk now?
Comment 5 hjl@gcc.gnu.org 2019-02-08 11:31:44 UTC
Author: hjl
Date: Fri Feb  8 11:30:53 2019
New Revision: 268678

URL: https://gcc.gnu.org/viewcvs?rev=268678&root=gcc&view=rev
Log:
i386: Use OI/TImode in *mov[ot]i_internal_avx with AVX512VL

OImode and TImode moves must be done in XImode to access upper 16
vector registers without AVX512VL.  With AVX512VL, we can access
upper 16 vector registers in OImode and TImode.

	PR target/89229
	* config/i386/i386.md (*movoi_internal_avx): Set mode to XI for
	upper 16 vector registers without TARGET_AVX512VL.
	(*movti_internal): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
Comment 6 H.J. Lu 2019-02-08 11:43:34 UTC
(In reply to Richard Biener from comment #4)
> Is this fixed on trunk now?

Yes.
Comment 7 H.J. Lu 2019-02-11 12:51:14 UTC
[hjl@gnu-cfl-1 gcc]$ cat /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.target/i386/pr89029-1.c
/* { dg-do assemble { target { avx512bw && avx512vl } } } */
/* { dg-options "-O1 -mavx512bw -mavx512vl -mtune=skylake-avx512" } */

extern void abort (void);
extern void exit (int);
struct s { unsigned char a[256]; };
union u { struct { struct s b; int c; } d; struct { int c; struct s b; } e; };
static union u v;
static union u v0;
static struct s *p = &v.d.b;
static struct s *q = &v.e.b;

static inline struct s rp (void) { return *p; }
static inline struct s rq (void) { return *q; }
static void pq (void) { *p = rq(); }
static void qp (void) { *q = rp(); }

static void
init (struct s *sp)
{
  int i;
  for (i = 0; i < 256; i++)
    sp->a[i] = i;
}

static void
check (struct s *sp)
{
  int i;
  for (i = 0; i < 256; i++)
    if (sp->a[i] != i)
      abort ();
}

void
main_test (void)
{
  v = v0;
  init (p);
  qp ();
  check (q);
  v = v0;
  init (q);
  pq ();
  check (p);
  exit (0);
}
[hjl@gnu-cfl-1 gcc]$ ./xgcc -B./ -c -O1 -mavx512bw -mavx512vl /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.target/i386/pr89029-1.c -march=skylake-avx512
/tmp/ccqZUBNW.s: Assembler messages:
/tmp/ccqZUBNW.s:34: Error: unsupported instruction `vmovdqa'
/tmp/ccqZUBNW.s:35: Error: unsupported instruction `vmovdqa'
/tmp/ccqZUBNW.s:36: Error: unsupported instruction `vmovdqa'
[hjl@gnu-cfl-1 gcc]$
Comment 8 hjl@gcc.gnu.org 2019-02-12 19:01:07 UTC
Author: hjl
Date: Tue Feb 12 19:00:35 2019
New Revision: 268811

URL: https://gcc.gnu.org/viewcvs?rev=268811&root=gcc&view=rev
Log:
i386: Revert revision 268678 and revision 268657

i386 backend has

INT_MODE (OI, 32);
INT_MODE (XI, 64);

So XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bits; in the
const_1 case, all 512 bits are set.

We can load zeros with a narrower instruction (e.g. 256-bit, by the
inherent zeroing of the high part in the case of a 128-bit xor), so
TImode in this case.

Some targets prefer V4SF mode, so they will emit a float xorps for
zeroing.

Then the introduction of AVX512F fubared everything by overloading the
meaning of insn mode.

How should we use the insn mode, MODE_XI, in standard_sse_constant_opcode
and the patterns which use standard_sse_constant_opcode?  There are two
options:

1. MODE_XI should only be used to check whether EXT_REX_SSE_REG_P is true
for any register operand.  The operand size must be determined by the
operand itself, not by MODE_XI.  The operand encoding size should be
determined by the operand size, EXT_REX_SSE_REG_P and AVX512VL.
2. MODE_XI should be used to determine the operand encoding size.
EXT_REX_SSE_REG_P and AVX512VL should be checked when encoding
instructions.

gcc/

	PR target/89229
	* config/i386/i386.md (*movoi_internal_avx): Revert revision
	268678 and revision 268657.
	(*movti_internal): Likewise.

gcc/testsuite/

	PR target/89229
	* gcc.target/i386/pr89229-1.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr89229-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
Comment 9 H.J. Lu 2019-02-12 21:20:22 UTC
[hjl@gnu-4 i386]$ cat pr89229-2.c
/* { dg-do compile } */
/* { dg-options "-O2 -march=skylake-avx512" } */

typedef __int128 __m128t __attribute__ ((__vector_size__ (16), __may_alias__));

__m128t
foo (void)
{
  register __int128 xmm16 __asm ("xmm16") = (__int128) -1;
  asm volatile ("" : "+v" (xmm16));
  return (__m128t) xmm16;
}

/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 i386]$ gcc -O2 -march=skylake-avx512 -S pr89229-2.c -o /tmp/x.s
[hjl@gnu-4 i386]$ cat /tmp/x.s
	.file	"pr89229-2.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	vpternlogd	$0xFF, %zmm16, %zmm16, %zmm16  <<<<<<< Should be xmm16
	vmovdqa64	%xmm16, %xmm0
	ret
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 i386]$
Comment 10 Jakub Jelinek 2019-02-12 21:27:14 UTC
Though, is this really a regression? I mean, have we ever emitted better code?
Comment 11 H.J. Lu 2019-02-12 21:42:38 UTC
(In reply to Jakub Jelinek from comment #10)
> Though, is this really a regression? I mean, have we ever emitted better
> code?

It isn't a regression.
Comment 12 H.J. Lu 2019-02-12 22:45:19 UTC
[hjl@gnu-4 tmp]$ cat x.c
/* { dg-do compile } */
/* { dg-options "-O2 -march=skylake-avx512" } */

extern int i;

int
foo1 (void)
{
  register int xmm16 __asm ("xmm16") = i;
  asm volatile ("" : "+v" (xmm16));
  register int xmm17 __asm ("xmm17") = xmm16;
  asm volatile ("" : "+v" (xmm17));
  return xmm17;
}

int
foo2 (void)
{
  register int xmm1 __asm ("xmm1") = i;
  asm volatile ("" : "+v" (xmm1));
  register int xmm17 __asm ("xmm17") = xmm1;
  asm volatile ("" : "+v" (xmm17));
  return xmm1;
}

/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 tmp]$  gcc -S -O2 -march=skylake-avx512 x.c
[hjl@gnu-4 tmp]$ cat x.s
	.file	"x.c"
	.text
	.p2align 4,,15
	.globl	foo1
	.type	foo1, @function
foo1:
.LFB0:
	.cfi_startproc
	vmovd	i(%rip), %xmm16
	vmovdqa32	%zmm16, %zmm17
	vmovd	%xmm17, %eax
	ret
	.cfi_endproc
.LFE0:
	.size	foo1, .-foo1
	.p2align 4,,15
	.globl	foo2
	.type	foo2, @function
foo2:
.LFB1:
	.cfi_startproc
	vmovd	i(%rip), %xmm1
	vmovdqa32	%zmm1, %zmm17
	vmovd	%xmm1, %eax
	ret
	.cfi_endproc
.LFE1:
	.size	foo2, .-foo2
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 tmp]$
Comment 13 H.J. Lu 2019-02-12 23:11:52 UTC
Created attachment 45685 [details]
I am testing this
Comment 14 Jakub Jelinek 2019-02-12 23:18:34 UTC
Comment on attachment 45685 [details]
I am testing this

The movsi change doesn't look entirely right to me.  While OImode or TImode is not allowed in ext sse regs unless AVX512VL, that is not the case for SImode, so for SImode if one or both operands are ext sse regs and !TARGET_AVX512VL, we need to use MODE_XI and use the pattern with %g1, %g0 in there.
Comment 15 H.J. Lu 2019-02-13 00:19:04 UTC
[hjl@gnu-4 gcc]$ cat /tmp/x.c 
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */

extern double d;

void
foo1 (double x)
{
  register double xmm16 __asm ("xmm16") = x;
  asm volatile ("" : "+v" (xmm16));
  d = xmm16;
}

/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 gcc]$  gcc -S -O2 -march=skylake-avx512 /tmp/x.c -mprefer-vector-width=512
[hjl@gnu-4 gcc]$ cat x.s
	.file	"x.c"
	.text
	.p2align 4,,15
	.globl	foo1
	.type	foo1, @function
foo1:
.LFB0:
	.cfi_startproc
	vmovapd	%zmm0, %zmm16
	vmovsd	%xmm16, d(%rip)
	ret
	.cfi_endproc
.LFE0:
	.size	foo1, .-foo1
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 gcc]$
Comment 16 H.J. Lu 2019-02-13 00:45:42 UTC
[hjl@gnu-4 gcc]$ cat /tmp/y.c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */

extern float d;

void
foo1 (float x)
{
  register float xmm16 __asm ("xmm16") = x;
  asm volatile ("" : "+v" (xmm16));
  d = xmm16;
}

/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 gcc]$ gcc -S -O2 -march=skylake-avx512 /tmp/y.c -mprefer-vector-width=512
[hjl@gnu-4 gcc]$ cat y.s
	.file	"y.c"
	.text
	.p2align 4,,15
	.globl	foo1
	.type	foo1, @function
foo1:
.LFB0:
	.cfi_startproc
	vmovaps	%zmm0, %zmm16
	vmovss	%xmm16, d(%rip)
	ret
	.cfi_endproc
.LFE0:
	.size	foo1, .-foo1
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 gcc]$
Comment 17 H.J. Lu 2019-02-13 00:49:54 UTC
[hjl@gnu-4 gcc]$ cat /tmp/z.c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -march=skylake-avx512" } */

extern long long i;

long long
foo1 (void)
{
  register long long xmm16 __asm ("xmm16") = i;
  asm volatile ("" : "+v" (xmm16));
  register long long xmm17 __asm ("xmm17") = xmm16;
  asm volatile ("" : "+v" (xmm17));
  return xmm17;
}

/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
[hjl@gnu-4 gcc]$ gcc -S -O2 -march=skylake-avx512 /tmp/z.c   -mno-avx512vl
[hjl@gnu-4 gcc]$ cat z.s
	.file	"z.c"
	.text
	.p2align 4,,15
	.globl	foo1
	.type	foo1, @function
foo1:
.LFB0:
	.cfi_startproc
	vmovq	i(%rip), %xmm16
	vmovdqa64	%xmm16, %xmm17  <<< This is an AVX512VL instruction.
	vmovq	%xmm17, %rax
	ret
	.cfi_endproc
.LFE0:
	.size	foo1, .-foo1
	.ident	"GCC: (GNU) 8.2.1 20190209 (Red Hat 8.2.1-8)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-4 gcc]$
Comment 18 H.J. Lu 2019-02-13 00:50:58 UTC
(In reply to Jakub Jelinek from comment #14)
> Comment on attachment 45685 [details]
> I am testing this
> 
> The movsi change doesn't look entirely right to me.  While OImode or TImode
> is not allowed in ext sse regs unless AVX512VL, that is not the case for
> SImode, so for SImode if one or both operands are ext sse regs and
> !TARGET_AVX512VL, we need to use MODE_XI and use the pattern with %g1, %g0
> in there.

No need to set MODE_XI:

        if (EXT_REX_SSE_REG_P (operands[0])
              || EXT_REX_SSE_REG_P (operands[1]))
            {
              if (TARGET_AVX512VL)
                return "vmovdqa32\t{%1, %0|%0, %1}"; 
              else
                return "vmovdqa32\t{%g1, %0|%0, %g1}";
            }
          else
            return "%vmovdqa\t{%1, %0|%0, %1}";
Comment 19 H.J. Lu 2019-02-13 16:26:40 UTC
sse.md has

(define_insn "mov<mode>_internal"
  [(set (match_operand:VMOVE 0 "nonimmediate_operand"
         "=v,v ,v ,m")
        (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
         " C,BC,vm,v"))]
....
      /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
         in avx512f, so we need to use workarounds, to access sse registers
         16-31, which are evex-only. In avx512vl we don't need workarounds.  */
      if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
          && (EXT_REX_SSE_REG_P (operands[0])
              || EXT_REX_SSE_REG_P (operands[1])))
        {
          if (memory_operand (operands[0], <MODE>mode))
            {
              if (<MODE_SIZE> == 32)
                return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
              else if (<MODE_SIZE> == 16)
                return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
              else
                gcc_unreachable ();
            }
...

However, ix86_hard_regno_mode_ok has

     /* TODO check for QI/HI scalars.  */
      /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
      if (TARGET_AVX512VL
          && (mode == OImode
              || mode == TImode
              || VALID_AVX256_REG_MODE (mode)
              || VALID_AVX512VL_128_REG_MODE (mode)))
        return true;

      /* xmm16-xmm31 are only available for AVX-512.  */
      if (EXT_REX_SSE_REGNO_P (regno))
        return false;

      if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
          && (EXT_REX_SSE_REG_P (operands[0])
              || EXT_REX_SSE_REG_P (operands[1])))

is dead code:

[hjl@gnu-4 gcc]$ cat /tmp/z.c 
#include <immintrin.h>

extern __m128 i;

__m128
foo1 (void)
{
  register __m128 xmm16 __asm ("xmm16") = i;
  asm volatile ("" : "+v" (xmm16));
  register __m128 xmm17 __asm ("xmm17") = xmm16;
  asm volatile ("" : "+v" (xmm17));
  return xmm17;
}
[hjl@gnu-4 gcc]$ /usr/gcc-5.4.1-x32/bin/gcc  -S -O2 -march=knl /tmp/z.c 
/tmp/z.c: In function ‘foo1’:
/tmp/z.c:8:19: error: register specified for ‘xmm16’ isn’t suitable for data type
   register __m128 xmm16 __asm ("xmm16") = i;
                   ^
/tmp/z.c:10:19: error: register specified for ‘xmm17’ isn’t suitable for data type
   register __m128 xmm17 __asm ("xmm17") = xmm16;
                   ^
[hjl@gnu-4 gcc]$
Comment 20 H.J. Lu 2019-02-13 17:46:48 UTC
Created attachment 45705 [details]
An updated patch
Comment 21 H.J. Lu 2019-02-13 21:43:30 UTC
Created attachment 45707 [details]
A new patch
Comment 22 H.J. Lu 2019-02-22 16:19:08 UTC
*** Bug 86896 has been marked as a duplicate of this bug. ***
Comment 23 H.J. Lu 2019-02-22 16:28:23 UTC
A patch is posted at

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01841.html
Comment 24 H.J. Lu 2019-02-22 16:29:01 UTC
Comment on attachment 45707 [details]
A new patch

>From fd7220a7551ee774614ca89574241813aae153b7 Mon Sep 17 00:00:00 2001
>From: "H.J. Lu" <hjl.tools@gmail.com>
>Date: Tue, 12 Feb 2019 13:25:41 -0800
>Subject: [PATCH] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move
>
>i386 backend has
>
>INT_MODE (OI, 32);
>INT_MODE (XI, 64);
>
>So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
>in case of const_1, all 512 bits set.
>
>We can load zeros with narrower instruction, (e.g. 256 bit by inherent
>zeroing of highpart in case of 128 bit xor), so TImode in this case.
>
>Some targets prefer V4SF mode, so they will emit float xorps for zeroing.
>
>sse.md has
>
>(define_insn "mov<mode>_internal"
>  [(set (match_operand:VMOVE 0 "nonimmediate_operand"
>         "=v,v ,v ,m")
>        (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
>         " C,BC,vm,v"))]
>....
>      /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
>         in avx512f, so we need to use workarounds, to access sse registers
>         16-31, which are evex-only. In avx512vl we don't need workarounds.  */
>      if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
>          && (EXT_REX_SSE_REG_P (operands[0])
>              || EXT_REX_SSE_REG_P (operands[1])))
>        {
>          if (memory_operand (operands[0], <MODE>mode))
>            {
>              if (<MODE_SIZE> == 32)
>                return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
>              else if (<MODE_SIZE> == 16)
>                return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
>              else
>                gcc_unreachable ();
>            }
>...
>
>However, since ix86_hard_regno_mode_ok has
>
>     /* TODO check for QI/HI scalars.  */
>      /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
>      if (TARGET_AVX512VL
>          && (mode == OImode
>              || mode == TImode
>              || VALID_AVX256_REG_MODE (mode)
>              || VALID_AVX512VL_128_REG_MODE (mode)))
>        return true;
>
>      /* xmm16-xmm31 are only available for AVX-512.  */
>      if (EXT_REX_SSE_REGNO_P (regno))
>        return false;
>
>      if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
>          && (EXT_REX_SSE_REG_P (operands[0])
>              || EXT_REX_SSE_REG_P (operands[1])))
>
>is a dead code.
>
>All TYPE_SSEMOV vector moves are consolidated to ix86_output_ssemov:
>
>1. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE/AVX vector
>moves will be generated.
>2. If xmm16-xmm31/ymm16-ymm31 registers are used:
>   a. With AVX512VL, AVX512VL vector moves will be generated.
>   b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
>      move will be done with zmm register move.
>
>ext_sse_reg_operand is removed since it is no longer needed.
>
>gcc/
>
>	PR target/89229
>	* config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
>	* config/i386/i386.c (ix86_get_ssemov): New function.
>	(ix86_output_ssemov): Likewise.
>	* config/i386/i386.md (*movxi_internal_avx512f): Call
>	ix86_output_ssemov for TYPE_SSEMOV.
>	(*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
>	Remove ext_sse_reg_operand and TARGET_AVX512VL check.
>	(*movti_internal): Likewise.
>	(*movdi_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
>	Remove ext_sse_reg_operand check.
>	(*movsi_internal): Likewise.
>	(*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
>	(*movdf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
>	Remove TARGET_AVX512F, TARGET_PREFER_AVX256, TARGET_AVX512VL
>	and ext_sse_reg_operand check.
>	(*movsf_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
>	Remove TARGET_PREFER_AVX256, TARGET_AVX512VL and
>	ext_sse_reg_operand check.
>	* config/i386/mmx.md (MMXMODE:*mov<mode>_internal): Call
>	ix86_output_ssemov for TYPE_SSEMOV.  Remove ext_sse_reg_operand
>	check.
>	* config/i386/sse.md (VMOVE:mov<mode>_internal): Call
>	ix86_output_ssemov for TYPE_SSEMOV.  Remove TARGET_AVX512VL
>	check.
>	* config/i386/predicates.md (ext_sse_reg_operand): Removed.
>
>gcc/testsuite/
>
>	PR target/89229
>	* gcc.target/i386/pr89229-2a.c: New test.
>	* gcc.target/i386/pr89229-2b.c: Likewise.
>	* gcc.target/i386/pr89229-2c.c: Likewise.
>	* gcc.target/i386/pr89229-3a.c: Likewise.
>	* gcc.target/i386/pr89229-3b.c: Likewise.
>	* gcc.target/i386/pr89229-3c.c: Likewise.
>	* gcc.target/i386/pr89229-4a.c: Likewise.
>	* gcc.target/i386/pr89229-4b.c: Likewise.
>	* gcc.target/i386/pr89229-4c.c: Likewise.
>	* gcc.target/i386/pr89229-5a.c: Likewise.
>	* gcc.target/i386/pr89229-5b.c: Likewise.
>	* gcc.target/i386/pr89229-5c.c: Likewise.
>	* gcc.target/i386/pr89229-6a.c: Likewise.
>	* gcc.target/i386/pr89229-6b.c: Likewise.
>	* gcc.target/i386/pr89229-6c.c: Likewise.
>	* gcc.target/i386/pr89229-7a.c: Likewise.
>	* gcc.target/i386/pr89229-7b.c: Likewise.
>	* gcc.target/i386/pr89229-7c.c: Likewise.
>---
> gcc/config/i386/i386-protos.h              |   2 +
> gcc/config/i386/i386.c                     | 250 +++++++++++++++++++++
> gcc/config/i386/i386.md                    | 212 ++---------------
> gcc/config/i386/mmx.md                     |  29 +--
> gcc/config/i386/predicates.md              |   5 -
> gcc/config/i386/sse.md                     |  98 +-------
> gcc/testsuite/gcc.target/i386/pr89229-2a.c |  15 ++
> gcc/testsuite/gcc.target/i386/pr89229-2b.c |  13 ++
> gcc/testsuite/gcc.target/i386/pr89229-2c.c |   6 +
> gcc/testsuite/gcc.target/i386/pr89229-3a.c |  17 ++
> gcc/testsuite/gcc.target/i386/pr89229-3b.c |   6 +
> gcc/testsuite/gcc.target/i386/pr89229-3c.c |   7 +
> gcc/testsuite/gcc.target/i386/pr89229-4a.c |  17 ++
> gcc/testsuite/gcc.target/i386/pr89229-4b.c |   6 +
> gcc/testsuite/gcc.target/i386/pr89229-4c.c |   7 +
> gcc/testsuite/gcc.target/i386/pr89229-5a.c |  16 ++
> gcc/testsuite/gcc.target/i386/pr89229-5b.c |   6 +
> gcc/testsuite/gcc.target/i386/pr89229-5c.c |   6 +
> gcc/testsuite/gcc.target/i386/pr89229-6a.c |  16 ++
> gcc/testsuite/gcc.target/i386/pr89229-6b.c |   6 +
> gcc/testsuite/gcc.target/i386/pr89229-6c.c |   6 +
> gcc/testsuite/gcc.target/i386/pr89229-7a.c |  16 ++
> gcc/testsuite/gcc.target/i386/pr89229-7b.c |  12 +
> gcc/testsuite/gcc.target/i386/pr89229-7c.c |   6 +
> 24 files changed, 453 insertions(+), 327 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2a.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2b.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2c.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3a.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3b.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3c.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4a.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4b.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4c.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5a.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5b.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5c.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6a.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6b.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6c.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7a.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7b.c
> create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7c.c
>
>diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
>index 2d600173917..27f5cc13abf 100644
>--- a/gcc/config/i386/i386-protos.h
>+++ b/gcc/config/i386/i386-protos.h
>@@ -38,6 +38,8 @@ extern void ix86_expand_split_stack_prologue (void);
> extern void ix86_output_addr_vec_elt (FILE *, int);
> extern void ix86_output_addr_diff_elt (FILE *, int, int);
> 
>+extern const char *ix86_output_ssemov (rtx_insn *, rtx *);
>+
> extern enum calling_abi ix86_cfun_abi (void);
> extern enum calling_abi ix86_function_type_abi (const_tree);
> 
>diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>index fd05873ba39..97d1ea4229e 100644
>--- a/gcc/config/i386/i386.c
>+++ b/gcc/config/i386/i386.c
>@@ -10281,6 +10281,256 @@ ix86_standard_x87sse_constant_load_p (const rtx_insn *insn, rtx dst)
>   return true;
> }
> 
>+/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
>+   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
>+   TARGET_AVX512VL or it is a register to register move which can
>+   be done with zmm register move. */
>+
>+static const char *
>+ix86_get_ssemov (rtx *operands, unsigned size, machine_mode mode)
>+{
>+  static char buf[128];
>+  bool misaligned_p = (misaligned_operand (operands[0], mode)
>+		       || misaligned_operand (operands[1], mode));
>+  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
>+		     || EXT_REX_SSE_REG_P (operands[1]));
>+  machine_mode scalar_mode = GET_MODE_INNER (mode);
>+
>+  const char *opcode = NULL;
>+  enum
>+    {
>+      opcode_int,
>+      opcode_float,
>+      opcode_double
>+    } type = opcode_int;
>+  if (SCALAR_FLOAT_MODE_P (scalar_mode))
>+    {
>+      switch (scalar_mode)
>+	{
>+	case E_SFmode:
>+	  if (size == 64 || !evex_reg_p || TARGET_AVX512VL)
>+	    opcode = misaligned_p ? "%vmovups" : "%vmovaps";
>+	  else
>+	    type = opcode_float;
>+	  break;
>+	case E_DFmode:
>+	  if (size == 64 || !evex_reg_p || TARGET_AVX512VL)
>+	    opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
>+	  else
>+	    type = opcode_double;
>+	  break;
>+	case E_TFmode:
>+	  if (size == 64)
>+	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
>+	  else if (evex_reg_p)
>+	    {
>+	      if (TARGET_AVX512VL)
>+		opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
>+	    }
>+	  else
>+	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
>+	  break;
>+	default:
>+	  gcc_unreachable ();
>+	}
>+    }
>+  else if (SCALAR_INT_MODE_P (scalar_mode))
>+    {
>+      switch (scalar_mode)
>+	{
>+	case E_QImode:
>+	  if (size == 64)
>+	    opcode = (misaligned_p
>+		      ? (TARGET_AVX512BW
>+			 ? "vmovdqu8"
>+			 : "vmovdqu64")
>+		      : "vmovdqa64");
>+	  else if (evex_reg_p)
>+	    {
>+	      if (TARGET_AVX512VL)
>+		opcode = (misaligned_p
>+			  ? (TARGET_AVX512BW
>+			     ? "vmovdqu8"
>+			     : "vmovdqu64")
>+			  : "vmovdqa64");
>+	    }
>+	  else
>+	    opcode = (misaligned_p
>+		      ? (TARGET_AVX512BW
>+			 ? "vmovdqu8"
>+			 : "%vmovdqu")
>+		      : "%vmovdqa");
>+	  break;
>+	case E_HImode:
>+	  if (size == 64)
>+	    opcode = (misaligned_p
>+		      ? (TARGET_AVX512BW
>+			 ? "vmovdqu16"
>+			 : "vmovdqu64")
>+		      : "vmovdqa64");
>+	  else if (evex_reg_p)
>+	    {
>+	      if (TARGET_AVX512VL)
>+		opcode = (misaligned_p
>+			  ? (TARGET_AVX512BW
>+			     ? "vmovdqu16"
>+			     : "vmovdqu64")
>+			  : "vmovdqa64");
>+	    }
>+	  else
>+	    opcode = (misaligned_p
>+		      ? (TARGET_AVX512BW
>+			 ? "vmovdqu16"
>+			 : "%vmovdqu")
>+		      : "%vmovdqa");
>+	  break;
>+	case E_SImode:
>+	  if (size == 64)
>+	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
>+	  else if (evex_reg_p)
>+	    {
>+	      if (TARGET_AVX512VL)
>+		opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
>+	    }
>+	  else
>+	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
>+	  break;
>+	case E_DImode:
>+	case E_TImode:
>+	case E_OImode:
>+	  if (size == 64)
>+	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
>+	  else if (evex_reg_p)
>+	    {
>+	      if (TARGET_AVX512VL)
>+		opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
>+	    }
>+	  else
>+	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
>+	  break;
>+	case E_XImode:
>+	  opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
>+	  break;
>+	default:
>+	  gcc_unreachable ();
>+	}
>+    }
>+  else
>+    gcc_unreachable ();
>+
>+  if (!opcode)
>+    {
>+      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
>+         registers without AVX512VL by using zmm register move.  */
>+      if (!evex_reg_p
>+	  || TARGET_AVX512VL
>+	  || memory_operand (operands[0], mode)
>+	  || memory_operand (operands[1], mode))
>+	gcc_unreachable ();
>+      size = 64;
>+      switch (type)
>+	{
>+	case opcode_int:
>+	  opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
>+	  break;
>+	case opcode_float:
>+	  opcode = misaligned_p ? "%vmovups" : "%vmovaps";
>+	  break;
>+	case opcode_double:
>+	  opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
>+	  break;
>+	}
>+    }
>+
>+  switch (size)
>+    {
>+    case 64:
>+      snprintf (buf, sizeof (buf), "%s\t{%%g1, %%g0|%%g0, %%g1}",
>+		opcode);
>+      break;
>+    case 32:
>+      snprintf (buf, sizeof (buf), "%s\t{%%t1, %%t0|%%t0, %%t1}",
>+		opcode);
>+      break;
>+    case 16:
>+      snprintf (buf, sizeof (buf), "%s\t{%%x1, %%x0|%%x0, %%x1}",
>+		opcode);
>+      break;
>+    default:
>+      gcc_unreachable ();
>+    }
>+  return buf;
>+}
>+
>+/* Return the template of the TYPE_SSEMOV instruction to move
>+   operands[1] into operands[0].  */
>+
>+const char *
>+ix86_output_ssemov (rtx_insn *insn, rtx *operands)
>+{
>+  machine_mode mode = GET_MODE (operands[0]);
>+  if (get_attr_type (insn) != TYPE_SSEMOV
>+      || mode != GET_MODE (operands[1]))
>+    gcc_unreachable ();
>+
>+  enum attr_mode insn_mode = get_attr_mode (insn);
>+
>+  switch (insn_mode)
>+    {
>+    case MODE_XI:
>+    case MODE_V8DF:
>+    case MODE_V16SF:
>+      return ix86_get_ssemov (operands, 64, mode);
>+
>+    case MODE_OI:
>+    case MODE_V4DF:
>+    case MODE_V8SF:
>+      return ix86_get_ssemov (operands, 32, mode);
>+
>+    case MODE_TI:
>+    case MODE_V2DF:
>+    case MODE_V4SF:
>+      return ix86_get_ssemov (operands, 16, mode);
>+
>+    case MODE_DI:
>+      /* Handle broken assemblers that require movd instead of movq. */
>+      if (!HAVE_AS_IX86_INTERUNIT_MOVQ
>+	  && (GENERAL_REG_P (operands[0])
>+	      || GENERAL_REG_P (operands[1])))
>+	return "%vmovd\t{%1, %0|%0, %1}";
>+      else
>+	return "%vmovq\t{%1, %0|%0, %1}";
>+
>+    case MODE_V2SF:
>+      if (TARGET_AVX && REG_P (operands[0]))
>+	return "vmovlps\t{%1, %d0|%d0, %1}";
>+      else
>+	return "%vmovlps\t{%1, %0|%0, %1}";
>+
>+    case MODE_DF:
>+      if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>+	return "vmovsd\t{%d1, %0|%0, %d1}";
>+      else
>+	return "%vmovsd\t{%1, %0|%0, %1}";
>+
>+    case MODE_V1DF:
>+      gcc_assert (!TARGET_AVX);
>+       return "movlpd\t{%1, %0|%0, %1}";
>+
>+    case MODE_SI:
>+      return "%vmovd\t{%1, %0|%0, %1}";
>+
>+    case MODE_SF:
>+      if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>+	return "vmovss\t{%d1, %0|%0, %d1}";
>+      else
>+	return "%vmovss\t{%1, %0|%0, %1}";
>+
>+    default:
>+      gcc_unreachable ();
>+    }
>+}
>+
> /* Returns true if OP contains a symbol reference */
> 
> bool
>diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
>index 9948f77fca5..40ed93dc804 100644
>--- a/gcc/config/i386/i386.md
>+++ b/gcc/config/i386/i386.md
>@@ -1878,11 +1878,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      if (misaligned_operand (operands[0], XImode)
>-	  || misaligned_operand (operands[1], XImode))
>-	return "vmovdqu32\t{%1, %0|%0, %1}";
>-      else
>-	return "vmovdqa32\t{%1, %0|%0, %1}";
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -1905,25 +1901,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      if (misaligned_operand (operands[0], OImode)
>-	  || misaligned_operand (operands[1], OImode))
>-	{
>-	  if (get_attr_mode (insn) == MODE_V8SF)
>-	    return "vmovups\t{%1, %0|%0, %1}";
>-	  else if (get_attr_mode (insn) == MODE_XI)
>-	    return "vmovdqu32\t{%1, %0|%0, %1}";
>-	  else
>-	    return "vmovdqu\t{%1, %0|%0, %1}";
>-	}
>-      else
>-	{
>-	  if (get_attr_mode (insn) == MODE_V8SF)
>-	    return "vmovaps\t{%1, %0|%0, %1}";
>-	  else if (get_attr_mode (insn) == MODE_XI)
>-	    return "vmovdqa32\t{%1, %0|%0, %1}";
>-	  else
>-	    return "vmovdqa\t{%1, %0|%0, %1}";
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -1933,13 +1911,7 @@
>    (set_attr "type" "sselog1,sselog1,ssemov,ssemov")
>    (set_attr "prefix" "vex")
>    (set (attr "mode")
>-	(cond [(ior (match_operand 0 "ext_sse_reg_operand")
>-		    (match_operand 1 "ext_sse_reg_operand"))
>-		 (const_string "XI")
>-	       (and (eq_attr "alternative" "1")
>-		    (match_test "TARGET_AVX512VL"))
>-		 (const_string "XI")
>-	       (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
>+	(cond [(ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
> 		    (and (eq_attr "alternative" "3")
> 			 (match_test "TARGET_SSE_TYPELESS_STORES")))
> 		 (const_string "V8SF")
>@@ -1965,27 +1937,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      /* TDmode values are passed as TImode on the stack.  Moving them
>-	 to stack may result in unaligned memory access.  */
>-      if (misaligned_operand (operands[0], TImode)
>-	  || misaligned_operand (operands[1], TImode))
>-	{
>-	  if (get_attr_mode (insn) == MODE_V4SF)
>-	    return "%vmovups\t{%1, %0|%0, %1}";
>-	  else if (get_attr_mode (insn) == MODE_XI)
>-	    return "vmovdqu32\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovdqu\t{%1, %0|%0, %1}";
>-	}
>-      else
>-	{
>-	  if (get_attr_mode (insn) == MODE_V4SF)
>-	    return "%vmovaps\t{%1, %0|%0, %1}";
>-	  else if (get_attr_mode (insn) == MODE_XI)
>-	    return "vmovdqa32\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovdqa\t{%1, %0|%0, %1}";
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -2012,12 +1964,6 @@
>    (set (attr "mode")
> 	(cond [(eq_attr "alternative" "0,1")
> 		 (const_string "DI")
>-	       (ior (match_operand 0 "ext_sse_reg_operand")
>-		    (match_operand 1 "ext_sse_reg_operand"))
>-		 (const_string "XI")
>-	       (and (eq_attr "alternative" "3")
>-		    (match_test "TARGET_AVX512VL"))
>-		 (const_string "XI")
> 	       (ior (not (match_test "TARGET_SSE2"))
> 		    (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
> 			 (and (eq_attr "alternative" "5")
>@@ -2091,31 +2037,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_DI:
>-	  /* Handle broken assemblers that require movd instead of movq.  */
>-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
>-	      && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
>-	    return "%vmovd\t{%1, %0|%0, %1}";
>-	  return "%vmovq\t{%1, %0|%0, %1}";
>-
>-	case MODE_TI:
>-	  /* Handle AVX512 registers set.  */
>-	  if (EXT_REX_SSE_REG_P (operands[0])
>-	      || EXT_REX_SSE_REG_P (operands[1]))
>-	    return "vmovdqa64\t{%1, %0|%0, %1}";
>-	  return "%vmovdqa\t{%1, %0|%0, %1}";
>-
>-	case MODE_V2SF:
>-	  gcc_assert (!TARGET_AVX);
>-	  return "movlps\t{%1, %0|%0, %1}";
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     case TYPE_SSECVT:
>       if (SSE_REG_P (operands[0]))
>@@ -2201,10 +2123,7 @@
>      (cond [(eq_attr "alternative" "2")
> 	      (const_string "SI")
> 	    (eq_attr "alternative" "12,13")
>-	      (cond [(ior (match_operand 0 "ext_sse_reg_operand")
>-			  (match_operand 1 "ext_sse_reg_operand"))
>-		       (const_string "TI")
>-		     (ior (not (match_test "TARGET_SSE2"))
>+	      (cond [(ior (not (match_test "TARGET_SSE2"))
> 			  (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
> 		       (const_string "V4SF")
> 		     (match_test "TARGET_AVX")
>@@ -2327,25 +2246,7 @@
>       gcc_unreachable ();
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_SI:
>-          return "%vmovd\t{%1, %0|%0, %1}";
>-	case MODE_TI:
>-	  return "%vmovdqa\t{%1, %0|%0, %1}";
>-	case MODE_XI:
>-	  return "vmovdqa32\t{%g1, %g0|%g0, %g1}";
>-
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	case MODE_SF:
>-	  gcc_assert (!TARGET_AVX);
>-          return "movss\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     case TYPE_MMX:
>       return "pxor\t%0, %0";
>@@ -2411,10 +2312,7 @@
>      (cond [(eq_attr "alternative" "2,3")
> 	      (const_string "DI")
> 	    (eq_attr "alternative" "8,9")
>-	      (cond [(ior (match_operand 0 "ext_sse_reg_operand")
>-			  (match_operand 1 "ext_sse_reg_operand"))
>-		       (const_string "XI")
>-		     (ior (not (match_test "TARGET_SSE2"))
>+	      (cond [(ior (not (match_test "TARGET_SSE2"))
> 			  (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
> 		       (const_string "V4SF")
> 		     (match_test "TARGET_AVX")
>@@ -3234,31 +3132,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      /* Handle misaligned load/store since we
>-         don't have movmisaligntf pattern. */
>-      if (misaligned_operand (operands[0], TFmode)
>-	  || misaligned_operand (operands[1], TFmode))
>-	{
>-	  if (get_attr_mode (insn) == MODE_V4SF)
>-	    return "%vmovups\t{%1, %0|%0, %1}";
>-	  else if (TARGET_AVX512VL
>-		   && (EXT_REX_SSE_REG_P (operands[0])
>-		       || EXT_REX_SSE_REG_P (operands[1])))
>-	    return "vmovdqu64\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovdqu\t{%1, %0|%0, %1}";
>-	}
>-      else
>-	{
>-	  if (get_attr_mode (insn) == MODE_V4SF)
>-	    return "%vmovaps\t{%1, %0|%0, %1}";
>-	  else if (TARGET_AVX512VL
>-		   && (EXT_REX_SSE_REG_P (operands[0])
>-		       || EXT_REX_SSE_REG_P (operands[1])))
>-	    return "vmovdqa64\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovdqa\t{%1, %0|%0, %1}";
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     case TYPE_MULTI:
> 	return "#";
>@@ -3411,37 +3285,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_DF:
>-	  if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>-	    return "vmovsd\t{%d1, %0|%0, %d1}";
>-	  return "%vmovsd\t{%1, %0|%0, %1}";
>-
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-	case MODE_V8DF:
>-	  return "vmovapd\t{%g1, %g0|%g0, %g1}";
>-	case MODE_V2DF:
>-	  return "%vmovapd\t{%1, %0|%0, %1}";
>-
>-	case MODE_V2SF:
>-	  gcc_assert (!TARGET_AVX);
>-	  return "movlps\t{%1, %0|%0, %1}";
>-	case MODE_V1DF:
>-	  gcc_assert (!TARGET_AVX);
>-	  return "movlpd\t{%1, %0|%0, %1}";
>-
>-	case MODE_DI:
>-	  /* Handle broken assemblers that require movd instead of movq.  */
>-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
>-	      && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
>-	    return "%vmovd\t{%1, %0|%0, %1}";
>-	  return "%vmovq\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -3497,9 +3341,6 @@
> 	       (eq_attr "alternative" "12,16")
> 		 (cond [(not (match_test "TARGET_SSE2"))
> 		 	  (const_string "V4SF")
>-			(and (match_test "TARGET_AVX512F")
>-			  (not (match_test "TARGET_PREFER_AVX256")))
>-			  (const_string "XI")
> 			(match_test "TARGET_AVX")
> 			  (const_string "V2DF")
> 			(match_test "optimize_function_for_size_p (cfun)")
>@@ -3515,12 +3356,7 @@
> 
> 	       /* movaps is one byte shorter for non-AVX targets.  */
> 	       (eq_attr "alternative" "13,17")
>-		 (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
>-				  (not (match_test "TARGET_AVX512VL")))
>-			     (ior (match_operand 0 "ext_sse_reg_operand")
>-				  (match_operand 1 "ext_sse_reg_operand")))
>-			  (const_string "V8DF")
>-			(ior (not (match_test "TARGET_SSE2"))
>+		 (cond [(ior (not (match_test "TARGET_SSE2"))
> 			     (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
> 			  (const_string "V4SF")
> 			(match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
>@@ -3612,24 +3448,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_SF:
>-	  if (TARGET_AVX && REG_P (operands[0]) && REG_P (operands[1]))
>-	    return "vmovss\t{%d1, %0|%0, %d1}";
>-	  return "%vmovss\t{%1, %0|%0, %1}";
>-
>-	case MODE_V16SF:
>-	  return "vmovaps\t{%g1, %g0|%g0, %g1}";
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	case MODE_SI:
>-	  return "%vmovd\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     case TYPE_MMXMOV:
>       switch (get_attr_mode (insn))
>@@ -3702,12 +3521,7 @@
> 		  better to maintain the whole registers in single format
> 		  to avoid problems on using packed logical operations.  */
> 	       (eq_attr "alternative" "6")
>-		 (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
>-				  (not (match_test "TARGET_AVX512VL")))
>-			     (ior (match_operand 0 "ext_sse_reg_operand")
>-				  (match_operand 1 "ext_sse_reg_operand")))
>-			  (const_string "V16SF")
>-			(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
>+		 (cond [(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> 			     (match_test "TARGET_SSE_SPLIT_REGS"))
> 			  (const_string "V4SF")
> 		       ]
>diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
>index c1e0f2c411e..9c3808338d3 100644
>--- a/gcc/config/i386/mmx.md
>+++ b/gcc/config/i386/mmx.md
>@@ -115,29 +115,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_DI:
>-	  /* Handle broken assemblers that require movd instead of movq.  */
>-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
>-	      && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
>-	    return "%vmovd\t{%1, %0|%0, %1}";
>-	  return "%vmovq\t{%1, %0|%0, %1}";
>-	case MODE_TI:
>-	  return "%vmovdqa\t{%1, %0|%0, %1}";
>-	case MODE_XI:
>-	  return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
>-
>-	case MODE_V2SF:
>-	  if (TARGET_AVX && REG_P (operands[0]))
>-	    return "vmovlps\t{%1, %0, %0|%0, %0, %1}";
>-	  return "%vmovlps\t{%1, %0|%0, %1}";
>-	case MODE_V4SF:
>-	  return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -186,10 +164,7 @@
>      (cond [(eq_attr "alternative" "2")
> 	      (const_string "SI")
> 	    (eq_attr "alternative" "11,12")
>-	      (cond [(ior (match_operand 0 "ext_sse_reg_operand")
>-			  (match_operand 1 "ext_sse_reg_operand"))
>-			(const_string "XI")
>-		     (match_test "<MODE>mode == V2SFmode")
>+	      (cond [(match_test "<MODE>mode == V2SFmode")
> 		       (const_string "V4SF")
> 		     (ior (not (match_test "TARGET_SSE2"))
> 			  (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
>diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
>index 865947debcc..99226e86436 100644
>--- a/gcc/config/i386/predicates.md
>+++ b/gcc/config/i386/predicates.md
>@@ -54,11 +54,6 @@
>   (and (match_code "reg")
>        (match_test "SSE_REGNO_P (REGNO (op))")))
> 
>-;; True if the operand is an AVX-512 new register.
>-(define_predicate "ext_sse_reg_operand"
>-  (and (match_code "reg")
>-       (match_test "EXT_REX_SSE_REGNO_P (REGNO (op))")))
>-
> ;; Return true if op is a QImode register.
> (define_predicate "any_QIreg_operand"
>   (and (match_code "reg")
>diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>index 5dc0930ac1f..2014f0a7832 100644
>--- a/gcc/config/i386/sse.md
>+++ b/gcc/config/i386/sse.md
>@@ -982,98 +982,7 @@
>       return standard_sse_constant_opcode (insn, operands);
> 
>     case TYPE_SSEMOV:
>-      /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
>-	 in avx512f, so we need to use workarounds, to access sse registers
>-	 16-31, which are evex-only. In avx512vl we don't need workarounds.  */
>-      if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
>-	  && (EXT_REX_SSE_REG_P (operands[0])
>-	      || EXT_REX_SSE_REG_P (operands[1])))
>-	{
>-	  if (memory_operand (operands[0], <MODE>mode))
>-	    {
>-	      if (<MODE_SIZE> == 32)
>-		return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
>-	      else if (<MODE_SIZE> == 16)
>-		return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
>-	      else
>-		gcc_unreachable ();
>-	    }
>-	  else if (memory_operand (operands[1], <MODE>mode))
>-	    {
>-	      if (<MODE_SIZE> == 32)
>-		return "vbroadcast<shuffletype>64x4\t{%1, %g0|%g0, %1}";
>-	      else if (<MODE_SIZE> == 16)
>-		return "vbroadcast<shuffletype>32x4\t{%1, %g0|%g0, %1}";
>-	      else
>-		gcc_unreachable ();
>-	    }
>-	  else
>-	    /* Reg -> reg move is always aligned.  Just use wider move.  */
>-	    switch (get_attr_mode (insn))
>-	      {
>-	      case MODE_V8SF:
>-	      case MODE_V4SF:
>-		return "vmovaps\t{%g1, %g0|%g0, %g1}";
>-	      case MODE_V4DF:
>-	      case MODE_V2DF:
>-		return "vmovapd\t{%g1, %g0|%g0, %g1}";
>-	      case MODE_OI:
>-	      case MODE_TI:
>-		return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
>-	      default:
>-		gcc_unreachable ();
>-	      }
>-	}
>-
>-      switch (get_attr_mode (insn))
>-	{
>-	case MODE_V16SF:
>-	case MODE_V8SF:
>-	case MODE_V4SF:
>-	  if (misaligned_operand (operands[0], <MODE>mode)
>-	      || misaligned_operand (operands[1], <MODE>mode))
>-	    return "%vmovups\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovaps\t{%1, %0|%0, %1}";
>-
>-	case MODE_V8DF:
>-	case MODE_V4DF:
>-	case MODE_V2DF:
>-	  if (misaligned_operand (operands[0], <MODE>mode)
>-	      || misaligned_operand (operands[1], <MODE>mode))
>-	    return "%vmovupd\t{%1, %0|%0, %1}";
>-	  else
>-	    return "%vmovapd\t{%1, %0|%0, %1}";
>-
>-	case MODE_OI:
>-	case MODE_TI:
>-	  if (misaligned_operand (operands[0], <MODE>mode)
>-	      || misaligned_operand (operands[1], <MODE>mode))
>-	    return TARGET_AVX512VL
>-		   && (<MODE>mode == V4SImode
>-		       || <MODE>mode == V2DImode
>-		       || <MODE>mode == V8SImode
>-		       || <MODE>mode == V4DImode
>-		       || TARGET_AVX512BW)
>-		   ? "vmovdqu<ssescalarsize>\t{%1, %0|%0, %1}"
>-		   : "%vmovdqu\t{%1, %0|%0, %1}";
>-	  else
>-	    return TARGET_AVX512VL ? "vmovdqa64\t{%1, %0|%0, %1}"
>-				   : "%vmovdqa\t{%1, %0|%0, %1}";
>-	case MODE_XI:
>-	  if (misaligned_operand (operands[0], <MODE>mode)
>-	      || misaligned_operand (operands[1], <MODE>mode))
>-	    return (<MODE>mode == V16SImode
>-		    || <MODE>mode == V8DImode
>-		    || TARGET_AVX512BW)
>-		   ? "vmovdqu<ssescalarsize>\t{%1, %0|%0, %1}"
>-		   : "vmovdqu64\t{%1, %0|%0, %1}";
>-	  else
>-	    return "vmovdqa64\t{%1, %0|%0, %1}";
>-
>-	default:
>-	  gcc_unreachable ();
>-	}
>+      return ix86_output_ssemov (insn, operands);
> 
>     default:
>       gcc_unreachable ();
>@@ -1082,10 +991,7 @@
>   [(set_attr "type" "sselog1,sselog1,ssemov,ssemov")
>    (set_attr "prefix" "maybe_vex")
>    (set (attr "mode")
>-	(cond [(and (eq_attr "alternative" "1")
>-		    (match_test "TARGET_AVX512VL"))
>-		 (const_string "<sseinsnmode>")
>-	       (and (match_test "<MODE_SIZE> == 16")
>+	(cond [(and (match_test "<MODE_SIZE> == 16")
> 		    (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
> 			 (and (eq_attr "alternative" "3")
> 			      (match_test "TARGET_SSE_TYPELESS_STORES"))))
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-2a.c b/gcc/testsuite/gcc.target/i386/pr89229-2a.c
>new file mode 100644
>index 00000000000..0cf78039481
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-2a.c
>@@ -0,0 +1,15 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+typedef __int128 __m128t __attribute__ ((__vector_size__ (16),
>+					 __may_alias__));
>+
>+__m128t
>+foo1 (void)
>+{
>+  register __int128 xmm16 __asm ("xmm16") = (__int128) -1;
>+  asm volatile ("" : "+v" (xmm16));
>+  return (__m128t) xmm16;
>+}
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-2b.c b/gcc/testsuite/gcc.target/i386/pr89229-2b.c
>new file mode 100644
>index 00000000000..8d5d6c41d30
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-2b.c
>@@ -0,0 +1,13 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+typedef __int128 __m128t __attribute__ ((__vector_size__ (16),
>+					 __may_alias__));
>+
>+__m128t
>+foo1 (void)
>+{
>+  register __int128 xmm16 __asm ("xmm16") = (__int128) -1; /* { dg-error "register specified for 'xmm16'" } */
>+  asm volatile ("" : "+v" (xmm16));
>+  return (__m128t) xmm16;
>+}
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-2c.c b/gcc/testsuite/gcc.target/i386/pr89229-2c.c
>new file mode 100644
>index 00000000000..218da46dcd0
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-2c.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-2a.c"
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-3a.c b/gcc/testsuite/gcc.target/i386/pr89229-3a.c
>new file mode 100644
>index 00000000000..fd56f447016
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-3a.c
>@@ -0,0 +1,17 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+extern int i;
>+
>+int
>+foo1 (void)
>+{
>+  register int xmm16 __asm ("xmm16") = i;
>+  asm volatile ("" : "+v" (xmm16));
>+  register int xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  return xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-3b.c b/gcc/testsuite/gcc.target/i386/pr89229-3b.c
>new file mode 100644
>index 00000000000..9265fc0354b
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-3b.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+#include "pr89229-3a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-3c.c b/gcc/testsuite/gcc.target/i386/pr89229-3c.c
>new file mode 100644
>index 00000000000..d3fdf1ee273
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-3c.c
>@@ -0,0 +1,7 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-3a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-4a.c b/gcc/testsuite/gcc.target/i386/pr89229-4a.c
>new file mode 100644
>index 00000000000..cb9b071e873
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-4a.c
>@@ -0,0 +1,17 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+extern long long i;
>+
>+long long
>+foo1 (void)
>+{
>+  register long long xmm16 __asm ("xmm16") = i;
>+  asm volatile ("" : "+v" (xmm16));
>+  register long long xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  return xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-times "vmovdqa64\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-4b.c b/gcc/testsuite/gcc.target/i386/pr89229-4b.c
>new file mode 100644
>index 00000000000..023e81253a0
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-4b.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+#include "pr89229-4a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovdqa32\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-4c.c b/gcc/testsuite/gcc.target/i386/pr89229-4c.c
>new file mode 100644
>index 00000000000..e02eb37c16d
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-4c.c
>@@ -0,0 +1,7 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-4a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovdqa64\[^\n\r]*xmm1\[67]\[^\n\r]*xmm1\[67]" 1 } } */
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5a.c b/gcc/testsuite/gcc.target/i386/pr89229-5a.c
>new file mode 100644
>index 00000000000..856115b2f5a
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-5a.c
>@@ -0,0 +1,16 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+extern float d;
>+
>+void
>+foo1 (float x)
>+{
>+  register float xmm16 __asm ("xmm16") = x;
>+  asm volatile ("" : "+v" (xmm16));
>+  register float xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  d = xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5b.c b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
>new file mode 100644
>index 00000000000..cb0f3b55ccc
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-5b.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+#include "pr89229-5a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovaps\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-5c.c b/gcc/testsuite/gcc.target/i386/pr89229-5c.c
>new file mode 100644
>index 00000000000..529a520133c
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-5c.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-5a.c"
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6a.c b/gcc/testsuite/gcc.target/i386/pr89229-6a.c
>new file mode 100644
>index 00000000000..f88d7c8d74c
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-6a.c
>@@ -0,0 +1,16 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+extern double d;
>+
>+void
>+foo1 (double x)
>+{
>+  register double xmm16 __asm ("xmm16") = x;
>+  asm volatile ("" : "+v" (xmm16));
>+  register double xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  d = xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6b.c b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
>new file mode 100644
>index 00000000000..316d85d921e
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-6b.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+#include "pr89229-6a.c"
>+
>+/* { dg-final { scan-assembler-times "vmovapd\[^\n\r]*zmm1\[67]\[^\n\r]*zmm1\[67]" 1 } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-6c.c b/gcc/testsuite/gcc.target/i386/pr89229-6c.c
>new file mode 100644
>index 00000000000..7a4d254670c
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-6c.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-6a.c"
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7a.c b/gcc/testsuite/gcc.target/i386/pr89229-7a.c
>new file mode 100644
>index 00000000000..fcb85c366b6
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-7a.c
>@@ -0,0 +1,16 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512" } */
>+
>+extern __float128 d;
>+
>+void
>+foo1 (__float128 x)
>+{
>+  register __float128 xmm16 __asm ("xmm16") = x;
>+  asm volatile ("" : "+v" (xmm16));
>+  register __float128 xmm17 __asm ("xmm17") = xmm16;
>+  asm volatile ("" : "+v" (xmm17));
>+  d = xmm17;
>+}
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7b.c b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
>new file mode 100644
>index 00000000000..37eb83c783b
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-7b.c
>@@ -0,0 +1,12 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mno-avx512vl" } */
>+
>+extern __float128 d;
>+
>+void
>+foo1 (__float128 x)
>+{
>+  register __float128 xmm16 __asm ("xmm16") = x; /* { dg-error "register specified for 'xmm16'" } */
>+  asm volatile ("" : "+v" (xmm16));
>+  d = xmm16;
>+}
>diff --git a/gcc/testsuite/gcc.target/i386/pr89229-7c.c b/gcc/testsuite/gcc.target/i386/pr89229-7c.c
>new file mode 100644
>index 00000000000..e37ff2bf5bd
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/i386/pr89229-7c.c
>@@ -0,0 +1,6 @@
>+/* { dg-do compile { target { ! ia32 } } } */
>+/* { dg-options "-O2 -march=skylake-avx512 -mprefer-vector-width=512" } */
>+
>+#include "pr89229-7a.c"
>+
>+/* { dg-final { scan-assembler-not "%zmm\[0-9\]+" } } */
>-- 
>2.20.1
>
Comment 25 Jakub Jelinek 2019-05-03 09:16:31 UTC
GCC 9.1 has been released.
Comment 26 Jakub Jelinek 2019-08-12 08:55:31 UTC
GCC 9.2 has been released.
Comment 27 H.J. Lu 2020-01-27 19:01:58 UTC
*** Bug 89346 has been marked as a duplicate of this bug. ***
Comment 28 GCC Commits 2020-03-06 00:53:05 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:5358e8f5800daa0012fc9d06705d64bbb21fa07b

commit r10-7054-g5358e8f5800daa0012fc9d06705d64bbb21fa07b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 5 16:45:05 2020 -0800

    i386: Properly encode vector registers in vector move
    
    On x86, when AVX and AVX512 are enabled, vector move instructions can
    be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
    
       0:	c5 f9 6f d1          	vmovdqa %xmm1,%xmm2
       4:	62 f1 fd 08 6f d1    	vmovdqa64 %xmm1,%xmm2
    
    We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
    only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
    and 256-bit vector moves.  xmm16-xmm31 and ymm16-ymm31 are disallowed in
    128-bit and 256-bit modes when AVX512VL is disabled.  Mode attributes on
    x86 vector move patterns indicate target preferences of vector move
    encoding.  For scalar register to register move, we can use 512-bit
    vector move instructions to move 32-bit/64-bit scalar if AVX512VL isn't
    available.  With AVX512F and AVX512VL, we should use VEX encoding for
    128-bit/256-bit vector moves if upper 16 vector registers aren't used.
    This patch adds a function, ix86_output_ssemov, to generate vector moves:
    
    1. If zmm registers are used, use EVEX encoding.
    2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
    will be generated.
    3. If xmm16-xmm31/ymm16-ymm31 registers are used:
       a. With AVX512VL, AVX512VL vector moves will be generated.
       b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
          move will be done with zmm register move.
    
    There is no need to set mode attribute to XImode explicitly since
    ix86_output_ssemov can properly encode xmm16-xmm31/ymm16-ymm31 registers
    with and without AVX512VL.
    
    Tested on AVX2 and AVX512 with and without --with-arch=native.
    
    gcc/
    
    	PR target/89229
    	PR target/89346
    	* config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
    	* config/i386/i386.c (ix86_get_ssemov): New function.
    	(ix86_output_ssemov): Likewise.
    	* config/i386/sse.md (VMOVE:mov<mode>_internal): Call
    	ix86_output_ssemov for TYPE_SSEMOV.  Remove TARGET_AVX512VL
    	check.
    	(*movxi_internal_avx512f): Call ix86_output_ssemov for TYPE_SSEMOV.
    	(*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
    	Remove ext_sse_reg_operand and TARGET_AVX512VL check.
    	(*movti_internal): Likewise.
    	(*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
    
    gcc/testsuite/
    
    	PR target/89229
    	PR target/89346
    	* gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated.
    	* gcc.target/i386/pr89229-2a.c: New test.
    	* gcc.target/i386/pr89229-2b.c: Likewise.
    	* gcc.target/i386/pr89229-2c.c: Likewise.
    	* gcc.target/i386/pr89229-3a.c: Likewise.
    	* gcc.target/i386/pr89229-3b.c: Likewise.
    	* gcc.target/i386/pr89229-3c.c: Likewise.
    	* gcc.target/i386/pr89346.c: Likewise.
Comment 29 Martin Liška 2020-03-11 12:07:20 UTC
commit r10-7078-g6733ecaf3fe77871d86bfb36bcda5497ae2aaba7
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Mar 8 05:01:03 2020 -0700

    gcc.target/i386/pr89229-3c.c: Include "pr89229-3a.c"
    
            PR target/89229
            PR target/89346
            * gcc.target/i386/pr89229-3c.c: Include "pr89229-3a.c", instead
            of "pr89229-5a.c".
Comment 30 Martin Liška 2020-03-12 12:38:16 UTC
commit r10-7143-g54f46d82f54ba7a4110cef102b7c18eaf8b4b6bd
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Mar 12 03:47:45 2020 -0700

    i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV
    
    There is no need to set mode attribute to XImode since ix86_output_ssemov
    can properly encode xmm16-xmm31 registers with and without AVX512VL.
    
            PR target/89229
            * config/i386/i386.c (ix86_output_ssemov): Handle MODE_DI,
            MODE_V1DF and MODE_V2SF.
            * config/i386/mmx.md (MMXMODE:*mov<mode>_internal): Call
            ix86_output_ssemov for TYPE_SSEMOV.  Remove ext_sse_reg_operand
            check.
Comment 31 Martin Liška 2020-03-13 10:28:48 UTC
commit r10-7154-gfd8679974b2ded884ffd7d912efef7fe13e4ff4f
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 13 02:48:59 2020 -0700

    i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV
    
    There is no need to set mode attribute to XImode nor V8DFmode since
    ix86_output_ssemov can properly encode xmm16-xmm31 registers with and
    without AVX512VL.
    
    gcc/
    
            PR target/89229
            * config/i386/i386.c (ix86_output_ssemov): Handle MODE_DF.
            * config/i386/i386.md (*movdf_internal): Call ix86_output_ssemov
            for TYPE_SSEMOV.  Remove TARGET_AVX512F, TARGET_PREFER_AVX256,
            TARGET_AVX512VL and ext_sse_reg_operand check.
    
    gcc/testsuite/
    
            PR target/89229
            * gcc.target/i386/pr89229-4a.c: New test.
            * gcc.target/i386/pr89229-4b.c: Likewise.
            * gcc.target/i386/pr89229-4c.c: Likewise.
Comment 32 GCC Commits 2020-03-14 23:12:48 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:824722e45f80b22e2f035a61300f494b2a10d6f4

commit r10-7177-g824722e45f80b22e2f035a61300f494b2a10d6f4
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Mar 14 16:06:55 2020 -0700

    i386: Use ix86_output_ssemov for DImode TYPE_SSEMOV
    
    There is no need to set mode attribute to XImode since ix86_output_ssemov
    can properly encode xmm16-xmm31 registers with and without AVX512VL.
    
    gcc/
    
            PR target/89229
            * config/i386/i386.md (*movdi_internal): Call ix86_output_ssemov
            for TYPE_SSEMOV.  Remove ext_sse_reg_operand and TARGET_AVX512VL
            check.
    
    gcc/testsuite/
    
            PR target/89229
            * gcc.target/i386/pr89229-5a.c: New test.
            * gcc.target/i386/pr89229-5b.c: Likewise.
            * gcc.target/i386/pr89229-5c.c: Likewise.
Comment 33 GCC Commits 2020-03-15 17:23:03 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:9d74caf21be7025db8fef997e87ebf3b85acaf4a

commit r10-7182-g9d74caf21be7025db8fef997e87ebf3b85acaf4a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Mar 15 10:21:08 2020 -0700

    i386: Use ix86_output_ssemov for SFmode TYPE_SSEMOV
    
    There is no need to set mode attribute to V16SFmode since ix86_output_ssemov
    can properly encode xmm16-xmm31 registers with and without AVX512VL.
    
    gcc/
    
            PR target/89229
            * config/i386/i386.c (ix86_output_ssemov): Handle MODE_SI and
            MODE_SF.
            * config/i386/i386.md (*movsf_internal): Call ix86_output_ssemov
            for TYPE_SSEMOV.  Remove TARGET_PREFER_AVX256, TARGET_AVX512VL
            and ext_sse_reg_operand check.
    
    gcc/testsuite/
    
            PR target/89229
            * gcc.target/i386/pr89229-6a.c: New test.
            * gcc.target/i386/pr89229-6b.c: Likewise.
            * gcc.target/i386/pr89229-6c.c: Likewise.
Comment 34 GCC Commits 2020-03-16 10:52:53 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:5a3c42b227bbe9e7acb5335088d2255262311bd8

commit r10-7189-g5a3c42b227bbe9e7acb5335088d2255262311bd8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Mon Mar 16 03:48:55 2020 -0700

    i386: Use ix86_output_ssemov for SImode TYPE_SSEMOV
    
    There is no need to set mode attribute to XImode since ix86_output_ssemov
    can properly encode xmm16-xmm31 registers with and without AVX512VL.
    
    Remove ext_sse_reg_operand since it is no longer needed.
    
    gcc/
    
            PR target/89229
            * config/i386/i386.md (*movsi_internal): Call ix86_output_ssemov
            for TYPE_SSEMOV.  Remove ext_sse_reg_operand and TARGET_AVX512VL
            check.
            * config/i386/predicates.md (ext_sse_reg_operand): Removed.
    
    gcc/testsuite/
    
            PR target/89229
            * gcc.target/i386/pr89229-7a.c: New test.
            * gcc.target/i386/pr89229-7b.c: Likewise.
            * gcc.target/i386/pr89229-7c.c: Likewise.
Comment 35 H.J. Lu 2020-03-16 10:54:04 UTC
Fixed for GCC 10.
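For reference, a minimal reproducer in the style of the pr89229 tests might look like the following (hypothetical sketch — the actual contents of pr89229-5a.c and its dg directives may differ):

```
/* Hypothetical compile-only test; names and options are assumptions.
   { dg-do compile { target { ! ia32 } } }
   { dg-options "-O2 -march=skylake-avx512" } */

extern long long i;

long long
foo (void)
{
  register long long x asm ("xmm16");  /* force an extended SSE register */
  x = i;
  asm volatile ("" : "+v" (x));        /* keep x live in xmm16 */
  return x;
}

/* Before the fix, GCC widened the move to a 512-bit (%zmm) instruction
   to encode xmm16; after the fix it emits an EVEX-encoded move at the
   natural width.
   { dg-final { scan-assembler-not "%zmm" } } */
```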