Bug 92658

Summary: x86 lacks vector extend / truncate
Product: gcc Reporter: Richard Biener <rguenth>
Component: targetAssignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: enhancement CC: crazylht, mark, rsandifo, ubizjak
Priority: P3 Keywords: easyhack, missed-optimization
Version: 10.0   
Target Milestone: ---   
Host: Target: x86_64-*-*, i?86-*-*
Build: Known to work:
Known to fail: Last reconfirmed: 2020-01-30 00:00:00
Bug Depends on: 95201    
Bug Blocks: 92492, 92611, 92645    
Attachments: Prototype patch
Prototype patch v2
Test cases
0001-Add-missing-vector-truncmn2-expanders-PR92658.patch

Description Richard Biener 2019-11-25 13:38:32 UTC
After

2019-11-14  Richard Sandiford  <richard.sandiford@arm.com>

        * tree-cfg.c (verify_gimple_assign_unary): Handle conversions
        between vector types.
        * tree-vect-stmts.c (vectorizable_conversion): Extend the
        non-widening and non-narrowing path to handle standard
        conversion codes, if the target supports them.
        * expr.c (convert_move): Try using the extend and truncate optabs
        for vectors.
        * optabs-tree.c (supportable_convert_operation): Likewise.
        * config/aarch64/iterators.md (Vnarroqw): New iterator.
        * config/aarch64/aarch64-simd.md (<optab><Vnarrowq><mode>2)
        (trunc<mode><Vnarrowq>2): New patterns.

it would now be possible to BB vectorize the following but the x86 backend
lacks appropriate patterns for the vector mode variants (here V8HI -> V8QI
narrowing).

typedef unsigned char v16qi __attribute__((vector_size(16)));
typedef unsigned short v8hi __attribute__((vector_size(16)));

void bar (v8hi *dst, v16qi * __restrict src)
{
  unsigned short tem[8];
  tem[0] = (*src)[0];
  tem[1] = (*src)[1];
  tem[2] = (*src)[2];
  tem[3] = (*src)[3];
  tem[4] = (*src)[4];
  tem[5] = (*src)[5];
  tem[6] = (*src)[6];
  tem[7] = (*src)[7];
  dst[0] = *(v8hi *)tem;
}
void foo (v8hi *dst, v16qi src)
{
  unsigned short tem[8];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  tem[4] = src[4];
  tem[5] = src[5];
  tem[6] = src[6];
  tem[7] = src[7];
  dst[0] = *(v8hi *)tem;
}
Comment 1 Richard Biener 2019-11-25 14:07:15 UTC
It looks like there are already some avx512 patterns matching this but they
are not visible to the RTL expanders.

(define_insn "zero_extendv8qiv8hi2"
  [(set (match_operand:V8HI 0 "register_operand" "=x,v")
        (zero_extend:V8HI (match_operand:V8QI 1 "register_operand" "x,v")))]
  "TARGET_SSE2"
  "@
   punpcklbw\t{%1, %0|%0, %1}
   vpunpcklbw\t{%1, %0|%0, %1}"
)

does it for the testcase pasted but truncation and other extends are also
missing, likewise for floats.
Comment 2 Uroš Bizjak 2020-02-27 19:37:51 UTC
Created attachment 47924 [details]
Prototype patch

Prototype patch that introduces missing expanders to generate PMOVZX and PMOVSX packed moves. For the testcase in Comment #0, we generate (-O3 -mavx):

bar:
        vpmovzxbw       (%rsi), %xmm0
        vmovdqa %xmm0, (%rdi)
        ret

foo:
        vpmovzxbw       %xmm0, %xmm0
        vmovdqa %xmm0, (%rdi)
        ret
Comment 3 Uroš Bizjak 2020-02-28 08:43:27 UTC
Richi, should the following test also vectorize?

--cut here--
typedef unsigned char v16qi __attribute__((vector_size (16)));
typedef unsigned int v4si __attribute__((vector_size (16)));

void
foo_u8_u32 (v4si * dst, v16qi * __restrict src)
{
  unsigned int tem[4];
  tem[0] = (*src)[0];
  tem[1] = (*src)[1];
  tem[2] = (*src)[2];
  tem[3] = (*src)[3];
  dst[0] = *(v4si *) tem;
}

void
bar_u8_u32 (v4si * dst, v16qi src)
{
  unsigned int tem[4];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  dst[0] = *(v4si *) tem;
}
--cut here--
Comment 4 Uroš Bizjak 2020-02-28 12:37:15 UTC
(In reply to Uroš Bizjak from comment #3)
> Richi, should the following test also vectorize?

It doesn't vectorize because supportable_convert_operation returns false for:

(gdb) p debug_generic_expr (vectype_out)
vector(4) unsigned int
$1 = void
(gdb) p debug_generic_expr (vectype_in)
vector(4) unsigned char
$2 = void

and TYPE_MODE for "vector(4) unsigned char" returns E_SImode:

(gdb) p m1
$3 = E_V4SImode
(gdb) p m2
$4 = E_SImode

and:

  if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
    return false;
Comment 5 Uroš Bizjak 2020-02-28 12:42:37 UTC
Created attachment 47927 [details]
Prototype patch v2

A couple of typos fixed.

Still doesn't vectorize v4qi->v4si, v2qi->v2di, v2hi->v2di and v4qi->v4di.

Looks like tree-opt issue, as explained in Comment #4.
Comment 6 Uroš Bizjak 2020-02-28 12:58:18 UTC
Created attachment 47928 [details]
Test cases

sse4, avx2 and avx512bw test cases.

Fails:

FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
Comment 7 Uroš Bizjak 2020-02-28 13:03:46 UTC
CC author of the generic functionality part.
Comment 8 rguenther@suse.de 2020-03-02 07:52:27 UTC
On Fri, 28 Feb 2020, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658
> 
> --- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
> Richi, should the following test also vectorize?

In priciple yes.  I see both cases "vectorized" to a store
from a CTOR but then my primitive pattern matching in forwprop
not applying because supportable_convert_operation is confused
about a vector(4) char type having SImode and bailing out at the

292       if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
293         return false;

check.  So it looks like with just SSE2 the backend doesn't
consider V4QImode a supported vector type.  I'm not sure we
want to fix that but then this means regular optab checks
won't do it here.

Interesting cases nevertheless.

Would those conversions map to good assembly?

> --cut here--
> typedef unsigned char v16qi __attribute__((vector_size (16)));
> typedef unsigned int v4si __attribute__((vector_size (16)));
> 
> void
> foo_u8_u32 (v4si * dst, v16qi * __restrict src)
> {
>   unsigned int tem[4];
>   tem[0] = (*src)[0];
>   tem[1] = (*src)[1];
>   tem[2] = (*src)[2];
>   tem[3] = (*src)[3];
>   dst[0] = *(v4si *) tem;
> }
> 
> void
> bar_u8_u32 (v4si * dst, v16qi src)
> {
>   unsigned int tem[4];
>   tem[0] = src[0];
>   tem[1] = src[1];
>   tem[2] = src[2];
>   tem[3] = src[3];
>   dst[0] = *(v4si *) tem;
> }
> --cut here--
Comment 9 Uroš Bizjak 2020-03-02 08:24:26 UTC
(In reply to rguenther@suse.de from comment #8)
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658
> > 
> > --- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
> > Richi, should the following test also vectorize?
> 
> In priciple yes.  I see both cases "vectorized" to a store
> from a CTOR but then my primitive pattern matching in forwprop
> not applying because supportable_convert_operation is confused
> about a vector(4) char type having SImode and bailing out at the
> 
> 292       if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
> 293         return false;
> 
> check.  So it looks like with just SSE2 the backend doesn't
> consider V4QImode a supported vector type.  I'm not sure we
> want to fix that but then this means regular optab checks
> won't do it here.
> 
> Interesting cases nevertheless.
> 
> Would those conversions map to good assembly?

Yes, please see attached test cases, especially pr92658-sse4.c. We have pmovzxbd for v4qi->v4si, pmovzxbq for v2qi->v2di, and pmovzxwd for v2hi->v2di, in addition to AVX2 pmovzxbq for v4qi->v4di. All testcases should vectorize with a single instruction. It is also possible to introduce additional conversions to 64bit vectors (e.g. v2qi->v2si) for TARGET_MMX_WITH_SSE targets.

Please note that there are similar sign_extend patterns and corresponding truncate patterns with AVX512F, currently not covered by attached patch and testcases.
Comment 10 Uroš Bizjak 2020-05-14 12:50:29 UTC
The patch is ready to be pushed, it is waiting for a decision what to do with failed cases.

Richi, should this patch move forward (eventually XFAILing failed cases), or do you plan to look at the fails from the generic vectorizer POV?
Comment 11 rguenther@suse.de 2020-05-14 13:20:38 UTC
On Thu, 14 May 2020, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658
> 
> --- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
> The patch is ready to be pushed, it is waiting for a decision what to do with
> failed cases.
> 
> Richi, should this patch move forward (eventually XFAILing failed cases), or do
> you plan to look at the fails from the generic vectorizer POV?

I think we should go forward with the patch, either XFAILing the testcases
or splitting out the testcase (and backend patterns that do not get
exercised due to the issue).

I've already said in comment#8 that the issue here is optabs working
with modes and not vector types, so it's a bit hard to use the
same mechanism to deal with the currently failing cases.  One
possible route would be to add V4QImode similar to how we now
do V2SFmode with SSE but of course where do we stop ...

As said we can try to tackle this incrementally.  Maybe Richard also
has input here?
Comment 12 Richard Sandiford 2020-05-14 13:41:14 UTC
(In reply to rguenther@suse.de from comment #11)
> On Thu, 14 May 2020, ubizjak at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92658
> > 
> > --- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
> > The patch is ready to be pushed, it is waiting for a decision what to do with
> > failed cases.
> > 
> > Richi, should this patch move forward (eventually XFAILing failed cases), or do
> > you plan to look at the fails from the generic vectorizer POV?
> 
> I think we should go forward with the patch, either XFAILing the testcases
> or splitting out the testcase (and backend patterns that do not get
> exercised due to the issue).
> 
> I've already said in comment#8 that the issue here is optabs working
> with modes and not vector types, so it's a bit hard to use the
> same mechanism to deal with the currently failing cases.  One
> possible route would be to add V4QImode similar to how we now
> do V2SFmode with SSE but of course where do we stop ...
> 
> As said we can try to tackle this incrementally.  Maybe Richard also
> has input here?
Nothing to add really, but: yeah, the idea was that the target
would provide smaller vector modes if it can efficiently load,
store and (at least) add them.  I think it would be good to do
that for aarch64 Advanced SIMD at some point.

There are three points behind using vector modes for this:

1. It gives the target the flexibility to decide how best to
   implement the operation (All at one end, and if so, which end?
   Or spread out evenly across the vector?)  On aarch64, the
   choice would differ between SVE and Advanced SIMD.

2. It means that __attribute__((vector_size)) vectors benefit.

3. The smaller modes can be used in conjunction with
   autovectorize_vector_modes to give the loop vectoriser
   a choice between operating on packed or unpacked vectors.
   This is mostly useful for VECT_COMPARE_COSTS but can help
   with low-trip-count loops even without.

So yeah, defining V4QI sounds like the way to go if the
architecture can process it efficiently.  I get the "where
do we stop" concern, but .md mode iterators should help.
Comment 13 Hongtao.liu 2020-05-15 08:18:02 UTC
*** Bug 92611 has been marked as a duplicate of this bug. ***
Comment 14 GCC Commits 2020-05-19 09:26:19 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:f6e40195ec3d3b402a5f6c58dbf359479bc4cbfa

commit r11-485-gf6e40195ec3d3b402a5f6c58dbf359479bc4cbfa
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue May 19 11:25:46 2020 +0200

    i386: Add missing vector zero/sign extend expanders [PR92658]
    
    2020-05-19  Uroš Bizjak  <ubizjak@gmail.com>
    
    gcc/ChangeLog:
            PR target/92658
            * config/i386/sse.md (<code>v16qiv16hi2): New expander.
            (<code>v32qiv32hi2): Ditto.
            (<code>v8qiv8hi2): Ditto.
            (<code>v16qiv16si2): Ditto.
            (<code>v8qiv8si2): Ditto.
            (<code>v4qiv4si2): Ditto.
            (<code>v16hiv16si2): Ditto.
            (<code>v8hiv8si2): Ditto.
            (<code>v4hiv4si2): Ditto.
            (<code>v8qiv8di2): Ditto.
            (<code>v4qiv4di2): Ditto.
            (<code>v2qiv2di2): Ditto.
            (<code>v8hiv8di2): Ditto.
            (<code>v4hiv4di2): Ditto.
            (<code>v2hiv2di2): Ditto.
            (<code>v8siv8di2): Ditto.
            (<code>v4siv4di2): Ditto.
            (<code>v2siv2di2): Ditto.
    
    gcc/testsuite/ChangeLog:
            PR target/92658
            * gcc.target/i386/pr92658-sse4.c: New test.
            * gcc.target/i386/pr92658-avx2.c: New test.
            * gcc.target/i386/pr92658-avx512bw.c: New test.
Comment 15 Uroš Bizjak 2020-05-19 09:35:25 UTC
I will leave truncations (Down Converts in Intel speak) which are AVX512F instructions to someone else. It should be easy to add missing patterns and tests following the example of committed patch.
Comment 16 Hongtao.liu 2020-05-20 05:12:33 UTC
(In reply to Uroš Bizjak from comment #15)
> I will leave truncations (Down Converts in Intel speak) which are AVX512F
> instructions to someone else. It should be easy to add missing patterns and
> tests following the example of committed patch.

I'll take a look.
Comment 17 Hongtao.liu 2020-05-20 08:16:26 UTC
Created attachment 48570 [details]
0001-Add-missing-vector-truncmn2-expanders-PR92658.patch

Seems there're only truncmn2 for truncate, not expander for us_truncate and ss_truncate, am i missing?
Comment 18 GCC Commits 2020-05-22 13:32:32 UTC
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:e740f3d73144abbca1ad98a04825c6bd63314a0b

commit r11-571-ge740f3d73144abbca1ad98a04825c6bd63314a0b
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed May 20 15:53:14 2020 +0800

    Add missing vector truncmn2 expanders [PR92658]
    
    2020-05-22  Hongtao.liu  <hongtao.liu@intel.com>
    
    gcc/ChangeLog:
            PR target/92658
            * config/i386/sse.md (trunc<pmov_src_lower><mode>2): New expander
            (truncv32hiv32qi2): Ditto.
            (trunc<ssedoublemodelower><mode>2): Ditto.
            (trunc<mode><pmov_dst_3>2): Ditto.
            (trunc<mode><pmov_dst_mode_4>2): Ditto.
            (truncv2div2si2): Ditto.
            (truncv8div8qi2): Ditto.
            (avx512f_<code>v8div16qi2): Renaming from *avx512f_<code>v8div16qi2.
            (avx512vl_<code>v2div2si): Renaming from *avx512vl_<code>v2div2si2.
            (avx512vl_<code><mode>v2<ssecakarnum>qi2): Renaming
            from *avx512vl_<code><mode>v<ssescalarnum>qi2.
    
    gcc/testsuite/ChangeLog:
            * gcc.target/i386/pr92658-avx512f.c: New test.
            * gcc.target/i386/pr92658-avx512vl.c: Ditto.
            * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
Comment 19 Mark Wielaard 2020-05-22 21:50:07 UTC
(In reply to CVS Commits from comment #18)
>     gcc/testsuite/ChangeLog:
>             * gcc.target/i386/pr92658-avx512f.c: New test.
>             * gcc.target/i386/pr92658-avx512vl.c: Ditto.
>             * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.

Note that the second one as committed has an extra closing brace which causes an error:

ERROR: gcc.target/i386/pr92658-avx512vl.c: unknown dg option: \} for "}"

diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
index 50b32f968ac3..dc50084119b5 100644
--- a/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
+++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
@@ -121,7 +121,7 @@ truncdb_128 (v16qi * dst, v4si * __restrict src)
   dst[0] = *(v16qi *) tem;
 }
 
-/* { dg-final { scan-assembler-times "vpmovqd" 2 } } } */
+/* { dg-final { scan-assembler-times "vpmovqd" 2 } } */
 /* { dg-final { scan-assembler-times "vpmovqw" 2 { xfail *-*-* } } } */
 /* { dg-final { scan-assembler-times "vpmovqb" 2 { xfail *-*-* } } } */
 /* { dg-final { scan-assembler-times "vpmovdw" 1 } } */
Comment 20 Hongtao.liu 2020-05-23 02:28:17 UTC
(In reply to Mark Wielaard from comment #19)
> (In reply to CVS Commits from comment #18)
> >     gcc/testsuite/ChangeLog:
> >             * gcc.target/i386/pr92658-avx512f.c: New test.
> >             * gcc.target/i386/pr92658-avx512vl.c: Ditto.
> >             * gcc.target/i386/pr92658-avx512bw-trunc.c: Ditto.
> 
> Note that the second one as committed has an extra closing brace which
> causes an error:
> 
> ERROR: gcc.target/i386/pr92658-avx512vl.c: unknown dg option: \} for "}"
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
> b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
> index 50b32f968ac3..dc50084119b5 100644
> --- a/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
> +++ b/gcc/testsuite/gcc.target/i386/pr92658-avx512vl.c
> @@ -121,7 +121,7 @@ truncdb_128 (v16qi * dst, v4si * __restrict src)
>    dst[0] = *(v16qi *) tem;
>  }
>  
> -/* { dg-final { scan-assembler-times "vpmovqd" 2 } } } */
> +/* { dg-final { scan-assembler-times "vpmovqd" 2 } } */
>  /* { dg-final { scan-assembler-times "vpmovqw" 2 { xfail *-*-* } } } */
>  /* { dg-final { scan-assembler-times "vpmovqb" 2 { xfail *-*-* } } } */
>  /* { dg-final { scan-assembler-times "vpmovdw" 1 } } */

Oh, sorry for typo.
Comment 21 Hongtao.liu 2020-12-25 02:19:44 UTC
(In reply to CVS Commits from comment #14)
> The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:f6e40195ec3d3b402a5f6c58dbf359479bc4cbfa
> 
> commit r11-485-gf6e40195ec3d3b402a5f6c58dbf359479bc4cbfa
> Author: Uros Bizjak <ubizjak@gmail.com>
> Date:   Tue May 19 11:25:46 2020 +0200
> 
>     i386: Add missing vector zero/sign extend expanders [PR92658]
>     
>     2020-05-19  Uroš Bizjak  <ubizjak@gmail.com>
>     
>     gcc/ChangeLog:
>             PR target/92658
>             * config/i386/sse.md (<code>v16qiv16hi2): New expander.
>             (<code>v32qiv32hi2): Ditto.
>             (<code>v8qiv8hi2): Ditto.
>             (<code>v16qiv16si2): Ditto.
>             (<code>v8qiv8si2): Ditto.
>             (<code>v4qiv4si2): Ditto.
>             (<code>v16hiv16si2): Ditto.
>             (<code>v8hiv8si2): Ditto.
>             (<code>v4hiv4si2): Ditto.
>             (<code>v8qiv8di2): Ditto.
>             (<code>v4qiv4di2): Ditto.
>             (<code>v2qiv2di2): Ditto.
>             (<code>v8hiv8di2): Ditto.
>             (<code>v4hiv4di2): Ditto.
>             (<code>v2hiv2di2): Ditto.
>             (<code>v8siv8di2): Ditto.
>             (<code>v4siv4di2): Ditto.
>             (<code>v2siv2di2): Ditto.
>     

<code> for sign_extend is sign_extend, but the standard pattern names for sign_extend is extendmn. We need to to refine those expanders

Add define_code_attr like aarch64/iterators.md?

----------
;; Map rtl objects to optab names
(define_code_attr optab [(ashift "ashl")
			 (ashiftrt "ashr")
			 (lshiftrt "lshr")
			 (rotatert "rotr")
			 (sign_extend "extend")
			 (zero_extend "zero_extend")
----------

>     gcc/testsuite/ChangeLog:
>             PR target/92658
>             * gcc.target/i386/pr92658-sse4.c: New test.
>             * gcc.target/i386/pr92658-avx2.c: New test.
>             * gcc.target/i386/pr92658-avx512bw.c: New test.
Comment 22 Uroš Bizjak 2020-12-25 10:32:15 UTC
(In reply to Hongtao.liu from comment #21)
> Add define_code_attr like aarch64/iterators.md?
> 
> ----------
> ;; Map rtl objects to optab names
> (define_code_attr optab [(ashift "ashl")
> 			 (ashiftrt "ashr")
> 			 (lshiftrt "lshr")
> 			 (rotatert "rotr")
> 			 (sign_extend "extend")
> 			 (zero_extend "zero_extend")

Yes. Please go ahead, the patch is preapproved.
Comment 23 Hongtao.liu 2020-12-29 01:36:11 UTC
(In reply to Uroš Bizjak from comment #22)
> (In reply to Hongtao.liu from comment #21)
> > Add define_code_attr like aarch64/iterators.md?
> > 
> > ----------
> > ;; Map rtl objects to optab names
> > (define_code_attr optab [(ashift "ashl")
> > 			 (ashiftrt "ashr")
> > 			 (lshiftrt "lshr")
> > 			 (rotatert "rotr")
> > 			 (sign_extend "extend")
> > 			 (zero_extend "zero_extend")
> 
> Yes. Please go ahead, the patch is preapproved.

Fixed by r11-6351
Comment 24 GCC Commits 2021-01-07 06:35:57 UTC
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:1b5669752426d225b0088d57d1d2fffba9625032

commit r11-6514-g1b5669752426d225b0088d57d1d2fffba9625032
Author: Hongyu Wang <hongyu.wang@intel.com>
Date:   Tue Dec 29 15:14:09 2020 +0800

    Adjust testcase for PR 92658
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr92658-avx512bw.c: Add
            -mprefer-vector-width=512 to avoid impact of different default
            mtune which gcc is built with.
            * gcc.target/i386/pr92658-avx512bw-2.c: Ditto.
Comment 25 Hongtao.liu 2021-12-15 01:46:35 UTC
(In reply to Uroš Bizjak from comment #5)
> Created attachment 47927 [details]
> Prototype patch v2
> 
> A couple of typos fixed.
> 
> Still doesn't vectorize v4qi->v4si, v2qi->v2di, v2hi->v2di and v4qi->v4di.
> 
> Looks like tree-opt issue, as explained in Comment #4.

v4qi->v4si, v2hi->v2di and v4qi->v4di are vectorized, but v2qi->v2di is not, guess it's related to 16-bit vector support.
Comment 26 GCC Commits 2023-05-10 20:43:49 UTC
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:608e7f3ab47fe746279c552c3574147aa3d8ee76

commit r14-666-g608e7f3ab47fe746279c552c3574147aa3d8ee76
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Wed May 10 22:40:53 2023 +0200

    i386: Add missing vector extend patterns [PR92658]
    
    Add missing insn pattern for v2qi -> v2si vector extend and named
    expanders to activate generation of vector extends to 8-byte and 4-byte
    vectors.
    
    gcc/ChangeLog:
    
            PR target/92658
            * config/i386/mmx.md (sse4_1_<code>v2qiv2si2): New insn pattern.
            (<insn>v4qiv4hi2): New expander.
            (<insn>v2hiv2si2): Ditto.
            (<insn>v2qiv2si2): Ditto.
            (<insn>v2qiv2hi2): Ditto.
    
    gcc/testsuite/ChangeLog:
    
            PR target/92658
            * gcc.target/i386/pr92658-sse4-4b.c: New test.
            * gcc.target/i386/pr92658-sse4-8b.c: New test.
Comment 27 GCC Commits 2023-06-03 00:09:18 UTC
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:49337040865269e13cdc2ead276d12ecb2e9f606

commit r14-1509-g49337040865269e13cdc2ead276d12ecb2e9f606
Author: liuhongt <hongtao.liu@intel.com>
Date:   Thu Jun 1 15:08:02 2023 +0800

    i386: Add missing vector truncate patterns [PR92658].
    
    Add missing insn patterns for v2si -> v2hi/v2qi and v2hi-> v2qi vector
    truncate.
    
    gcc/ChangeLog:
    
            PR target/92658
            * config/i386/mmx.md (truncv2hiv2qi2): New define_insn.
            (truncv2si<mode>2): Ditto.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr92658-avx512bw-trunc-2.c: New test.