This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Improve *avx_vperm_broadcast_*
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Uros Bizjak <ubizjak at gmail dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 31 May 2016 06:54:14 -0700
- Subject: Re: [PATCH] Improve *avx_vperm_broadcast_*
- References: <20160523171507 dot GW28550 at tucnak dot redhat dot com>
On Mon, May 23, 2016 at 10:15 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> The vbroadcastss and vpermilps insns are already in AVX512F & AVX512VL,
> so they can be used with the v constraint instead of x. The splitter case
> where for AVX we emit vpermilps plus vperm2f128 is more problematic, because
> the latter insn isn't available in EVEX. But we can get the same effect with
> vshuff32x4 when both source operands are the same.
> Alternatively, we could replace the vpermilps and vshuff32x4 insns
> with the AVX512VL arbitrary permutations, I think; the question is
> which is faster, because we'd need to load the mask from memory.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-23 Jakub Jelinek <jakub@redhat.com>
>
> * config/i386/sse.md
> (<mask_codefor>avx512vl_shuf_<shuffletype>32x4_1<mask_name>): Rename
> to ...
> (avx512vl_shuf_<shuffletype>32x4_1<mask_name>): ... this.
> (*avx_vperm_broadcast_v4sf): Use v constraint instead of x. Use
> maybe_evex prefix instead of vex.
> (*avx_vperm_broadcast_<mode>): Use v constraint instead of x. Handle
> EXT_REX_SSE_REG_P (op0) case in the splitter.
>
> * gcc.target/i386/avx512vl-vbroadcast-3.c: New test.
>
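To see why vshuff32x4 with identical source operands can stand in for vperm2f128 in the splitter, the following scalar C sketch models the 256-bit forms of the three instructions on an array of eight floats. It is not GCC code: the v8sf struct and the model_* and broadcast_sequences_agree functions are hypothetical, the semantics are taken from the Intel SDM descriptions of VPERMILPS (imm8 form), VPERM2F128 and VSHUFF32X4, and write masking and the vperm2f128 zeroing bits are left out. The sketch splats element elt within each 128-bit lane with vpermilps, then copies the lane holding it to both halves, either with vperm2f128 (imm (elt / 4) * 0x11) or with vshuff32x4 (imm 0 or 3). Those immediates are derived for this sketch and are not necessarily the exact ones the splitter emits.

```c
#include <assert.h>
#include <string.h>

/* Scalar model of a 256-bit ymm register holding eight floats;
   f[0..3] is the low 128-bit lane and f[4..7] the high lane. */
typedef struct { float f[8]; } v8sf;

/* VPERMILPS ymm, ymm, imm8: within each 128-bit lane, result element i
   is lane element (imm >> (2 * i)) & 3; both lanes use the same imm8. */
static v8sf model_vpermilps(v8sf a, unsigned imm)
{
  v8sf r;
  for (int lane = 0; lane < 2; lane++)
    for (int i = 0; i < 4; i++)
      r.f[lane * 4 + i] = a.f[lane * 4 + ((imm >> (2 * i)) & 3)];
  return r;
}

/* VPERM2F128 ymm, a, b, imm8: each result half is picked by a 2-bit
   selector (0 = a.lo, 1 = a.hi, 2 = b.lo, 3 = b.hi).  The zeroing bits
   3 and 7 are not modelled, so callers must leave them clear. */
static v8sf model_vperm2f128(v8sf a, v8sf b, unsigned imm)
{
  v8sf r;
  assert((imm & 0x88) == 0);
  for (int half = 0; half < 2; half++) {
    unsigned sel = (imm >> (4 * half)) & 3;
    const float *src = (sel < 2 ? a.f : b.f) + (sel & 1) * 4;
    memcpy(&r.f[half * 4], src, 4 * sizeof (float));
  }
  return r;
}

/* 256-bit VSHUFF32X4 ymm, a, b, imm8: the low half is lane imm[0] of a,
   the high half is lane imm[1] of b. */
static v8sf model_vshuff32x4(v8sf a, v8sf b, unsigned imm)
{
  v8sf r;
  memcpy(&r.f[0], &a.f[(imm & 1) * 4], 4 * sizeof (float));
  memcpy(&r.f[4], &b.f[((imm >> 1) & 1) * 4], 4 * sizeof (float));
  return r;
}

/* Broadcast element ELT (0..7) of SRC to all eight slots with the AVX pair
   vpermilps + vperm2f128 and with vpermilps + vshuff32x4 (same register as
   both sources).  Returns 1 if both results are SRC.f[ELT] everywhere. */
static int broadcast_sequences_agree(v8sf src, int elt)
{
  /* Splat element ELT & 3 within each lane: imm 0x00, 0x55, 0xAA or 0xFF. */
  v8sf in_lane = model_vpermilps(src, (unsigned) (elt & 3) * 0x55);
  int lane = elt / 4;
  v8sf avx = model_vperm2f128(in_lane, in_lane, (unsigned) lane * 0x11);
  v8sf evex = model_vshuff32x4(in_lane, in_lane, lane ? 3u : 0u);

  for (int i = 0; i < 8; i++)
    if (avx.f[i] != src.f[elt] || evex.f[i] != src.f[elt])
      return 0;
  return 1;
}
```

For example, broadcasting element 5 uses vpermilps imm 0x55 and then either vperm2f128 imm 0x11 or vshuff32x4 imm 3; both leave src.f[5] in all eight slots. Incidentally, (elt & 3) * 0x55 for elt = 0, 1 and 2 gives 0, 85 and 170, the same vpermilps immediates the test scans for.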
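The new test fails on x32, where the address is in the 32-bit register %edi
instead of %rdi. This patch fixes it by matching either register, and makes
the same change in avx512vl-vcvtps2ph-3.c. Tested on x86-64. OK for trunk?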
Thanks.
H.J.
----
2016-05-31 H.J. Lu <hongjiu.lu@intel.com>
* gcc.target/i386/avx512vl-vbroadcast-3.c: Scan %\[re\]di
instead of %rdi.
* gcc.target/i386/avx512vl-vcvtps2ph-3.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
index d981fe4..7233398 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
@@ -150,9 +150,9 @@ f16 (V2 *x)
asm volatile ("" : "+v" (a));
}
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%rdi\[^\n\r]*%xmm16" 4 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%xmm16" 4 } } */
/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 3 } } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%rdi\[^\n\r]*%ymm16" 3 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 3 } } */
/* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
/* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
/* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$170\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
index 2fd2215..c2e3f01 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
@@ -38,4 +38,4 @@ f3 (__m256 x, __v8hi *y)
*y = (__v8hi) _mm256_cvtps_ph (a, 1);
}
-/* { dg-final { scan-assembler "vcvtps2ph\[^\n\r]*\\\$1\[^\n\r]*%ymm16\[^\n\r]*%rdi" } } */
+/* { dg-final { scan-assembler "vcvtps2ph\[^\n\r]*\\\$1\[^\n\r]*%ymm16\[^\n\r]*%\[re\]di" } } */
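The dg-final patterns are Tcl strings, so \[ yields a literal [ for the regex engine (an unescaped [ would start Tcl command substitution), and \n and \r become real newline and carriage-return characters. After substitution, the new vbroadcastss pattern reads vbroadcastss[^\n\r]*%[re]di[^\n\r]*%xmm16, where [re] accepts %rdi as well as the %edi that x32 code uses for the address. The C sketch below checks this with POSIX <regex.h> standing in for Tcl's regexp, which should behave the same on patterns this simple. count_matches is a hypothetical helper that only approximates scan-assembler-times, and the assembly lines are made-up samples, not captured compiler output.

```c
#include <assert.h>
#include <regex.h>

/* Old (LP64-only) and new patterns from avx512vl-vbroadcast-3.c, written
   as they look after Tcl substitution: "\[" has become "[" and "\n", "\r"
   are real newline and carriage-return characters. */
static const char OLD_PATTERN[] = "vbroadcastss[^\n\r]*%rdi[^\n\r]*%xmm16";
static const char NEW_PATTERN[] = "vbroadcastss[^\n\r]*%[re]di[^\n\r]*%xmm16";

/* Hypothetical sample lines in AT&T syntax (memory operand first). */
static const char LP64_ASM[] = "\tvbroadcastss\t(%rdi), %xmm16\n";
static const char X32_ASM[] = "\tvbroadcastss\t(%edi), %xmm16\n";

/* Count non-overlapping matches of the POSIX extended regex PATTERN in
   TEXT, roughly what scan-assembler-times does.  Returns -1 if PATTERN
   does not compile; stops on an empty match instead of looping forever. */
static int count_matches(const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int count = 0;

  assert(pattern && text);
  if (regcomp(&re, pattern, REG_EXTENDED) != 0)
    return -1;
  while (regexec(&re, text, 1, &m, 0) == 0 && m.rm_eo > m.rm_so) {
    count++;
    text += m.rm_eo;  /* resume scanning just after this match */
  }
  regfree(&re);
  return count;
}
```

With these samples, the old pattern finds one match in the LP64 line and none in the x32 line, while the new pattern finds one in each and still rejects a different register such as %esi.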