This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
PATCH: PR target/37169: [4.4 Regression] Inefficent code for _mm_cvtsi64_si128
- From: "H.J. Lu" <hongjiu dot lu at intel dot com>
- To: gcc-patches at gcc dot gnu dot org, ubizjak at gmail dot com
- Cc: Joey Ye <joey dot ye at intel dot com>, Xuepeng Guo <xuepeng dot guo at intel dot com>
- Date: Tue, 19 Aug 2008 18:35:14 -0700
- Subject: PATCH: PR target/37169: [4.4 Regression] Inefficent code for _mm_cvtsi64_si128
- Reply-to: "H.J. Lu" <hjl dot tools at gmail dot com>
Hi,
For a V2DI vector concat where the first element is nonzero, the second
element is 0, and inter-unit moves are OK, movq is faster. For
__m128i
test (long long b)
{
  return _mm_cvtsi64_si128 (b);
}
this patch generates
movq %rdi, %xmm0
instead of
pxor %xmm0, %xmm0
pinsrq $0, %rdi, %xmm0
with -O2 -msse4 -march=core2. OK for trunk?
Thanks.
H.J.
----
gcc/
2008-08-19 H.J. Lu <hongjiu.lu@intel.com>
PR target/37169
* config/i386/i386.c (ix86_expand_vector_init_one_nonzero): Use
movq instead of vector set if the first element isn't zero and
inter-unit moves are OK.
gcc/testsuite/
2008-08-19 H.J. Lu <hongjiu.lu@intel.com>
PR target/37169
* gcc.target/i386/sse2-init-v2di-2.c: New.
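The condition described in the ChangeLog entry can be modelled outside the compiler as a small truth-table check. This is only a sketch: struct target_flags and the v2di_* function names are hypothetical stand-ins for TARGET_64BIT, TARGET_SSE4_1, TARGET_INTER_UNIT_MOVES and the use_vector_set assignment, not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the target macros tested in i386.c
   (TARGET_64BIT, TARGET_SSE4_1, TARGET_INTER_UNIT_MOVES).  */
struct target_flags
{
  bool is_64bit;
  bool has_sse4_1;
  bool inter_unit_moves;
};

/* Model of the V2DImode condition after the patch.  ONE_VAR is the
   index of the single variable element; the other element is zero.  */
static bool
v2di_use_vector_set (struct target_flags t, int one_var)
{
  return (t.is_64bit
          && t.has_sse4_1
          && !(t.inter_unit_moves && one_var == 0));
}

/* Model of the condition before the patch, for comparison.  */
static bool
v2di_use_vector_set_old (struct target_flags t)
{
  return t.is_64bit && t.has_sse4_1;
}

/* Walk through the cases the patch distinguishes.  */
static void
check_v2di_conditions (void)
{
  struct target_flags inter_unit_ok = { true, true, true };
  struct target_flags inter_unit_bad = { true, true, false };

  /* Before the patch, vector set was chosen regardless of ONE_VAR.  */
  assert (v2di_use_vector_set_old (inter_unit_ok));

  /* Element 0 variable and inter-unit moves OK: vector set is skipped.  */
  assert (!v2di_use_vector_set (inter_unit_ok, 0));

  /* Element 1 variable: vector set is still used.  */
  assert (v2di_use_vector_set (inter_unit_ok, 1));

  /* Inter-unit moves not OK: vector set is used as before.  */
  assert (v2di_use_vector_set (inter_unit_bad, 0));
}
```

Only the case with element 0 variable and inter-unit moves allowed changes behaviour; every other combination keeps the pre-patch choice.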
--- gcc/config/i386/i386.c.movq 2008-08-11 08:40:14.000000000 -0700
+++ gcc/config/i386/i386.c 2008-08-19 18:26:53.000000000 -0700
@@ -25104,7 +25104,12 @@ ix86_expand_vector_init_one_nonzero (boo
   switch (mode)
     {
     case V2DImode:
-      use_vector_set = TARGET_64BIT && TARGET_SSE4_1;
+      /* If the first element isn't zero and inter-unit moves are OK,
+         we use movq instead of vector set. */
+      use_vector_set = (TARGET_64BIT
+                        && TARGET_SSE4_1
+                        && !(TARGET_INTER_UNIT_MOVES
+                             && one_var == 0));
       break;
     case V16QImode:
     case V4SImode:
--- gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c.movq 2008-08-19 18:28:58.000000000 -0700
+++ gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c 2008-08-19 18:27:24.000000000 -0700
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -msse4 -march=core2" } */
+
+#include <emmintrin.h>
+
+__m128i
+test (long long b)
+{
+  return _mm_cvtsi64_si128 (b);
+}
+
+/* { dg-final { scan-assembler "movq" } } */
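The test above only scans the assembly. A run-time check of the value being built might look like the sketch below, which is not part of the patch. It relies on _mm_cvtsi64_si128 placing its argument in the low 64 bits and zeroing the high 64 bits, which is why a single movq is sufficient. The helper names cvtsi64_lanes and check_cvtsi64 and the HAVE_CVTSI64 macro are hypothetical, and the scalar fallback for non-x86-64 hosts is an assumption added so the sketch builds everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>
#define HAVE_CVTSI64 1
#else
#define HAVE_CVTSI64 0
#endif

/* Store both 64-bit lanes of the vector built from B into LANES,
   low lane first.  */
static void
cvtsi64_lanes (long long b, uint64_t lanes[2])
{
#if HAVE_CVTSI64
  __m128i v = _mm_cvtsi64_si128 (b);
  memcpy (lanes, &v, sizeof v);
#else
  /* Assumed scalar model of the same result for other hosts.  */
  lanes[0] = (uint64_t) b;
  lanes[1] = 0;
#endif
}

/* The low lane must hold the argument and the high lane must be zero,
   whichever instruction sequence the compiler picks.  */
static void
check_cvtsi64 (void)
{
  uint64_t lanes[2];

  cvtsi64_lanes (0x0123456789abcdefLL, lanes);
  assert (lanes[0] == 0x0123456789abcdefULL);
  assert (lanes[1] == 0);

  cvtsi64_lanes (-1, lanes);
  assert (lanes[0] == UINT64_MAX);
  assert (lanes[1] == 0);
}
```

Because both the movq sequence and the pxor/pinsrq sequence must produce the same two lanes, a check like this guards correctness while the scan-assembler test guards the instruction choice.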