PATCH: Optimize integer vector concatenate for SSE4
H.J. Lu
hjl.tools@gmail.com
Tue May 13 19:25:00 GMT 2008
On Mon, May 12, 2008 at 12:46 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> H.J. Lu wrote:
>
>
> >
> > This patch optimizes integer vector concatenate for SSE4. I
> > also renamed vector concatenate patterns to be consistent
> > with other vector patterns. OK for trunk?
> >
> > +(define_insn "*vec_concatv2si_sse4_1"
> > + [(set (match_operand:V2SI 0 "register_operand" "=x,x")
> > + (vec_concat:V2SI
> > + (match_operand:SI 1 "register_operand" "0,rm")
> >
> >
>
> nonimmediate_operand
>
>
> > + (match_operand:SI 2 "nonimmediate_operand" "rm,0")))]
> > + "TARGET_SSE4_1"
> > + "@
> > + pinsrd\t{$0x1, %2, %0|%0, %2, 0x1}
> > + pinsrd\t{$0x0, %2, %0|%0, %2, 0x0}"
> > + [(set_attr "type" "sselog")
> > + (set_attr "mode" "TI")])
> >
> >
>
> Please check if insn pattern with "ix86_binary_operator_ok (...)" insn
> constraint is needed to prevent combiner from combining mem/mem input
> operands. Eventually, expander with "x86_fixup_binary_operands_no_copy
> (UNKNOWN, SImode, operands)" is needed to fix mem/mem operands expansion.
> Looking at existing vec_concat_* patterns, I think that we can trust reload
> to fix mem/mem operands for us, so IMO no fixups or extra constraints are
> needed.
>
>
> > +(define_insn "*vec_concatv2di_rex64_sse4_1"
> > + [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > + (vec_concat:V2DI
> > + (match_operand:DI 1 "register_operand" "0,rm")
> >
> >
>
> nonimmediate_operand
>
>
> > + (match_operand:DI 2 "nonimmediate_operand" "rm,0")))]
> > + "TARGET_64BIT && TARGET_SSE4_1"
> > + "@
> > + pinsrq\t{$0x1, %2, %0|%0, %2, 0x1}
> > + pinsrq\t{$0x0, %2, %0|%0, %2, 0x0}"
> > + [(set_attr "type" "sselog")
> > + (set_attr "mode" "TI")])
> >
> >
>
> Please change operand[1] to nonimmediate_operand in both cases.
> The patch is OK for mainline with this change.
>
> Thanks,
> Uros.
>
Hi Uros,
There is a bug in my patch. The second alternative isn't valid, since
with the pinsrX instructions we can only place the register/memory
operand after the register operand. This patch fixes it. I also added
*vec_concatv2sf_sse4_1. Now we generate
[hjl@gnu-6 sse-1]$ cat v4sf-1.c
#include <xmmintrin.h>
extern float x2, x3;
__m128
foo1 (float x1, float x4)
{
return _mm_set_ps (x2, x1, x3, x4);
}
[hjl@gnu-6 sse-1]$ /usr/gcc-4.4/bin/gcc -Wall -I.. -O2 -march=core2 -fno-asynchronous-unwind-tables -DDEBUG -S v4sf-1.c -msse4
[hjl@gnu-6 sse-1]$ cat v4sf-1.s
.file "v4sf-1.c"
.text
.p2align 4,,15
.globl foo1
.type foo1, @function
foo1:
movss x3(%rip), %xmm2
unpcklps %xmm2, %xmm1
movaps %xmm1, %xmm2
movss x2(%rip), %xmm1
unpcklps %xmm1, %xmm0
movaps %xmm2, %xmm1
movlhps %xmm0, %xmm1
movaps %xmm1, %xmm0
ret
.size foo1, .-foo1
.ident "GCC: (GNU) 4.4.0 20080509 (experimental) [trunk revision 135128]"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-6 sse-1]$ /export/build/gnu/gcc-stack-internal/build-x86_64-linux/gcc/xgcc -B./ -B/export/build/gnu/gcc-stack-internal/build-x86_64-linux/gcc/ -Wall -I.. -O2 -march=core2 -fno-asynchronous-unwind-tables -DDEBUG -S v4sf-1.c -msse4
[hjl@gnu-6 sse-1]$ cat v4sf-1.s
.file "v4sf-1.c"
.text
.p2align 4,,15
.globl foo1
.type foo1, @function
foo1:
insertps $0x10, x2(%rip), %xmm0
insertps $0x10, x3(%rip), %xmm1
movaps %xmm1, %xmm2
movlhps %xmm0, %xmm2
movaps %xmm2, %xmm0
ret
.size foo1, .-foo1
.ident "GCC: (GNU) 4.4.0 20080510 (experimental) [stack-internal revision 2533]"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-6 sse-1]$
OK for mainline?
Thanks.
H.J.
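
For illustration only, here is a rough scalar model in C of why the
second alternative cannot work. It is a sketch, not part of the patch:
xmm_model, model_pinsrd and the two *_ok helpers are hypothetical names,
and the model assumes only that pinsrd replaces the one dword selected
by the immediate and leaves the other lanes alone. With operand 2 tied
to the destination, the high element sits in lane 0, and inserting
operand 1 at index 0 overwrites it instead of leaving it in lane 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of an XMM register viewed as four dwords. */
typedef struct { uint32_t d[4]; } xmm_model;

/* Models "pinsrd $imm, src, dst": dword (imm & 3) of dst is replaced
   by src; every other lane keeps its previous value. */
static xmm_model model_pinsrd(xmm_model dst, uint32_t src, unsigned imm)
{
    dst.d[imm & 3] = src;
    return dst;
}

/* vec_concat:V2SI (lo, hi) must yield lo in lane 0 and hi in lane 1. */
static int is_concat(xmm_model r, uint32_t lo, uint32_t hi)
{
    return r.d[0] == lo && r.d[1] == hi;
}

/* First alternative: operand 1 (lo) is tied to the destination, so it
   already sits in lane 0; operand 2 (hi) is inserted at index 1. */
static int first_alternative_ok(uint32_t lo, uint32_t hi)
{
    xmm_model dst = { { lo, 0, 0, 0 } };
    return is_concat(model_pinsrd(dst, hi, 1), lo, hi);
}

/* Removed second alternative: operand 2 (hi) is tied to the
   destination, so hi sits in lane 0. Inserting lo at index 0 (the most
   charitable reading of the template) destroys hi. */
static int second_alternative_ok(uint32_t lo, uint32_t hi)
{
    xmm_model dst = { { hi, 0, 0, 0 } };
    return is_concat(model_pinsrd(dst, lo, 0), lo, hi);
}

/* Small self-check of the model. */
static void self_check(void)
{
    assert(first_alternative_ok(1u, 2u));
    assert(!second_alternative_ok(1u, 2u));
}
```

Calling self_check() from a test driver exercises both cases; the
second one fails to produce the concatenation for any nonzero high
element.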
---
gcc/
2008-05-13 H.J. Lu <hongjiu.lu@intel.com>
* config/i386/sse.md (*vec_concatv2sf_sse4_1): New.
(*vec_concatv2si_sse4_1): Remove the second alternative.
(*vec_concatv2di_rex64_sse4_1): Likewise.
gcc/testsuite
2008-05-13 H.J. Lu <hongjiu.lu@intel.com>
* gcc.target/i386/sse2-set-ps-1.c: New.
* gcc.target/i386/sse4_1-set-ps-1.c: Likewise.
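
As a sanity check on the new code sequence, the following C sketch
models the four instructions of the SSE4 version of foo1 lane by lane.
It is only an approximation under stated assumptions: xmm_ps,
model_insertps_mem, model_movlhps and model_foo1 are hypothetical
names, the incoming upper lanes of %xmm0 and %xmm1 are modeled as zero
although they are really unspecified, and insertps is modeled only for
a memory source (immediate bits 5:4 select the destination lane, bits
3:0 are the zero mask).

```c
#include <assert.h>

/* Hypothetical scalar model of an XMM register viewed as four floats. */
typedef struct { float f[4]; } xmm_ps;

/* Models "insertps $imm8, mem32, dst": the loaded float goes to lane
   imm8[5:4]; afterwards each lane whose bit is set in imm8[3:0] is
   zeroed. */
static xmm_ps model_insertps_mem(xmm_ps dst, float src, unsigned imm8)
{
    unsigned count_d = (imm8 >> 4) & 3u;
    unsigned zmask = imm8 & 0xfu;
    unsigned i;

    dst.f[count_d] = src;
    for (i = 0; i < 4; i++)
        if (zmask & (1u << i))
            dst.f[i] = 0.0f;
    return dst;
}

/* Models "movlhps src, dst": the low two lanes of src are copied into
   the high two lanes of dst. */
static xmm_ps model_movlhps(xmm_ps dst, xmm_ps src)
{
    dst.f[2] = src.f[0];
    dst.f[3] = src.f[1];
    return dst;
}

/* Models the SSE4 sequence for foo1: x1 arrives in %xmm0 and x4 in
   %xmm1; x2 and x3 are loaded from memory. */
static xmm_ps model_foo1(float x1, float x4, float x2, float x3)
{
    xmm_ps xmm0 = { { x1, 0.0f, 0.0f, 0.0f } };  /* upper lanes assumed 0 */
    xmm_ps xmm1 = { { x4, 0.0f, 0.0f, 0.0f } };
    xmm_ps xmm2;

    xmm0 = model_insertps_mem(xmm0, x2, 0x10);   /* xmm0 = { x1, x2, .. } */
    xmm1 = model_insertps_mem(xmm1, x3, 0x10);   /* xmm1 = { x4, x3, .. } */
    xmm2 = xmm1;                                 /* movaps %xmm1, %xmm2  */
    xmm2 = model_movlhps(xmm2, xmm0);            /* high half = { x1, x2 } */
    return xmm2;                                 /* movaps %xmm2, %xmm0  */
}
```

Since _mm_set_ps takes its arguments from the highest element down,
_mm_set_ps (x2, x1, x3, x4) should have x4 in lane 0, x3 in lane 1,
x1 in lane 2 and x2 in lane 3, which is what the model returns.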
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: c.txt
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20080513/b4846151/attachment.txt>