This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PATCH: Optimize integer vector concatenate for SSE4


On Mon, May 12, 2008 at 12:46 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> H.J. Lu wrote:
>
>
> >
> > This patch optimizes integer vector concatenate for SSE4. I
> > also renamed vector concatenate patterns to be consistent
> > with other vector patterns. OK for trunk?
> >
> >  +(define_insn "*vec_concatv2si_sse4_1"
> > +  [(set (match_operand:V2SI 0 "register_operand" "=x,x")
> > +       (vec_concat:V2SI
> > +         (match_operand:SI 1 "register_operand" "0,rm")
> >
> >
>
>  nonimmediate_operand
>
>
> > +         (match_operand:SI 2 "nonimmediate_operand" "rm,0")))]
> > +  "TARGET_SSE4_1"
> > +  "@
> > +  pinsrd\t{$0x1, %2, %0|%0, %2, 0x1}
> > +  pinsrd\t{$0x0, %2, %0|%0, %2, 0x0}"
> > +  [(set_attr "type" "sselog")
> > +   (set_attr "mode" "TI")])
> >
> >
>
>  Please check whether an insn pattern with an "ix86_binary_operator_ok (...)"
> insn constraint is needed to prevent the combiner from combining mem/mem input
> operands.  Possibly, an expander with "ix86_fixup_binary_operands_no_copy
> (UNKNOWN, SImode, operands)" is needed to fix mem/mem operand expansion.
> Looking at the existing vec_concat_* patterns, I think that we can trust reload
> to fix mem/mem operands for us, so IMO no fixups or extra constraints are
> needed.
>
>
> > +(define_insn "*vec_concatv2di_rex64_sse4_1"
> > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > +       (vec_concat:V2DI
> > +         (match_operand:DI 1 "register_operand" "0,rm")
> >
> >
>
>  nonimmediate_operand
>
>
> > +         (match_operand:DI 2 "nonimmediate_operand" "rm,0")))]
> > +  "TARGET_64BIT && TARGET_SSE4_1"
> > +  "@
> > +  pinsrq\t{$0x1, %2, %0|%0, %2, 0x1}
> > +  pinsrq\t{$0x0, %2, %0|%0, %2, 0x0}"
> > +  [(set_attr "type" "sselog")
> > +   (set_attr "mode" "TI")])
> >
> >
>
>  Please change operand[1] to nonimmediate_operand in both cases.
>  The patch is OK for mainline with this change.
>
>  Thanks,
>  Uros.
>

Hi Uros,

There is a bug in my patch. The second alternative isn't valid, since with
the pinsrX instructions the register/memory operand can only be placed
after the register operand. This patch fixes it. I also added
*vec_concatv2sf_sse4_1. Below is the code generated for a _mm_set_ps test,
first with /usr/gcc-4.4/bin/gcc (trunk) and then with the newly built xgcc:

[hjl@gnu-6 sse-1]$ cat v4sf-1.c
#include <xmmintrin.h>

extern float x2, x3;

__m128
foo1 (float x1, float x4)
{
  return _mm_set_ps (x2, x1, x3, x4);
}
[hjl@gnu-6 sse-1]$ /usr/gcc-4.4/bin/gcc -Wall -I.. -O2 -march=core2 \
    -fno-asynchronous-unwind-tables -DDEBUG -S v4sf-1.c -msse4
[hjl@gnu-6 sse-1]$ cat v4sf-1.s
        .file   "v4sf-1.c"
        .text
        .p2align 4,,15
.globl foo1
        .type   foo1, @function
foo1:
        movss   x3(%rip), %xmm2
        unpcklps        %xmm2, %xmm1
        movaps  %xmm1, %xmm2
        movss   x2(%rip), %xmm1
        unpcklps        %xmm1, %xmm0
        movaps  %xmm2, %xmm1
        movlhps %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        ret
        .size   foo1, .-foo1
        .ident  "GCC: (GNU) 4.4.0 20080509 (experimental) [trunk revision 135128]"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-6 sse-1]$ /export/build/gnu/gcc-stack-internal/build-x86_64-linux/gcc/xgcc \
    -B./ -B/export/build/gnu/gcc-stack-internal/build-x86_64-linux/gcc/ \
    -Wall -I.. -O2 -march=core2 -fno-asynchronous-unwind-tables -DDEBUG -S \
    v4sf-1.c -msse4
[hjl@gnu-6 sse-1]$ cat v4sf-1.s
        .file   "v4sf-1.c"
        .text
        .p2align 4,,15
.globl foo1
        .type   foo1, @function
foo1:
        insertps        $0x10, x2(%rip), %xmm0
        insertps        $0x10, x3(%rip), %xmm1
        movaps  %xmm1, %xmm2
        movlhps %xmm0, %xmm2
        movaps  %xmm2, %xmm0
        ret
        .size   foo1, .-foo1
        .ident  "GCC: (GNU) 4.4.0 20080510 (experimental) [stack-internal revision 2533]"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-6 sse-1]$
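
To illustrate the operand-order restriction mentioned at the top of this
message, here is a rough, portable C model of a pinsrd-style lane insert.
It is not GCC code. The type xmm_model and all function names are made up
for this sketch, and the upper lanes of a scalar are zeroed only to keep
the model deterministic (in a real register they hold unspecified data).
It also assumes the removed alternative was meant to tie operand 2 to the
destination and insert operand 1 at lane 0, which is one reading of the
pattern rather than something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an SSE register viewed as four 32-bit lanes.  */
typedef struct { int32_t lane[4]; } xmm_model;

/* A scalar SImode value held in an SSE register occupies lane 0.
   The other lanes are zeroed here for the model only.  */
static xmm_model
scalar_in_xmm (int32_t x)
{
  xmm_model r = { { x, 0, 0, 0 } };
  return r;
}

/* pinsrd $imm, src, dst: overwrite one lane of dst with src and leave
   the remaining lanes untouched.  */
static xmm_model
pinsrd_model (xmm_model dst, int32_t src, unsigned imm)
{
  dst.lane[imm & 3] = src;
  return dst;
}

/* Kept alternative: operand 1 is tied to the destination (lane 0) and
   the register/memory operand 2 is inserted at lane 1.  */
static xmm_model
concat_first_tied (int32_t op1, int32_t op2)
{
  return pinsrd_model (scalar_in_xmm (op1), op2, 1);
}

/* Assumed shape of the removed alternative: operand 2 is tied to the
   destination, so it sits in lane 0, and inserting operand 1 at lane 0
   overwrites it instead of producing {op1, op2}.  */
static xmm_model
concat_second_tied (int32_t op1, int32_t op2)
{
  return pinsrd_model (scalar_in_xmm (op2), op1, 0);
}

/* Small self-check of the two cases above.  */
static void
check_pinsrd_model (void)
{
  xmm_model good = concat_first_tied (11, 22);
  xmm_model bad = concat_second_tied (11, 22);

  assert (good.lane[0] == 11 && good.lane[1] == 22);
  assert (bad.lane[0] == 11 && bad.lane[1] != 22);
}
```

Calling check_pinsrd_model () from a small driver exercises both cases. In
this model only the first form yields the {op1, op2} layout that vec_concat
asks for, since the tied operand always starts out in lane 0.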

OK for mainline?

Thanks.


H.J.
---
gcc/

2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>

        * config/i386/sse.md (*vec_concatv2sf_sse4_1): New.
        (*vec_concatv2si_sse4_1): Remove the second alternative.
        (*vec_concatv2di_rex64_sse4_1): Likewise.

gcc/testsuite

2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>

        * gcc.target/i386/sse2-set-ps-1.c: New.
        * gcc.target/i386/sse4_1-set-ps-1.c: Likewise.
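
For readers without the attachment, the following is a minimal sketch of
the kind of lane-order check a _mm_set_ps test could perform. It is not
the content of the new test files, and the helper names are hypothetical.
It relies only on the documented behaviour of _mm_set_ps, whose last
argument lands in the lowest lane, so _mm_set_ps (x2, x1, x3, x4) should
yield lanes {x4, x3, x1, x2}. That is the same layout the insertps/movlhps
sequence above builds: each insertps $0x10 writes lane 1, and movlhps then
copies the {x1, x2} pair into the upper half.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: returns 1 if _mm_set_ps (x2, x1, x3, x4) places
   x4, x3, x1, x2 in lanes 0..3, lowest lane first.  */
static int
set_ps_lane_order_ok (float x1, float x2, float x3, float x4)
{
  float lanes[4];

  /* Unaligned store, so the local array needs no special alignment.  */
  _mm_storeu_ps (lanes, _mm_set_ps (x2, x1, x3, x4));

  return lanes[0] == x4 && lanes[1] == x3
	 && lanes[2] == x1 && lanes[3] == x2;
}

/* Small self-check with four distinct values.  */
static void
check_set_ps_lanes (void)
{
  assert (set_ps_lane_order_ok (1.0f, 2.0f, 3.0f, 4.0f));
}
```

The sketch needs an x86 target with SSE enabled. It checks only the
resulting values, not which instructions the compiler picks.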

Attachment: c.txt
Description: Text document

