This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PATCH: Optimize integer vector concatenate for SSE4


On Tue, May 13, 2008 at 12:34 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, May 13, 2008 at 12:25 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>  >
>  > On Tue, May 13, 2008 at 12:22 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>  >  >  Hi Uros,
>  >  >
>  >  >  There is a bug in my patch. The second alternative isn't valid, since
>  >  >  the pinsrX instructions only allow the register/memory operand after
>  >  >  the register operand. This patch fixes it. I also added
>  >  >  *vec_concatv2sf_sse4_1. Now we generate
>  >  >
>  >  >  [hjl@gnu-6 sse-1]$ cat v4sf-1.c
>  >  >  #include <xmmintrin.h>
>  >  >
>  >  >  extern float x2, x3;
>  >  >
>  >  >  __m128
>  >  >  foo1 (float x1, float x4)
>  >  >  {
>  >  >   return _mm_set_ps (x2, x1, x3, x4);
>  >  >  }
>  >  >  [hjl@gnu-6 sse-1]$ /usr/gcc-4.4/bin/gcc -Wall -I.. -O2 -march=core2
>  >  >  -fno-asynchronous-unwind-tables -DDEBUG -S v4sf-1.c -msse4
>  >  >  [hjl@gnu-6 sse-1]$ cat v4sf-1.s
>  >  >         .file   "v4sf-1.c"
>  >  >         .text
>  >  >         .p2align 4,,15
>  >  >  .globl foo1
>  >  >         .type   foo1, @function
>  >  >  foo1:
>  >  >         movss   x3(%rip), %xmm2
>  >  >         unpcklps        %xmm2, %xmm1
>  >  >         movaps  %xmm1, %xmm2
>  >  >         movss   x2(%rip), %xmm1
>  >  >         unpcklps        %xmm1, %xmm0
>  >  >         movaps  %xmm2, %xmm1
>  >  >         movlhps %xmm0, %xmm1
>  >  >         movaps  %xmm1, %xmm0
>  >  >         ret
>  >  >         .size   foo1, .-foo1
>  >  >         .ident  "GCC: (GNU) 4.4.0 20080509 (experimental) [trunk
>  >  >  revision 135128]"
>  >  >         .section        .note.GNU-stack,"",@progbits
>  >  >  [hjl@gnu-6 sse-1]$
>  >  >  /export/build/gnu/gcc-stack-internal/build-x86_64-linux/gcc/xgcc -B./
>  >  >  -B/export/build/gnu/gcc-stack-internal/build-x86_64-linux/gcc/ -Wall
>  >  >  -I.. -O2 -march=core2  -fno-asynchronous-unwind-tables -DDEBUG -S
>  >  >  v4sf-1.c -msse4
>  >  >  [hjl@gnu-6 sse-1]$ cat v4sf-1.s
>  >  >         .file   "v4sf-1.c"
>  >  >         .text
>  >  >         .p2align 4,,15
>  >  >  .globl foo1
>  >  >         .type   foo1, @function
>  >  >  foo1:
>  >  >         insertps        $0x10, x2(%rip), %xmm0
>  >  >         insertps        $0x10, x3(%rip), %xmm1
>  >  >         movaps  %xmm1, %xmm2
>  >  >         movlhps %xmm0, %xmm2
>  >  >         movaps  %xmm2, %xmm0
>  >  >         ret
>  >  >         .size   foo1, .-foo1
>  >  >         .ident  "GCC: (GNU) 4.4.0 20080510 (experimental)
>  >  >  [stack-internal revision 2533]"
>  >  >         .section        .note.GNU-stack,"",@progbits
>  >  >  [hjl@gnu-6 sse-1]$
>  >  >
>  >  >  OK for mainline?
>  >  >
>  >  >  Thanks.
>  >  >
>  >  >
>  >  >  H.J.
>  >  >  ---
>  >  >  gcc/
>  >  >
>  >  >  2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>
>  >  >
>  >  >         * config/i386/sse.md (*vec_concatv2sf_sse4_1): New.
>  >  >         (*vec_concatv2si_sse4_1): Remove the second alternative.
>  >  >         (*vec_concatv2di_rex64_sse4_1): Likewise.
>  >  >
>  >  >  gcc/testsuite
>  >  >
>  >  >  2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>
>  >  >
>  >  >         * gcc.target/i386/sse2-set-ps-1.c: New.
>  >  >         * gcc.target/i386/sse4_1-set-ps-1.c: Likewise.
>  >  >
>  >
>  >  Here is a slightly modified patch. _mm_set_ps only needs SSE, not SSE2.
>  >
>  >
>  >
>  >  H.J.
>  >  ----
>  >  gcc/
>  >
>  >  2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>
>  >
>  >         * config/i386/sse.md (*vec_concatv2sf_sse4_1): New.
>  >         (*vec_concatv2si_sse4_1): Remove the second alternative.
>  >         (*vec_concatv2di_rex64_sse4_1): Likewise.
>  >
>  >  gcc/testsuite
>  >
>  >  2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>
>  >
>  >         * gcc.target/i386/sse-set-ps-1.c: New.
>  >         * gcc.target/i386/sse4_1-set-ps-1.c: Likewise.
>  >
>
>  Another update. Although insertps accepts a register source, we prefer
>  unpcklps for a register source since its encoding is shorter.  Also, I
>  changed nonimmediate_operand to register_operand for the register operand.
>

Another update. I added

 (set_attr "prefix_extra" "1")

to those patterns.

H.J.

Attachment: c.txt
Description: Text document
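
Editor's sketch, not part of the attached patch: the two assembly listings quoted above should build the same vector, and a plain-C lane model is one way to check that by hand. Everything here is an assumption made for illustration: the vec4 type and the helper names are hypothetical, each helper models only the lane movement of the instruction it is named after, and the upper lanes of the incoming argument registers (unspecified by the ABI) are simply set to zero. No intrinsics are used, so it should compile on any target.

```c
#include <string.h>

/* Four float lanes of an XMM register; lane[0] is the lowest lane. */
typedef struct { float lane[4]; } vec4;

/* Lane order of _mm_set_ps(e3, e2, e1, e0): e0 lands in the lowest lane. */
static vec4 set_ps(float e3, float e2, float e1, float e0)
{
    vec4 r = {{ e0, e1, e2, e3 }};
    return r;
}

/* movss mem, xmm: load one float into lane 0 and zero the other lanes. */
static vec4 movss_load(float f)
{
    vec4 r = {{ f, 0.0f, 0.0f, 0.0f }};
    return r;
}

/* unpcklps src, dst: interleave the two low lanes of dst and src. */
static vec4 unpcklps(vec4 dst, vec4 src)
{
    vec4 r = {{ dst.lane[0], src.lane[0], dst.lane[1], src.lane[1] }};
    return r;
}

/* insertps $0x10, mem, dst: write the loaded float into lane 1 of dst;
   the zero mask in the immediate is empty, so no lane is cleared. */
static vec4 insertps_0x10_mem(vec4 dst, float mem)
{
    dst.lane[1] = mem;
    return dst;
}

/* movlhps src, dst: copy the low two lanes of src into the high two of dst. */
static vec4 movlhps(vec4 dst, vec4 src)
{
    dst.lane[2] = src.lane[0];
    dst.lane[3] = src.lane[1];
    return dst;
}

/* Model of the first listing (movss + unpcklps + movlhps). */
static vec4 foo1_sse(float x1, float x2, float x3, float x4)
{
    vec4 xmm0 = {{ x1, 0.0f, 0.0f, 0.0f }};   /* first argument  */
    vec4 xmm1 = {{ x4, 0.0f, 0.0f, 0.0f }};   /* second argument */
    xmm1 = unpcklps(xmm1, movss_load(x3));    /* low lanes: x4, x3 */
    xmm0 = unpcklps(xmm0, movss_load(x2));    /* low lanes: x1, x2 */
    return movlhps(xmm1, xmm0);               /* x4, x3, x1, x2 */
}

/* Model of the second listing (insertps + movlhps). */
static vec4 foo1_sse4(float x1, float x2, float x3, float x4)
{
    vec4 xmm0 = {{ x1, 0.0f, 0.0f, 0.0f }};
    vec4 xmm1 = {{ x4, 0.0f, 0.0f, 0.0f }};
    xmm0 = insertps_0x10_mem(xmm0, x2);       /* low lanes: x1, x2 */
    xmm1 = insertps_0x10_mem(xmm1, x3);       /* low lanes: x4, x3 */
    return movlhps(xmm1, xmm0);               /* x4, x3, x1, x2 */
}

/* Returns 1 when all four lanes compare equal bytewise. */
static int same_lanes(vec4 a, vec4 b)
{
    return memcmp(a.lane, b.lane, sizeof a.lane) == 0;
}
```

Comparing foo1_sse and foo1_sse4 against set_ps (x2, x1, x3, x4) with same_lanes should report a match for both. The model says nothing about encoding length or speed, which is what the patch is actually about.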

