This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PATCH: Optimize integer vector concatenate for SSE4


On Tue, May 13, 2008 at 12:22 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>  Hi Uros,
>
>  There is a bug in my patch. The second alternative isn't valid, since
>  with the pinsrX instructions we can only place the register/memory
>  operand after the register operand. This patch fixes it. I also added
>  *vec_concatv2sf_sse4_1. Now we generate:
>
>  [hjl@gnu-6 sse-1]$ cat v4sf-1.c
>  #include <xmmintrin.h>
>
>  extern float x2, x3;
>
>  __m128
>  foo1 (float x1, float x4)
>  {
>   return _mm_set_ps (x2, x1, x3, x4);
>  }
>  [hjl@gnu-6 sse-1]$ /usr/gcc-4.4/bin/gcc -Wall -I.. -O2 -march=core2
>  -fno-asynchronous-unwind-tables -DDEBUG -S v4sf-1.c -msse4
>  [hjl@gnu-6 sse-1]$ cat v4sf-1.s
>         .file   "v4sf-1.c"
>         .text
>         .p2align 4,,15
>  .globl foo1
>         .type   foo1, @function
>  foo1:
>         movss   x3(%rip), %xmm2
>         unpcklps        %xmm2, %xmm1
>         movaps  %xmm1, %xmm2
>         movss   x2(%rip), %xmm1
>         unpcklps        %xmm1, %xmm0
>         movaps  %xmm2, %xmm1
>         movlhps %xmm0, %xmm1
>         movaps  %xmm1, %xmm0
>         ret
>         .size   foo1, .-foo1
>         .ident  "GCC: (GNU) 4.4.0 20080509 (experimental) [trunk revision 135128]"
>         .section        .note.GNU-stack,"",@progbits
>  [hjl@gnu-6 sse-1]$
>  /export/build/gnu/gcc-stack-internal/build-x86_64-linux/gcc/xgcc -B./
>  -B/export/build/gnu/gcc-stack-internal/build-x86_64-linux/gcc/ -Wall
>  -I.. -O2 -march=core2  -fno-asynchronous-unwind-tables -DDEBUG -S
>  v4sf-1.c -msse4
>  [hjl@gnu-6 sse-1]$ cat v4sf-1.s
>         .file   "v4sf-1.c"
>         .text
>         .p2align 4,,15
>  .globl foo1
>         .type   foo1, @function
>  foo1:
>         insertps        $0x10, x2(%rip), %xmm0
>         insertps        $0x10, x3(%rip), %xmm1
>         movaps  %xmm1, %xmm2
>         movlhps %xmm0, %xmm2
>         movaps  %xmm2, %xmm0
>         ret
>         .size   foo1, .-foo1
>         .ident  "GCC: (GNU) 4.4.0 20080510 (experimental) [stack-internal revision 2533]"
>         .section        .note.GNU-stack,"",@progbits
>  [hjl@gnu-6 sse-1]$
>
>  OK for mainline?
>
>  Thanks.
>
>
>  H.J.
>  ---
>  gcc/
>
>  2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>
>
>         * config/i386/sse.md (*vec_concatv2sf_sse4_1): New.
>         (*vec_concatv2si_sse4_1): Remove the second alternative.
>         (*vec_concatv2di_rex64_sse4_1): Likewise.
>
>  gcc/testsuite
>
>  2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>
>
>         * gcc.target/i386/sse2-set-ps-1.c: New.
>         * gcc.target/i386/sse4_1-set-ps-1.c: Likewise.
>

Here is a slightly modified patch. _mm_set_ps only needs SSE, not SSE2,
so the new test is now named sse-set-ps-1.c instead of sse2-set-ps-1.c.
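
To illustrate the SSE-only point, here is a minimal sketch of a runtime
check for the foo1 example quoted above. It is not the contents of the
test in the attachment. It includes only <xmmintrin.h>, the SSE header.
The values given to x2 and x3 and the helpers check_foo1 and
self_check_foo1 are assumptions made for illustration (the quoted
example declares x2 and x3 as extern).

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE only: __m128, _mm_set_ps, _mm_storeu_ps */

/* Assumption: arbitrary values; the quoted example declares these extern.  */
float x2 = 2.0f, x3 = 3.0f;

/* Same shape as foo1 in the quoted message.  */
__m128
foo1 (float x1, float x4)
{
  return _mm_set_ps (x2, x1, x3, x4);
}

/* Hypothetical helper: store the vector to memory and compare lanes.
   _mm_set_ps places its last argument in element 0, so the expected
   memory order is x4, x3, x1, x2.  Returns 1 on match, 0 otherwise.  */
int
check_foo1 (float x1, float x4)
{
  float out[4];

  _mm_storeu_ps (out, foo1 (x1, x4));
  return out[0] == x4 && out[1] == x3 && out[2] == x1 && out[3] == x2;
}

/* Hypothetical driver: aborts if any lane is wrong.  */
void
self_check_foo1 (void)
{
  assert (check_foo1 (1.0f, 4.0f));
  assert (check_foo1 (-0.5f, 8.25f));
}
```

Since nothing beyond SSE is used, a check like this should build with
-msse alone on 32-bit x86, and with no extra flags on x86-64.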


H.J.
----
gcc/

2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>

        * config/i386/sse.md (*vec_concatv2sf_sse4_1): New.
        (*vec_concatv2si_sse4_1): Remove the second alternative.
        (*vec_concatv2di_rex64_sse4_1): Likewise.

gcc/testsuite

2008-05-13  H.J. Lu  <hongjiu.lu@intel.com>

        * gcc.target/i386/sse-set-ps-1.c: New.
        * gcc.target/i386/sse4_1-set-ps-1.c: Likewise.
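
For readers following the insertps $0x10 sequence in the quoted SSE4
listing, here is a rough scalar model in plain C of why it yields the
same vector as _mm_set_ps (x2, x1, x3, x4). This is only a sketch: the
vec4 type and the functions insertps_mem, movlhps and foo1_model are
hypothetical names, and the upper elements of the incoming registers
are modelled as zero, although the code does not depend on them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4 x float register model; f[0] is the lowest element.  */
typedef struct { float f[4]; } vec4;

/* Model of insertps with a memory source: imm8 bits 5:4 select the
   destination element, bits 3:0 are a zero mask applied afterwards.
   Bits 7:6 (source element select) only matter for a register source.  */
vec4
insertps_mem (vec4 dst, float src, uint8_t imm8)
{
  unsigned int count_d = (imm8 >> 4) & 3;
  unsigned int zmask = imm8 & 0xf;
  unsigned int i;

  dst.f[count_d] = src;
  for (i = 0; i < 4; i++)
    if (zmask & (1u << i))
      dst.f[i] = 0.0f;
  return dst;
}

/* Model of movlhps: low two elements of src go to the high half of dst.  */
vec4
movlhps (vec4 dst, vec4 src)
{
  dst.f[2] = src.f[0];
  dst.f[3] = src.f[1];
  return dst;
}

/* Mirrors the quoted listing: x1 arrives in xmm0, x4 in xmm1.  */
vec4
foo1_model (float x1, float x4, float x2, float x3)
{
  vec4 xmm0 = {{ x1, 0.0f, 0.0f, 0.0f }};
  vec4 xmm1 = {{ x4, 0.0f, 0.0f, 0.0f }};
  vec4 xmm2;

  xmm0 = insertps_mem (xmm0, x2, 0x10);  /* xmm0 = { x1, x2, -, - } */
  xmm1 = insertps_mem (xmm1, x3, 0x10);  /* xmm1 = { x4, x3, -, - } */
  xmm2 = xmm1;                           /* movaps %xmm1, %xmm2 */
  xmm2 = movlhps (xmm2, xmm0);           /* xmm2 = { x4, x3, x1, x2 } */
  return xmm2;                           /* movaps %xmm2, %xmm0 */
}

/* Hypothetical driver: compare against the element order of
   _mm_set_ps (x2, x1, x3, x4), i.e. x4, x3, x1, x2 from element 0 up.  */
void
self_check_model (void)
{
  vec4 r = foo1_model (1.0f, 4.0f, 2.0f, 3.0f);

  assert (r.f[0] == 4.0f && r.f[1] == 3.0f);
  assert (r.f[2] == 1.0f && r.f[3] == 2.0f);
}
```

With imm8 = 0x10 the destination element is 1 and the zero mask is
empty, so each insertps fills element 1 of a register whose element 0
already holds an argument. That replaces the movss plus unpcklps pair
in the older listing.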

Attachment: c.txt
Description: Text document

