This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Better _MM_TRANSPOSE4_PS

From: Evan Cheng <evan dot cheng at apple dot com>
To: gcc-patches at gcc dot gnu dot org
Date: Thu, 6 Oct 2005 15:45:10 -0700
Subject: Better _MM_TRANSPOSE4_PS

Hi,

We would like to contribute a faster _MM_TRANSPOSE4_PS macro in config/i386/xmmintrin.h.

This version uses high/low moves (movlhps/movhlps) and unpacks (unpcklps/unpckhps) instead of eight shufps instructions. It is 16% faster than the old version on the current generation of Pentium 4 processors.
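The unpack-then-move approach can be illustrated with a portable scalar model. This is a sketch under stated assumptions, not GCC code: `v4sf_model` and the `*_model` helpers are hypothetical names that mimic what unpcklps, unpckhps, movlhps and movhlps do to four float lanes (lane 0 is the lowest), and `transpose4_unpack_model` applies them in the same order as the patched macro. In the comments, rIJ means element J of input row I.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for a 128-bit register of four floats;
   f[0] is the lowest lane. */
typedef struct { float f[4]; } v4sf_model;

/* unpcklps: interleave the low halves -> {a0, b0, a1, b1} */
static v4sf_model unpcklps_model(v4sf_model a, v4sf_model b) {
    v4sf_model r = {{ a.f[0], b.f[0], a.f[1], b.f[1] }};
    return r;
}

/* unpckhps: interleave the high halves -> {a2, b2, a3, b3} */
static v4sf_model unpckhps_model(v4sf_model a, v4sf_model b) {
    v4sf_model r = {{ a.f[2], b.f[2], a.f[3], b.f[3] }};
    return r;
}

/* movlhps: low half of a, then low half of b -> {a0, a1, b0, b1} */
static v4sf_model movlhps_model(v4sf_model a, v4sf_model b) {
    v4sf_model r = {{ a.f[0], a.f[1], b.f[0], b.f[1] }};
    return r;
}

/* movhlps: high half of b, then high half of a -> {b2, b3, a2, a3} */
static v4sf_model movhlps_model(v4sf_model a, v4sf_model b) {
    v4sf_model r = {{ b.f[2], b.f[3], a.f[2], a.f[3] }};
    return r;
}

/* Same eight-step sequence as the patched macro, transposing in place. */
static void transpose4_unpack_model(v4sf_model row[4]) {
    v4sf_model t0 = unpcklps_model(row[0], row[1]); /* r00 r10 r01 r11 */
    v4sf_model t1 = unpcklps_model(row[2], row[3]); /* r20 r30 r21 r31 */
    v4sf_model t2 = unpckhps_model(row[0], row[1]); /* r02 r12 r03 r13 */
    v4sf_model t3 = unpckhps_model(row[2], row[3]); /* r22 r32 r23 r33 */
    row[0] = movlhps_model(t0, t1); /* r00 r10 r20 r30 */
    row[1] = movhlps_model(t1, t0); /* r01 r11 r21 r31 */
    row[2] = movlhps_model(t2, t3); /* r02 r12 r22 r32 */
    row[3] = movhlps_model(t3, t2); /* r03 r13 r23 r33 */
}
```

Note the swapped operand order in the movhlps steps: because movhlps takes the high half of its second operand first, passing (t1, t0) puts row 0/1 data in the low lanes, matching `__builtin_ia32_movhlps (__t1, __t0)` in the patch.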

Thanks,

Evan Cheng
Apple Computers, Inc.

Index: config/i386/xmmintrin.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/xmmintrin.h,v
retrieving revision 1.33.6.3
diff -r1.33.6.3 xmmintrin.h
1200,1201c1200,1201
< #define _MM_TRANSPOSE4_PS(row0, row1, row2, row3) \
< do { \
---
> #define _MM_TRANSPOSE4_PS(row0, row1, row2, row3) \
> do { \
1203,1210c1203,1210
< __v4sf __t0 = __builtin_ia32_shufps (__r0, __r1, 0x44); \
< __v4sf __t2 = __builtin_ia32_shufps (__r0, __r1, 0xEE); \
< __v4sf __t1 = __builtin_ia32_shufps (__r2, __r3, 0x44); \
< __v4sf __t3 = __builtin_ia32_shufps (__r2, __r3, 0xEE); \
< (row0) = __builtin_ia32_shufps (__t0, __t1, 0x88); \
< (row1) = __builtin_ia32_shufps (__t0, __t1, 0xDD); \
< (row2) = __builtin_ia32_shufps (__t2, __t3, 0x88); \
< (row3) = __builtin_ia32_shufps (__t2, __t3, 0xDD); \
---
> __v4sf __t0 = __builtin_ia32_unpcklps (__r0, __r1); \
> __v4sf __t1 = __builtin_ia32_unpcklps (__r2, __r3); \
> __v4sf __t2 = __builtin_ia32_unpckhps (__r0, __r1); \
> __v4sf __t3 = __builtin_ia32_unpckhps (__r2, __r3); \
> (row0) = __builtin_ia32_movlhps (__t0, __t1); \
> (row1) = __builtin_ia32_movhlps (__t1, __t0); \
> (row2) = __builtin_ia32_movlhps (__t2, __t3); \
> (row3) = __builtin_ia32_movhlps (__t3, __t2); \
1212a1213
>
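For comparison, the removed shufps-based sequence can be modelled the same way. Again this is a hypothetical scalar sketch (`v4sf_model`, `shufps_model` and `transpose4_shufps_model` are illustrative names, not GCC functions), assuming the usual shufps semantics: the two low result lanes are picked from the first operand and the two high lanes from the second, each selected by a 2-bit field of the immediate, lowest field first. It shows that the old masks 0x44/0xEE followed by 0x88/0xDD also yield the transpose, so the patch changes only the instruction mix, not the result. rIJ again means element J of input row I.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for a 128-bit register of four floats;
   f[0] is the lowest lane. */
typedef struct { float f[4]; } v4sf_model;

/* shufps: lanes 0-1 come from a, lanes 2-3 from b; each 2-bit field of
   imm (lowest bits first) is the source lane index. */
static v4sf_model shufps_model(v4sf_model a, v4sf_model b, unsigned imm) {
    v4sf_model r = {{ a.f[imm & 3], a.f[(imm >> 2) & 3],
                      b.f[(imm >> 4) & 3], b.f[(imm >> 6) & 3] }};
    return r;
}

/* Same eight-step sequence as the original macro, transposing in place. */
static void transpose4_shufps_model(v4sf_model row[4]) {
    v4sf_model t0 = shufps_model(row[0], row[1], 0x44); /* r00 r01 r10 r11 */
    v4sf_model t2 = shufps_model(row[0], row[1], 0xEE); /* r02 r03 r12 r13 */
    v4sf_model t1 = shufps_model(row[2], row[3], 0x44); /* r20 r21 r30 r31 */
    v4sf_model t3 = shufps_model(row[2], row[3], 0xEE); /* r22 r23 r32 r33 */
    row[0] = shufps_model(t0, t1, 0x88); /* lanes a0 a2 b0 b2: r00 r10 r20 r30 */
    row[1] = shufps_model(t0, t1, 0xDD); /* lanes a1 a3 b1 b3: r01 r11 r21 r31 */
    row[2] = shufps_model(t2, t3, 0x88); /* r02 r12 r22 r32 */
    row[3] = shufps_model(t2, t3, 0xDD); /* r03 r13 r23 r33 */
}
```

Both versions use eight shuffle-class operations; the patch replaces all eight shufps with four unpacks and four half-register moves.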

Follow-Ups:
- Re: Better _MM_TRANSPOSE4_PS
  - From: Andrew Pinski
- Re: Better _MM_TRANSPOSE4_PS
  - From: Eric Christopher
- Re: Better _MM_TRANSPOSE4_PS
  - From: Richard Henderson
- Re: Better _MM_TRANSPOSE4_PS
  - From: Richard Guenther