[Bug rtl-optimization/43147] New: SSE shuffle merge
liranuna at gmail dot com
gcc-bugzilla@gcc.gnu.org
Tue Feb 23 01:27:00 GMT 2010
I've noticed that GCC (my current version is 4.4.1) doesn't merge two
consecutive SSE shuffles of the same register into a single shuffle, as seen
in this example:
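#include <xmmintrin.h>
extern void printv(__m128 m);
int main(void)
{
    // The declaration of m is missing from the report as archived;
    // the values here are assumed from the .LC0 constants in the assembly.
    __m128 m = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    m = _mm_shuffle_ps(m, m, 0xC9); // These two shuffles together swap the pairs
    m = _mm_shuffle_ps(m, m, 0x2D); // and could be merged into one shuffle with 0x4E
    printv(m);
    return 0;
}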
This code generates the following assembly:
    movaps .LC1, %xmm1
    shufps $201, %xmm1, %xmm1
    shufps $45, %xmm1, %xmm1 ; <-- both could merge into one shufps $78 (0x4E)
    movaps %xmm1, %xmm0
    movaps %xmm1, -24(%ebp)
.LC0:
    .long 1065353216 ; 1.0f
    .long 1073741824 ; 2.0f
    .long 1077936128 ; 3.0f
    .long 1082130432 ; 4.0f
Would be nice to see it as an enhancement!
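For reference, the merge rule for this case can be sketched in plain C. This is only an illustration and not GCC code: compose_shuffle_imm and check_report_example are hypothetical names, and the rule assumes each shuffle uses the same vector for both source operands (as in the example above), so every 2-bit selector indexes the same four lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC or of any intrinsics header.
 * Composes the immediates of two back-to-back shuffles
 *     t = _mm_shuffle_ps(m, m, first);
 *     r = _mm_shuffle_ps(t, t, second);
 * into one immediate such that r == _mm_shuffle_ps(m, m, result).
 * Assumption: both source operands of each shuffle are the same vector. */
uint8_t compose_shuffle_imm(uint8_t first, uint8_t second)
{
    uint8_t merged = 0;
    for (int lane = 0; lane < 4; ++lane) {
        /* Lane of t that the second shuffle reads for this output lane. */
        unsigned from_t = (second >> (2 * lane)) & 3u;
        /* Lane of m that the first shuffle placed into t[from_t]. */
        unsigned from_m = (first >> (2 * from_t)) & 3u;
        merged |= (uint8_t)(from_m << (2 * lane));
    }
    return merged;
}

/* The pair of immediates from this report. */
void check_report_example(void)
{
    assert(compose_shuffle_imm(0xC9, 0x2D) == 0x4E);
}
```

With the immediates from the example, compose_shuffle_imm(0xC9, 0x2D) gives 0x4E, which is the 78 mentioned in the assembly comment. Shuffles that take two different source registers would need a different rule.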
--
Summary: SSE shuffle merge
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: liranuna at gmail dot com
GCC build triplet: x86_64-linux-gnu
GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147