This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/41084] New: Filling xmm register with all bit set is not optimized
- From: "etjq78kl at free dot fr" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 16 Aug 2009 12:34:08 -0000
- Subject: [Bug target/41084] New: Filling xmm register with all bit set is not optimized
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
Hi,
One way to fill an xmm register with all ones is to compare it with itself
using _mm_cmpeq_epi{8,16,32}.
However, if you write:
__m128i r; r = _mm_cmpeq_epi32 (r, r);
gcc insists on clearing the register first and generates (this is the output
of objdump -d, compiled with -O3 -march=core2):
401484: 66 0f ef c0 pxor %xmm0,%xmm0
401488: 66 0f 74 c0 pcmpeqw %xmm0,%xmm0
It does not detect that the result is independent of the initial value of r,
so it clears the register first.
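The independence claim above can be checked with a minimal sketch. It assumes an x86 target with SSE2, and the helper names (all_ones_from, is_all_ones, check_seed_independence) are hypothetical, chosen only for illustration. Unlike the snippet in the report, the seeds here are initialized, so the program's behaviour is well defined:

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Compare a register with itself: each 32-bit lane equals itself, so
   every bit of the result is set, whatever the input value was. */
static __m128i all_ones_from(__m128i seed)
{
    return _mm_cmpeq_epi32(seed, seed);
}

/* Returns 1 when all 128 bits of v are set (hypothetical helper). */
static int is_all_ones(__m128i v)
{
    __m128i ref = _mm_set1_epi32(-1);
    /* _mm_movemask_epi8 gathers the top bit of each of the 16 bytes. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(v, ref)) == 0xFFFF;
}

/* Two unrelated seeds must give the same all-ones result. */
static void check_seed_independence(void)
{
    assert(is_all_ones(all_ones_from(_mm_setzero_si128())));
    assert(is_all_ones(all_ones_from(_mm_set1_epi32(0x12345678))));
}
```

This only demonstrates the semantics of the idiom; it says nothing about which instructions a given gcc version emits for it.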
Similarly, if one writes (code adapted from _mm_setzero_si128 (void) in
emmintrin.h):
__m128i r = __extension__ (__m128i)(__v4si){ 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff };
then gcc generates a load from memory instead of the optimized
pcmpeqw instruction.
I would expect both
__m128i r; r = _mm_cmpeq_epi32 (r, r);
and
__m128i r = __extension__ (__m128i)(__v4si){ 0xffffffff, 0xffffffff,
0xffffffff, 0xffffffff };
to generate the same single instruction: pcmpeqw %xmm0, %xmm0
exactly as:
__m128i r; r = _mm_xor_si128 (r, r);
and
__m128i r = __extension__ (__m128i)(__v4si){ 0, 0, 0, 0 };
output pxor %xmm0, %xmm0 in both cases.
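For anyone wanting to reproduce the constant-vector case, here is a small sketch of a test file. It assumes GCC (or a compiler that accepts the same __v4si vector-literal extension); the function names are hypothetical, and -1 is written in place of 0xffffffff so that each initializer fits in a signed int lane:

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics and the __v4si vector type */

/* All-ones constant, in the vector-literal style used by emmintrin.h. */
__m128i ones_literal(void)
{
    return __extension__ (__m128i)(__v4si){ -1, -1, -1, -1 };
}

/* All-zeros constant, same style, for comparison. */
__m128i zero_literal(void)
{
    return __extension__ (__m128i)(__v4si){ 0, 0, 0, 0 };
}

/* Hypothetical helper: returns 1 if every byte of v equals `byte`. */
int all_bytes_equal(__m128i v, unsigned char byte)
{
    unsigned char buf[16];
    _mm_storeu_si128((__m128i *)buf, v);  /* unaligned store to memory */
    for (int i = 0; i < 16; i++)
        if (buf[i] != byte)
            return 0;
    return 1;
}

/* Sanity check that both constants hold the intended bit patterns. */
void check_literals(void)
{
    assert(all_bytes_equal(ones_literal(), 0xFF));
    assert(all_bytes_equal(zero_literal(), 0x00));
}
```

Compiling this file with gcc -O3 -march=core2 -c and running objdump -d on the object file should show what the compiler emits for ones_literal and zero_literal, which is the comparison described above.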
Best regards.
Antoine
--
Summary: Filling xmm register with all bit set is not optimized
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: etjq78kl at free dot fr
GCC target triplet: i?86
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41084