[Bug target/60826] New: inefficient code for vector xor on SSE2
- From: "sunfish at mozilla dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 11 Apr 2014 18:08:59 +0000
- Subject: [Bug target/60826] New: inefficient code for vector xor on SSE2
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826
Bug ID: 60826
Summary: inefficient code for vector xor on SSE2
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: sunfish at mozilla dot com
On the following C testcase:
#include <stdint.h>

typedef double  v2f64 __attribute__((__vector_size__(16), may_alias));
typedef int64_t v2i64 __attribute__((__vector_size__(16), may_alias));

static inline v2f64 f_and (v2f64 l, v2f64 r) { return (v2f64)((v2i64)l & (v2i64)r); }
static inline v2f64 f_xor (v2f64 l, v2f64 r) { return (v2f64)((v2i64)l ^ (v2i64)r); }

static inline double vector_to_scalar (v2f64 v) { return v[0]; }

double test (v2f64 w, v2f64 x, v2f64 z)
{
    v2f64 y = f_and (w, x);
    return vector_to_scalar (f_xor (z, y));
}
GCC emits this code:
        andpd   %xmm1, %xmm0
        movdqa  %xmm0, %xmm3
        pxor    %xmm2, %xmm3
        movdqa  %xmm3, -24(%rsp)
        movsd   -24(%rsp), %xmm0
        ret
GCC should produce the result of the xor in the return register directly,
instead of spilling it to the stack and reloading it with movsd. It should
also avoid the first movdqa, which is an unnecessary copy: the xor could be
computed in place in %xmm0. See the sketch below.
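For comparison, a three-instruction sequence is possible here (a sketch only,
assuming the usual x86-64 SysV calling convention, i.e. w, x, z arriving in
%xmm0-%xmm2 and the scalar result returned in the low element of %xmm0):

        andpd   %xmm1, %xmm0    # y = w & x
        pxor    %xmm2, %xmm0    # z ^ y, computed in place in the return register
        ret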
Also, this should ideally use xorpd instead of pxor, to avoid a domain-crossing
penalty on Nehalem and other microarchitectures (or xorps if domain crossing
doesn't matter, since it's smaller).
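For reference, the sizes and execution domains of the three candidate
instructions (standard non-VEX SSE encodings, register-register form):

        pxor    %xmm2, %xmm0    # 66 0F EF C2 - 4 bytes, integer domain
        xorpd   %xmm2, %xmm0    # 66 0F 57 C2 - 4 bytes, double domain
        xorps   %xmm2, %xmm0    # 0F 57 C2    - 3 bytes, single domain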