This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/64897] New: Floating-point "and" not optimized on x86-64
- From: "schnetter at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sun, 01 Feb 2015 19:45:59 +0000
- Subject: [Bug target/64897] New: Floating-point "and" not optimized on x86-64
- Auto-submitted: auto-generated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64897
Bug ID: 64897
Summary: Floating-point "and" not optimized on x86-64
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: schnetter at gmail dot com
I noticed that gcc does not generate "vandpd" for bitwise "and" operations on
floating-point values. Here is example code that demonstrates this:
{{{
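#include <math.h>
#include <stdint.h>
#include <string.h>

/* Clear the sign bit by reinterpreting the double as a 64-bit integer. */
double fand1(double x)
{
    uint64_t ix;                          /* exactly 64 bits on every target */
    memcpy(&ix, &x, sizeof ix);           /* type-pun without aliasing issues */
    ix &= UINT64_C(0x7fffffffffffffff);   /* mask off bit 63, the sign bit */
    memcpy(&x, &ix, sizeof x);
    return x;
}

/* The same operation via the standard library. */
double fand2(double x)
{
    return fabs(x);
}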
}}}
When I compile this via:
{{{
gcc-mp-4.9 -O3 -march=native -S fand.c -o fand-gcc-4.9.s
}}}
(on OS X, with an x86-64 Intel Core i7 CPU), I get the following assembly:
{{{
_fand1:
    movabsq $9223372036854775807, %rax
    vmovd   %xmm0, %rdx
    andq    %rdx, %rax
    vmovd   %rax, %xmm0
    ret
_fand2:
    vmovsd  LC1(%rip), %xmm1
    vandpd  %xmm1, %xmm0, %xmm0
    ret
}}}
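This shows that (a) for fand1, gcc performs the bitwise "and" in an integer
register, which requires moving the value from %xmm0 to a general-purpose
register and back and loading the 64-bit mask with movabsq, and is therefore
probably slower; and (b) for fabs, which gcc expands inline, gcc itself uses
"vandpd" with a mask constant loaded from LC1, so its implementors evidently
consider that sequence faster.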