This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/64897] New: Floating-point "and" not optimized on x86-64


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64897

            Bug ID: 64897
           Summary: Floating-point "and" not optimized on x86-64
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: schnetter at gmail dot com

I notice that gcc does not generate "vandpd" when a bitwise "and" is applied
to a floating-point value. The following example code demonstrates this:
{{{
#include <math.h>
#include <string.h>
double fand1(double x)
{
  unsigned long ix;
  memcpy(&ix, &x, 8);
  ix &= 0x7fffffffffffffffUL;
  memcpy(&x, &ix, 8);
  return x;
}
double fand2(double x)
{
  return fabs(x);
}
}}}

When I compile this via:
{{{
gcc-mp-4.9 -O3 -march=native -S fand.c -o fand-gcc-4.9.s
}}}
(OS X, x86-64 CPU, Intel Core i7), this results in:
{{{
_fand1:
    movabsq    $9223372036854775807, %rax
    vmovd    %xmm0, %rdx
    andq    %rdx, %rax
    vmovd    %rax, %xmm0
    ret
_fand2:
    vmovsd    LC1(%rip), %xmm1
    vandpd    %xmm1, %xmm0, %xmm0
    ret
}}}

This shows that (a) for fand1, gcc moves the value to an integer register,
performs the bitwise "and" there, and moves the result back, which is probably
slower, and (b) the implementors of "fabs" assume that using the "vandpd"
instruction is faster.
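
For comparison, here is a minimal sketch of how the same mask could be applied
with SSE2 intrinsics, so that the "and" is expressed directly on a vector
register. The name "fand3" is hypothetical and not part of the test case above.
I would expect this form to stay in the SSE registers, but I have not checked
what gcc 4.9.2 generates for it. The fallback branch for targets without SSE2
is likewise an assumption, added only to keep the sketch portable.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical variant: clear the sign bit using a vector "and". */
double fand3(double x)
{
  /* Every bit set except the sign bit of the double. */
  const __m128d mask =
      _mm_castsi128_pd(_mm_set1_epi64x(0x7fffffffffffffffLL));
  __m128d v = _mm_set_sd(x);   /* x in the low lane, upper lane zeroed */
  v = _mm_and_pd(v, mask);     /* bitwise "and" on the vector value */
  return _mm_cvtsd_f64(v);     /* extract the low lane */
}
#else
/* Fallback for targets without SSE2: same idiom as fand1. */
double fand3(double x)
{
  uint64_t ix;
  memcpy(&ix, &x, sizeof ix);
  ix &= UINT64_C(0x7fffffffffffffff);
  memcpy(&x, &ix, sizeof x);
  return x;
}
#endif
```

Like fand1 and fand2, this returns the absolute value of its argument, so
fand3(-1.5) yields 1.5.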

