This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Mapping NAN to ZERO / When does gcc generate MOVcc and FCMOVcc instructions?

From: "Uros Bizjak" <ubizjak at gmail dot com>
To: "GCC Development" <gcc at gcc dot gnu dot org>
Cc: "Michael James" <james dot me at gmail dot com>
Date: Fri, 3 Nov 2006 08:45:42 +0100
Subject: Re: Mapping NAN to ZERO / When does gcc generate MOVcc and FCMOVcc instructions?

Michael James wrote:

Conceptually, the code is:

double sum = 0;

for(i=0; i<n; ++i) {
   float x = ..computed..;
   sum += isnan(x)? 0: x;
}

I have tried half a dozen variants at the source level in an attempt
to get gcc to do this without branching (and without calling a helper
function isnan). I was not really able to succeed at either of these.


You need to specify an architecture that has the cmov instruction,
i.e. at least -march=i686.

Concerning the inline evaluation of isnan, I tried using
__builtin_isunordered(x,x), which either gets optimized out of
existence when I specify -funsafe-math-optimizations, or causes other
gcc math inlines (specifically log) to not use their inline
definitions when I do not specify -funsafe-math-optimizations. For my
particular problem I have a workaround for this, which nonetheless
causes the result of isnan to end up as a condition flag in the EFLAGS
register. (Instead of a test for nan, I use a test for 0 in the domain
of the log.)
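
For reference, the two inline NaN tests discussed above can be sketched
as follows. This is only an illustration, not code from the thread: the
helper names nan_to_zero_cmp and nan_to_zero_unordered are hypothetical,
and the second helper uses the C99 isunordered() macro from <math.h> as
a standard stand-in for the builtin. Both rely on IEEE comparison
semantics, so flags that let the compiler assume NaNs away may remove
the test, as described in the message.

```c
#include <math.h>

/* Hypothetical helper: NaN is the only value that compares unequal
   to itself, so (x != x) is true exactly when x is NaN. */
static inline float nan_to_zero_cmp(float x)
{
    return (x != x) ? 0.0f : x;
}

/* Hypothetical helper: the same test written with the C99
   isunordered() macro, which is true when either operand is NaN. */
static inline float nan_to_zero_unordered(float x)
{
    return isunordered(x, x) ? 0.0f : x;
}
```

Written as a plain select, either form leaves the compiler free to pick
a conditional move or a branch; which one it picks depends on the
target, the flags and the gcc version.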

This testcase (similar to yours, but it actually compiles):

#include <math.h>

double test(int n, double a)
{
 double sum = 0.0;
 int i;

 for(i=0; i<n; ++i)
   {
     float x = logf((float)i);
     /* Add x only when it is not NaN. */
     sum += isnan(x) ? 0 : x;
   }

 return sum;
}

produces exactly the code you are looking for (using gcc-4.2 with -march=i686):

.L5:
       pushl   %ebx
       fildl   (%esp)
       addl    $4, %esp
       fstps   (%esp)
       fstpl   -24(%ebp)
       call    logf
       fucomi  %st(0), %st
       fldz
       fcmovnu %st(1), %st
       fstp    %st(1)
       addl    $1, %ebx
       cmpl    %esi, %ebx
       fldl    -24(%ebp)
       faddp   %st, %st(1)
       jne     .L5
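
The interesting part of this loop is the fucomi / fldz / fcmovnu / fstp
sequence, which selects between the logf() result and zero without a
jump. A rough C model of those four instructions is sketched below; the
function name select_non_nan and its local variables are hypothetical
and only serve to label the steps.

```c
/* Hypothetical C model of the branch-free selection in the loop. */
static double select_non_nan(double x)
{
    /* fucomi %st(0), %st: comparing x with itself sets the parity
       flag ("unordered") only when x is NaN. */
    int unordered = (x != x);

    /* fldz: push +0.0, the value used when x is NaN. */
    double result = 0.0;

    /* fcmovnu %st(1), %st: if the compare was NOT unordered,
       overwrite the zero with x. */
    if (!unordered)
        result = x;

    /* fstp %st(1): discard the spare copy of x, keep the result. */
    return result;
}
```

The if statement here only models the conditional move; in the
generated code above there is no jump between the compare and the add.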

The logf() call is inlined when -funsafe-math-optimizations is
specified; this flag also enables implicit float->double extensions
for x87 math. As you probably don't need math errno from log(),
-fno-math-errno should be added as well.

With those two flags, gcc produces what is IMO the optimal loop:

.L5:
       pushl   %eax
       fildl   (%esp)
       addl    $4, %esp
       fldln2
       fxch    %st(1)
       fyl2x
       fucomi  %st(0), %st
       fldz
       fcmovnu %st(1), %st
       fstp    %st(1)
       addl    $1, %eax
       cmpl    %edx, %eax
       faddp   %st, %st(1)
       jne     .L5
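
To check the NaN-to-zero behaviour independently of logf(), the same
loop can be fed from an array. The sketch below is an assumption for
illustration only: the function name sum_skipping_nan and its
parameters are not from the thread, and whether gcc turns the select
into fcmov for this variant still depends on the target, the flags and
the compiler version.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical variant of the test loop: sum the array elements,
   treating every NaN element as zero. */
double sum_skipping_nan(const float *values, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; ++i) {
        float x = values[i];
        /* Same select as in the testcase: add 0 instead of a NaN. */
        sum += isnan(x) ? 0.0 : x;
    }

    return sum;
}
```

For example, an input holding 1.0, NaN, 2.5 and NaN sums to 3.5.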

Uros.

Concerning the use of an unconditional add, followed by a FCMOVcc
instead of a Jcc, I have had no success: I have tried code such as:

Follow-Ups:
- Re: Mapping NAN to ZERO / When does gcc generate MOVcc and FCMOVcc instructions?
  - From: Michael James
