This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Mapping NAN to ZERO / When does gcc generate MOVcc and FCMOVcc instructions?
- From: "Uros Bizjak" <ubizjak at gmail dot com>
- To: "GCC Development" <gcc at gcc dot gnu dot org>
- Cc: "Michael James" <james dot me at gmail dot com>
- Date: Fri, 3 Nov 2006 08:45:42 +0100
Michael James wrote:
Conceptually, the code is:
  double sum = 0;
  for(i=0; i<n; ++i) {
    float x = ..computed..;
    sum += isnan(x)? 0: x;
  }
I have tried half a dozen variants at the source level in an attempt to
get gcc to do this without branching (and without calling a helper
function isnan). I was not really able to succeed at either of these.
You need to specify an architecture that has the cmov instruction, i.e.
at least -march=i686.
Concerning the inline evaluation of isnan, I tried using
__builtin_isunordered(x,x), which either gets optimized out of existence
when I specify -funsafe-math-optimizations, or causes other gcc math
inlines (specifically log) to not use their inline definitions when I
do not specify -funsafe-math-optimizations. For my particular
problem I have a workaround for this which nonetheless causes the
result of isnan to end up as a condition flag in the EFLAGS register.
(Instead of a test for nan, I use a test for 0 in the domain of the
log.)
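For reference, the inline NaN test discussed here can be sketched as
follows. This is only an illustration, not code from the original
message: nan_to_zero and check_nan_to_zero are hypothetical names, and
the sketch assumes GCC's __builtin_isunordered builtin and a build that
preserves NaN semantics (as noted above, the test may be optimized away
under unsafe math flags).

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not from the thread): map NaN to zero without
 * calling the library isnan(). A NaN compares unordered with itself,
 * so __builtin_isunordered(x, x) is true exactly when x is NaN. */
static inline float nan_to_zero(float x)
{
    return __builtin_isunordered(x, x) ? 0.0f : x;
}

/* Small self-check. It is only meaningful when the compiler keeps
 * NaN semantics intact (assumption: no unsafe/finite-only math flags). */
static void check_nan_to_zero(void)
{
    assert(nan_to_zero(NAN) == 0.0f);   /* NaN is replaced by zero */
    assert(nan_to_zero(1.5f) == 1.5f);  /* ordinary values pass through */
}
```

Whether the conditional in nan_to_zero becomes a branch or a conditional
move is up to the compiler and the selected -march.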
This testcase (similar to yours, but it actually compiles):
#include <math.h>   /* logf(), isnan() */

double test(int n, double a)
{
  double sum = 0.0;
  int i;

  for(i=0; i<n; ++i)
    {
      float x = logf((float)i);
      sum += isnan(x) ? 0 : x;   /* a NaN contributes zero to the sum */
    }
  return sum;
}
produces exactly the code you are looking for (using gcc-4.2 with -march=i686):
.L5:
        pushl   %ebx
        fildl   (%esp)
        addl    $4, %esp
        fstps   (%esp)
        fstpl   -24(%ebp)
        call    logf
        fucomi  %st(0), %st
        fldz
        fcmovnu %st(1), %st
        fstp    %st(1)
        addl    $1, %ebx
        cmpl    %esi, %ebx
        fldl    -24(%ebp)
        faddp   %st, %st(1)
        jne     .L5
The logf() function will be inlined if you specify
-funsafe-math-optimizations; this flag also enables implicit
float->double extensions for x87 math. As you probably don't need math
errno from log(), -fno-math-errno should be added as well.
With those two flags, gcc produces what is IMO an optimal loop:
.L5:
        pushl   %eax
        fildl   (%esp)
        addl    $4, %esp
        fldln2
        fxch    %st(1)
        fyl2x
        fucomi  %st(0), %st
        fldz
        fcmovnu %st(1), %st
        fstp    %st(1)
        addl    $1, %eax
        cmpl    %edx, %eax
        faddp   %st, %st(1)
        jne     .L5
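If you want to look at the select in isolation from the logf()
inlining, a variant that reads its inputs from memory may be easier to
inspect. The following is a sketch only: sum_skip_nan and
check_sum_skip_nan are hypothetical names, the file name and -O2 in the
comment are assumptions, and whether a given gcc version emits
fucomi/fcmovnu for it is not guaranteed.

```c
#include <assert.h>
#include <math.h>

/* Possible command line, built from the flags mentioned in this thread
 * (file name and -O2 are assumptions; a 64-bit host would likely also
 * need -m32 for an i686 target):
 *   gcc -O2 -march=i686 -fno-math-errno -S sum.c
 */

/* Hypothetical variant of the test case: sum the elements of v,
 * treating every NaN as zero. */
double sum_skip_nan(const float *v, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; ++i) {
        float x = v[i];
        sum += isnan(x) ? 0 : x;   /* the select we hope becomes fcmov */
    }
    return sum;
}

/* Small self-check, assuming NaN semantics are preserved. */
static void check_sum_skip_nan(void)
{
    float v[] = { 1.0f, NAN, 2.0f };
    assert(sum_skip_nan(v, 3) == 3.0);
}
```

Inspecting the .s output for the loop body shows whether the
conditional was compiled to a jump or to a conditional move.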
Uros.
Concerning the use of an unconditional add, followed by a FCMOVcc
instead of a Jcc, I have had no success: I have tried code such as: