This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: GCC 4.0, Fast Math, and Acovea
- From: Scott Robert Ladd <scott dot ladd at coyotegulch dot com>
- To: Uros Bizjak <uros at kss-loka dot si>
- Cc: gcc at gcc dot gnu dot org
- Date: Sat, 30 Apr 2005 07:21:11 -0400
- Subject: Re: GCC 4.0, Fast Math, and Acovea
- References: <42728C99.6020906@kss-loka.si>
Uros Bizjak wrote:
> Hello Scott!
>
>> Specifically, the -funsafe-math-optimizations flag doesn't work
>> correctly on AMD64 because the default on that platform is
>> -mfpmath=sse. Without specifying -mfpmath=387,
>> -funsafe-math-optimizations does not generate inline processor
>> instructions for most floating-point functions.
>>
>> Let's put it another way: Manually selecting -mfpmath=387 cuts
>> run-times by 50% for programs dependent on functions like sin() and
>> sqrt(), as compared to -funsafe-math-optimizations by itself.
>
> It was found that moving data from SSE registers to X87 registers (and
> back) only to call an x87 builtin degrades performance. Because of
> this, x87 builtins are disabled for -mfpmath=sse and a normal libcall
> is issued for sin(), etc functions. If someone wants to use x87
> builtins, then _all_ math operations should be done in x87 registers
> to avoid costly SSE->x87 moves.
>
> BTW: Does adding -D__NO_MATH_INLINES improve performance for
> -mfpmath=sse? That would be PR19602.
>
> Uros.
Well, on every function-intensive (i.e., using lots of sqrt(), sin(),
and such) program I've tried, using -funsafe-math-optimizations provides
no significant benefit on the Opteron *unless* it is combined with
-mfpmath=387.
I note that Intel and other compilers do not seem to have this problem.
Now, I'm more than happy to live with the situation, since it has a
simple work-around -- but I think it at least needs to be made clear in
the GCC documentation that this situation exists. Otherwise, GCC 4.0
looks *terrible* for many mathematical tasks on AMD64. And right now,
AMD64 is a hot property in mathematical circles, especially in
clustered supercomputing.
..Scott