This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: GCC 4.0, Fast Math, and Acovea


On 4/29/05, Uros Bizjak <uros@kss-loka.si> wrote:
> Hello Scott!
Hello Scott & Uros,
 
> > Specifically, the -funsafe-math-optimizations flag doesn't work
> > correctly on AMD64 because the default on that platform is
> > -mfpmath=sse. Without specifying -mfpmath=387,
> > -funsafe-math-optimizations does not generate inline processor
> > instructions for most floating-point functions.
[snip]
> It was found that moving data from SSE registers to X87 registers (and
> back) only to call an x87 builtin degrades performance. Because of this,
> x87 builtins are disabled for -mfpmath=sse and a normal libcall is
> issued for sin(), etc functions. If someone wants to use x87 builtins,
> then _all_ math operations should be done in x87 registers to avoid
> costly SSE->x87 moves.

Shameless plug: here is my own performance data point for SSE on x86-64.
I've ported my coherent raytracer, which mostly uses intrinsics in the
hot path (and no transcendentals).
On ia32, binaries compiled with GCC 4.x are ~5% slower (best case) than
those compiled with icc 8.1; on x86-64 it's the other way around, if not
more so (on my Opteron, against icc 8.1 and the 9.0 beta).
Obviously there's much less pressure on the (cough weak cough)
register allocator, and in the end the generated code is way leaner.

My only gripe with fast-math is that it's the only way to enable some
optimizations, and it makes NaNs verboten along the way; couple that with
the lack of cross-unit IPO and you're stuck with a kind of nasty "global"
switch (unless you have room for some function calls).
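
To illustrate the NaN half of that gripe, here is a minimal sketch (not
code from the raytracer; the helper names is_nan_bits and float_from_bits
are made up for this example). With -ffast-math, which implies
-ffinite-math-only, the compiler may assume NaNs never occur, so a test
like x != x can be optimized away. One workaround, assuming 32-bit IEEE 754
floats, is to inspect the bit pattern with integer operations instead:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a float from a raw 32-bit pattern.
   memcpy is the well-defined way to reinterpret the bytes. */
static float float_from_bits(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Hypothetical helper: returns 1 if x holds a NaN bit pattern, else 0.
   Assumes float is a 32-bit IEEE 754 single. Only integer operations
   are used, so the result does not depend on floating-point
   comparisons that -ffinite-math-only allows the compiler to fold. */
static int is_nan_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    /* Drop the sign bit: a NaN has all exponent bits set and a
       non-zero mantissa, i.e. a magnitude above that of infinity. */
    return (u & 0x7fffffffu) > 0x7f800000u;
}
```

The other route is the one hinted at above: keep the NaN-sensitive code in
its own translation unit built without -ffast-math, and pay for the
function call.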

