This is the mail archive of the gcc-patches at gcc dot gnu dot org
mailing list for the GCC project.
Re: [BENCHMARK]-mfpmath=sse should disable x387 intrinsics
- From: Roger Sayle <roger at eyesopen dot com>
- To: Richard Guenther <richard dot guenther at gmail dot com>
- Cc: Uros Bizjak <uros at kss-loka dot si>, <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 25 Nov 2004 08:57:08 -0700 (MST)
- Subject: Re: [BENCHMARK]-mfpmath=sse should disable x387 intrinsics
On Thu, 25 Nov 2004, Richard Guenther wrote:
> On Thu, 25 Nov 2004 10:18:27 +0100, Uros Bizjak <firstname.lastname@example.org> wrote:
> > -mfpmath=sse is the worst choice in case of pentium4. The result is
> > lower by 18%, comparing to the default. That is, -mfpmath=sse,387 is
> > faster by 28%, comparing to -mfpmath=sse on pentium4.
> For me, specifying -mfpmath=sse,387 is 4% slower than -mfpmath=sse.
> I would prefer the -mfpmath=sse behavior _not_ to be changed for ia32.
Could you present the performance results for your testcase with
"-mfpmath=387", "-mfpmath=sse" and "-mfpmath=sse,387"? It's relatively
rare for "-mfpmath=sse" to be a win on a Pentium4 benchmark, and to quote
Robert Scott Ladd from his Coyote Gulch benchmarking:
>> Much to my surprise, I have yet to find any consistent evidence that
>> options like -mfpmath=sse improve program performance. Thus Acovea
>> bears out my personal experience, though it does not explain why so
>> many people continue to suggest that I should use -mfpmath=sse to
>> generate floating-point code. If someone could suggest a good
>> "-mfpmath=sse", I'd appreciate seeing it.
If your result is reproducible, there may be a latent bug in GCC that
leaves it unable to handle the competition for resources between the SSE
unit and the FP unit. That would probably not be a surprise, as Pentium4
doesn't even use the DFA scheduler. If you can reduce this to a small test
case, I'll try to fix it and thereby resolve your issue.
Additionally, Uros asked if you used "-D__NO_MATH_INLINES", to which
you replied "Yes, I did". I'd recommend that you stop using it if you
now want x87 intrinsics, yet insist on turning them off with
"-mfpmath=sse".
Finally, you may find that if you want to use "-mfpmath=sse" effectively,
it may help to build a libm multi-lib (either sse-specific or soft-float)
that maximizes performance. Again, most Linux distributions don't bother
with such a specialization as -mfpmath=sse is so rarely a win.
I don't think it's unreasonable for you to ask for this patch to be
reverted. An even better compromise is to only use this logic on
TARGET_64BIT where it's a clear advantage by default. There's also
the complication that the *BSD support in the i386.c backend makes
it difficult to enable and disable 387 intrinsics independently.
Apparently, their kernel x87 emulator doesn't handle "fancy math",
so i386.c plays games with "-mfancy-math-387", such that
"-mno-fancy-math-387" no longer works on Pentium4, and the only way
to disable x87 intrinsics on the command-line is to use the corrected
However, my guess is that you're in a small minority where taking
advantage of a bug in the intention/implementation of the "-mfpmath=sse"
flag results in marginally better code. Hopefully, once a few more
opinions have been voiced we'll reach a consensus. At the moment
Uros is clearly for the patch, and you're clearly against it.
In my defense not only did I e-mail a request for further benchmarking
when I posted the patch, including e-mailing Robert Scott Ladd directly,
but I also waited 48 hours after its approval before committing it, to
ensure that all the responses were taken into consideration. In your
defense, you did ask about support for SSE intrinsics (and in a later
e-mail for sqrt and pow specifically). The good news is that SSE sqrt
is already supported as an SSE inline intrinsic. GCC is a volunteer
project and if someone contributes a suitable SSE pow (or other)
intrinsic pattern, I'm sure the x86 backend maintainers would be happy
to accept it.
I hope the above comments are not unreasonable?