This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC PATCH, x86_64] Use -mno-sse[,2] to fall back to x87 FP argument passing convention


On Tue, 10 Oct 2006, Uros Bizjak wrote:
> > Has anyone recently revisited the -mfpmath=* issue with SPEC or acovea?
> > I'm sure Uros can detail the current situation with POV-ray.
>
> I have results for povray-3.6.1 on "Intel(R) Xeon(TM) CPU 3.60GHz",
> 32bit code:

To respond to the issue of apples vs. oranges comparisons, I'll post
the whetstone figures for a recent mainline GCC on current 64-bit CPUs.

On a Core2 6600 @ 2.4GHz
-O3 -ffast-math -mfpmath=387        2799 MWIPS
-O3 -ffast-math -mfpmath=387,sse    2702 MWIPS
-O3 -ffast-math                     2501 MWIPS

On an Opteron 248 @ 2.21GHz
-O3 -ffast-math -mfpmath=387        2578 MWIPS
-O3 -ffast-math                     2213 MWIPS

This shows that x87 math is 12-16% faster than the default for 64-bit
code on this particular benchmark.  When the whetstone benchmark was
designed in 1972, it was intended to reflect the fractions of floating
point, integer, and branch operations that were measured empirically
in the programs of the time.  Unfortunately, over thirty years
later, modern hardware runs the integer code tens of thousands of
times faster, but the FP code only thousands of times faster.  Hence
FP performance now accounts for a very significant share of the
result, whereas integer optimizations are less critical.

That the -mfpmath value makes a difference (and that the glibc
headers aren't providing x87 __asm__ statements) can be confirmed
by observing that without -mfpmath=387 we need to link with -lm, but
with it we don't.  Arguments to sin, cos, etc. are passed in xmm0.
This is on RedHat 4.2 with glibc 2.3.4, so it's possible some
performance may be lost via a poor libm implementation, but
that's doubtful, and this OS configuration is "realistic".



My two constructive suggestions to Evandro and HJ to pass back to
their CPU designers are as follows.  (i) The choice to continue to
produce inaccurate results for transcendental functions is purely for
backwards compatibility with old x87 code that expects these errors.
An extra bit in an FPU control word would allow the compiler
(for example, with -ffast-math) to switch to a perfectly rounded (and
perhaps faster) implementation in silicon, rather than prolonging the
defects of the original 8087 coprocessor.  Being able to evaluate
sin/cos faster in software than in hardware normally indicates a
design flaw/opportunity.  (ii) The first CPU vendor to add ISA support
for rapidly moving values between the x87 registers and the SSE/SSE2
registers, or even a fast path for the memory-write/memory-read, should
enjoy a short-term performance advantage over its competitors.


I'll agree with Michael that vectorization and autovectorization,
and vector math libraries in gcc 4.3 and later, may well finally
decide the x87 vs. SSE debate.  But the numbers above show that for
at least one (possibly unrealistic) benchmark, it'd be inappropriate
to disallow x87 math on x86_64 CPUs running on a 64-bit ABI OS.

Roger
--

