This is the mail archive of the
mailing list for the GCC project.
Re: [RFC PATCH, x86_64] Use -mno-sse[,2] to fall back to x87 FP argument passing convention
- From: Roger Sayle <roger at eyesopen dot com>
- To: Michael Matz <matz at suse dot de>, Uros Bizjak <ubizjak at gmail dot com>, Jan Hubicka <hubicka at ucw dot cz>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 9 Oct 2006 11:49:29 -0600 (MDT)
On Mon, 9 Oct 2006, Michael Matz wrote:
> > In current state of x86_64 affairs, user can select -mfpmath=387 to
> > instruct the compiler to use x87 FP instructions.
> FWIW, I think -mfpmath=387 should actually give an error with -m64. I
> want all traces of x87 to die a silent death. For x86 that's impossible,
> we should not do anything to make it impossible for x86-64.
There was a time when I'd have completely agreed with you. Benchmarking our
numeric applications on x86_64 with and without -mfpmath=387 using
gcc-3.4 shows/showed SSE math to be a significant win. Unfortunately,
to complicate matters, the relative improvements in compiler handling
of SSE vs. x87 now confuse the issue. With mainline, some of those
same benchmarks have swung around, so that on several applications
-mfpmath=387 is now faster (sometimes significantly faster) on x86_64,
despite the benefits of native register passing conventions.
You should be able to reproduce these observations yourself; they are
also borne out by the publicly available Whetstone benchmark.
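For anyone who wants to try this, a minimal sketch of the kind of kernel involved is below. It is a hypothetical naive matrix multiply, not Whetstone or any of the applications mentioned above, and the names matmul, checksum and time_matmul are illustrative only. The idea is to compile the same file once with -mfpmath=387 and once with -mfpmath=sse and compare timings. Because the kernel takes pointers rather than FP arguments, it mostly isolates arithmetic code generation from the argument-passing convention.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical benchmark kernel (illustrative, not from any suite).
   Naive N x N double-precision multiply, C = A * B, row-major.
   Build the same file twice and compare, e.g.:
     gcc -O2 -m64 -mfpmath=387 ...
     gcc -O2 -m64 -mfpmath=sse ...
   Only the generated inner-loop code differs between the two builds. */
void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Sum of all result elements, to check that both builds agree.
   x87 may keep intermediates in 80-bit extended precision, so tiny
   differences in the last bits are possible for general inputs. */
double checksum(size_t n, const double *c)
{
    double s = 0.0;
    for (size_t i = 0; i < n * n; i++)
        s += c[i];
    return s;
}

/* CPU seconds spent in `reps` calls to matmul, measured with clock(). */
double time_matmul(size_t n, const double *a, const double *b,
                   double *c, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        matmul(n, a, b, c);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

In practice you would call time_matmul from a small driver with n in the hundreds and enough repetitions to dominate timer resolution.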
I now suspect that the x87 vs. SSE benchmarking wars will be with us
for a while longer. The huge gains of Paolo Bonzini's x87 register stall
patch [one of the most impressive performance improvements I've ever
seen in a GCC patch; most CPU designers would give a limb for a 20%
improvement in matrix multiply] and others from Uros et al. will probably
be offset by Richard Guenther's pending SSE intrinsic patches, and the
expected performance benefits of an SSE API/implementation of libmath in gcc 4.3.
Hence, whilst the dust hasn't yet settled, I think Uros/Jan's proposal
of function attributes to allow parameter passing in either x87 or SSE
registers seems like a good compromise. The x86_64 ABI made the correct
decisions and choices when it was defined, and reflected the capabilities
of the compilers/processors available at the time. Allowing the
flexibility to review those design choices as technologies evolve can
only be a good thing. As with the IA-32 "LOOP" and "LEAVE" instructions,
the benefit of each convention depends upon how microarchitectures evolve.
In recent years, technologies such as the Pentium 4 and Itanium have shown
that progress is not always a pre-planned straight line.
Has anyone recently revisited the -mfpmath=* issue with SPEC or acovea?
I'm sure Uros can detail the current situation with POV-ray.
Sorry to stir up a recurring argument; I'm just not convinced that
the outcome has been conclusively decided yet. In fact, I suspect the
issue is less clear now than it was a year or two ago. I agree that
the x87 architecture is one large wart, but in scientific computing
we're just performance junkies who often ignore aesthetics and simply
follow the faster processors. As register allocation and other
optimizer technologies improve, our ability to improve performance on
cryptic ISAs (did I mention Itanium already?) constantly increases.