This is the mail archive of the mailing list for the GCC project.


Re: [RFC PATCH, x86_64] Use -mno-sse[,2] to fall back to x87 FP argument passing convention


Hence whilst the dust hasn't yet settled, I think Uros/Jan's proposal
of function attributes to allow parameter passing in either x87 or SSE
registers seems like a good compromise.  The x86_64 ABI made the correct
decisions and choices when it was defined, and reflected the capabilities
of the compilers/processors available at the time.  Allowing the
flexibility to review those design choices as technologies evolve can
only be a good thing.  As with IA-32's "LOOP" and "LEAVE" instructions,
the benefit of each convention depends on how microarchitectures evolve.
In recent years, technologies such as the Pentium 4 and Itanium have shown
that progress is not always a pre-planned straight line.

Has anyone recently revisited the -mfpmath=* issue with SPEC or acovea?
I'm sure Uros can detail the current situation with POV-ray.

I have results for povray-3.6.1 on "Intel(R) Xeon(TM) CPU 3.60GHz", 32bit code:

-pipe -Wno-multichar -O3 -mfpmath={387,sse} -ffast-math
-D__NO_MATH_INLINES -march=pentium4 -mtune=pentium4 -malign-double

The results for the _official_ povray.ini benchmark are inconclusive:

28m11.082s for -mfpmath=sse and
28m24.763s for -mfpmath=387
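
To put the two timings in perspective, here is a minimal sketch of the
arithmetic. The helper names (to_seconds, relative_gap_percent) are made
up for illustration and are not part of any benchmark tooling. With the
numbers above, the gap works out to roughly 0.8% of the slower run.

```c
/* Illustrative helpers only; the names are hypothetical. */

/* Convert a `time`-style "XmY.YYYs" reading to seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Gap between two runs, as a percentage of the slower run. */
double relative_gap_percent(double faster, double slower)
{
    return (slower - faster) / slower * 100.0;
}
```

For example, relative_gap_percent(to_seconds(28, 11.082),
to_seconds(28, 24.763)) gives the difference between the -mfpmath=sse
and -mfpmath=387 runs.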

Please note that in this case -mfpmath=387 uses x87 intrinsics, and
SSE uses the register-passing convention for local functions. I'll
benchmark an Athlon XP soon.

I would like to expand the rationale for my proposed change a bit.
With -mno-sse falling back to the 387, we would have the following
choices for x86_64:

<no flags>: uses SSE register-passing
-mfpmath=387: uses SSE register-passing, but prefers x87 ops
-mno-sse: uses x87 (args on stack)
-mno-sse -mno-80387: emits error.
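
One way to check which of these configurations a translation unit was
actually built with is to look at the compiler's predefined macros. The
sketch below assumes GCC's x86 macros __SSE__, __SSE_MATH__ and
__SSE2_MATH__. The function names are hypothetical, and the code says
nothing about the proposed -mno-sse fallback itself; it only reports
what the compiler was told.

```c
#include <stdio.h>

/* Returns 1 if the compiler was allowed to emit SSE instructions
   (i.e. -mno-sse was not in effect on an x86 target). */
int sse_enabled(void)
{
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if scalar FP math is done in SSE registers
   (-mfpmath=sse), 0 if not (e.g. -mfpmath=387). */
int sse_math_selected(void)
{
#if defined(__SSE_MATH__) || defined(__SSE2_MATH__)
    return 1;
#else
    return 0;
#endif
}

/* Print a one-line summary of the FP configuration. */
void print_fp_config(void)
{
    printf("SSE enabled: %d, SSE used for FP math: %d\n",
           sse_enabled(), sse_math_selected());
}
```

Calling print_fp_config() from main() and rebuilding with different
-mfpmath= settings shows which row of the list applies.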

The problem is with -mfpmath=387, which always introduces a lot of
expensive SSE->mem->x87 moves (and vice versa). This is actually the
reverse of the situation with SSE on 32 bits, where x87->mem->SSE moves
are necessary to perform SSE calculations.
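
A small test case makes these moves easy to spot. The function below is
a made-up example, not taken from any of the patches discussed here. On
x86_64 its arguments arrive in SSE registers, so compiling it with
something like "gcc -O2 -S -mfpmath=387" and comparing against the
default should show the arguments being spilled to the stack and
reloaded into the x87 stack, with the result making the reverse trip.

```c
/* Hypothetical test function: three FP arguments, one FP result.
   Under the x86_64 ABI all four values live in SSE registers at
   the call boundary, whichever unit does the arithmetic. */
double scaled_sum(double a, double b, double scale)
{
    return (a + b) * scale;
}
```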

The situation on ia32 was solved by introducing SSE register passing
for local functions (and it will be further improved by introducing an
SSE regpassing ABI), but x86_64 simply refuses to pass FP arguments
anywhere but in SSE registers. On 32 bits, the user has the choice of
-mno-80387; gcc won't error out, but will pass FP arguments via
integer registers.

IMO, -mno-sse should mean that the user does not want to exercise the
SSE unit, but still wants to use the 387 for FP calculations. The
latter can still be disabled by -mno-80387, where the 64bit compiler
would generate an error (but should perhaps use integer registers, as
the 32bit compiler does (?) ). The magnitude of the ABI change caused
by -mno-sse is thus the same as that of -mno-80387 on the 32bit ABI.
Perhaps we should warn the user in these cases, but IMO the user is
still the one who should have ultimate control over the generated code.

Some time ago, I posted a patch that implemented an x87 register
passing convention. It showed a great speed-up for a certain application
that I was using at the time, but the patch never got to the point
where it would be suitable for mainline. IMO this patch could be
dusted off, but with the introduction of the strict SSE register
passing convention, it would be practically useless on 64 bits.

OTOH, I see no problem with having two competing FP subsystems. I'm sure
that for certain problems, one is better than the other. There are
some examples in Bugzilla where x87 is way faster (for some problems,
as in PR 19780), so artificially limiting the compiler to SSE just
because "x87 should die" is like shooting yourself in the foot. And a
bit of competition never hurts ;)

BTW: For my particular field of interest (processing real-world data),
there is no need for strict IEEE compatibility, as the data is always
of the same magnitude, and arguments to trigonometric functions are
always less than <something>^38 ;). Infinities would mean that
something went terribly wrong, so -ffast-math suits here perfectly. As
the measured data has an uncertainty of +-1%, ulps are not something to
worry about. However, a 10% gain in speed (achievable with -ffast-math)
means that calculations would finish half a day earlier.
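
As a rough illustration of what is being traded away: -ffast-math lets
the compiler reassociate FP additions, and the two orders need not give
the same answer. The sketch below uses made-up function names and
deliberately extreme inputs. The volatile temporaries pin each
evaluation order so the difference stays visible at any optimization
level. With data of similar magnitude, as described above, the
discrepancy would be a few ulps rather than a whole term.

```c
/* Evaluate (a + b) + c; the volatile temporary pins the order. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Evaluate a + (b + c); the volatile temporary pins the order. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With a = 1e30, b = -1e30 and c = 1.0, sum_left() keeps the 1.0 while
sum_right() loses it, because 1.0 is far below the rounding granularity
of 1e30.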

