This is the mail archive of the
mailing list for the GCC project.
Re: [BENCHMARK]-mfpmath=sse should disable x387 intrinsics
- From: Roger Sayle <roger at eyesopen dot com>
- To: Richard Guenther <richard dot guenther at gmail dot com>
- Cc: Uros Bizjak <uros at kss-loka dot si>, <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 25 Nov 2004 10:38:51 -0700 (MST)
- Subject: Re: [BENCHMARK]-mfpmath=sse should disable x387 intrinsics
On Thu, 25 Nov 2004, Richard Guenther wrote:
> I guess unrolling loops increases register pressure and as such makes
> use of the extra FP registers. The testcase is again 50 iterations of
> my famous tramp3d-v3.cpp.
Do you have numbers with and without -funroll-loops to confirm that
it is the loop unrolling that shifts the trade-off between SSE and x87?
> To address the DFA issue, I present the numbers for -march=athlon64
> (and of course run on a Athlon64) - note this is without your patch and
> -D__NO_MATH_INLINES is not only due to a very old libc from Debian woody:
> -mfpmath=sse -D__NO_MATH_INLINES: 55.3s
> -mfpmath=sse,387 -D__NO_MATH_INLINES: 57.6s
> -mfpmath=387 -D__NO_MATH_INLINES: 59.1s
> -mfpmath=sse -fno-builtin-pow -fno-builtin-sqrt -D__NO_MATH_INLINES: 1m32s
> -mfpmath=sse -fno-builtin-pow -fno-builtin-sqrt: 1m34.7s
> Other switches used are -ffast-math -funroll-loops -march=athlon64
> The last two should be numbers equivalent to with your patch applied (pow
> and sqrt are the only used math fns in my testcase), but maybe I'm confused
> about the exact meaning of -fno-builtin-pow -fno-builtin-sqrt. I'll
> build an updated mainline soon.
My patch shouldn't be as drastic as -fno-builtin-pow -fno-builtin-sqrt.
The middle-end will still recognize these as C99 math functions, and
optimize their use understanding they are "const" functions with
-ffast-math, i.e. CSE'ing calls with common arguments and rearranging or
simplifying mathematical expressions, pow(sqrt(x),y) -> pow(x,y*0.5) etc...
Indeed, my patch should also preserve the use of the SSE "sqrtsd"
instruction for implementing sqrt.
For example,

  extern double sqrt(double);
  double foo(double x)
  {
    return sqrt(x);
  }

compiled with

  -O2 -ffast-math -march=pentium4 -mfpmath=sse -fomit-frame-pointer -S

contains

        subl    $12, %esp
        sqrtsd  16(%esp), %xmm0
        movsd   %xmm0, (%esp)
        addl    $12, %esp
> What I am most unhappy with is the changed semantics of -mfpmath=sse between
> 3.4 and 4.0 then - wouldn't a -fno-builtin-XXX work, too?
As explained above, -fno-builtin-XXX turns off the middle-end's recognition
of libm math functions. -mfpmath=sse and historically -mno-fancy-math-387
should just disable the backend's use of the x87 intrinsics and, as
hinted at in the original mail, cause it to use SSE intrinsics or libcalls
instead.
> Sorry, but you usually are not root at a supercomputing facility.
But the system administrators are normally happy to provide MPI,
PVM, BLAS and the other infrastructure necessary to get things done.
Certainly, the kind folks at Los Alamos do for us :>
> especially that I no longer can have the old fastest-for-me behavior.
Agreed. Clearly fixing the tramp3d-v3 performance regression has
become a personal priority and we may even be able to improve upon
your previous best. Hopefully, someone will volunteer to help test
the changes to the x86/*BSD backend that may be needed.