This is the mail archive of the
`gcc-patches@gcc.gnu.org`
mailing list for the GCC project.


- *From*: Roger Sayle <roger at eyesopen dot com>
- *To*: Richard Guenther <richard dot guenther at gmail dot com>
- *Cc*: Uros Bizjak <uros at kss-loka dot si>, <gcc-patches at gcc dot gnu dot org>
- *Date*: Thu, 25 Nov 2004 10:38:51 -0700 (MST)
- *Subject*: Re: [BENCHMARK]-mfpmath=sse should disable x387 intrinsics

On Thu, 25 Nov 2004, Richard Guenther wrote:
> I guess unrolling loops increases register pressure and as such makes
> use of the extra FP registers. The testcase is again 50 iterations of
> my famous tramp3d-v3.cpp.

Do you have numbers with and without -funroll-loops to confirm that it
is the loop unrolling that shifts the trade-off between SSE and x87?

> To address the DFA issue, I present the numbers for -march=athlon64
> (and of course run on a Athlon64) - note this is without your patch and
> -D__NO_MATH_INLINES is not only due to a very old libc from Debian woody:
>
>   -mfpmath=sse -D__NO_MATH_INLINES: 55.3s
>   -mfpmath=sse,387 -D__NO_MATH_INLINES: 57.6s
>   -mfpmath=387 -D__NO_MATH_INLINES: 59.1s
>   -mfpmath=sse -fno-builtin-pow -fno-builtin-sqrt -D__NO_MATH_INLINES: 1m32s
>   -mfpmath=sse -fno-builtin-pow -fno-builtin-sqrt: 1m34.7s
>
> Other switches used are -ffast-math -funroll-loops -march=athlon64
>
> The last two should be numbers equivalent to with your patch applied (pow
> and sqrt are the only used math fns in my testcase), but maybe I'm confused
> about the exact meaning of -fno-builtin-pow -fno-builtin-sqrt. I'll
> build an updated mainline soon.

My patch shouldn't be as drastic as -fno-builtin-pow -fno-builtin-sqrt.
The middle-end will still recognize these as C99 math functions, and
optimize their use understanding they are "const" functions with
-ffast-math, i.e. CSE'ing calls with common arguments and rearranging
and simplifying mathematical expressions such as
pow(sqrt(x),y) -> pow(x,y*0.5), etc.

Indeed, my patch should also preserve the use of the SSE "sqrtsd"
instruction for implementing sqrt.

    extern double sqrt(double);
    double foo(double x)
    {
      return sqrt(x);
    }

-O2 -ffast-math -march=pentium4 -mfpmath=sse -fomit-frame-pointer -S

    _foo:   subl    $12, %esp
            sqrtsd  16(%esp), %xmm0
            movsd   %xmm0, (%esp)
            fldl    (%esp)
            addl    $12, %esp
            ret

> What I am most unhappy with is the changed semantics of -mfpmath=sse between
> 3.4 and 4.0 then - wouldn't a -fno-builtin-XXX work, too?

As explained above, -fno-builtin-XXX turns off the middle-end's
recognition of libm math functions. -mfpmath=sse, and historically
-mno-fancy-math-387, should just disable the backend's use of the x87
intrinsics and, as hinted at in the original mail, cause it to use SSE
intrinsics or libcalls instead.

> Sorry, but you usually are not root at a supercomputing facility.

But the system administrators are normally happy to provide MPI, PVM,
BLAS and the other infrastructure necessary to get things done.
Certainly, the kind folks at Los Alamos do for us :>

> especially that I no longer can have the old fastest-for-me behavior.

Agreed. Clearly, fixing the tramp3d-v3 performance regression has become
a personal priority, and we may even be able to improve upon your
previous best. Hopefully, someone will volunteer to help test the
changes to the x86/*BSD backend that may be needed.

Roger
--

**Follow-Ups**: **Re: [BENCHMARK]-mfpmath=sse should disable x387 intrinsics** *From:* Richard Guenther

**References**: **Re: [BENCHMARK]-mfpmath=sse should disable x387 intrinsics** *From:* Richard Guenther
