This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC PATCH] implement fma() as builtin x87 or SSE intrinsic


Joseph S. Myers wrote:

It seems fma simply does x * y + z. Wouldn't it be more sensible to
expand this in some machine-independent part and rely on the
machine-dependent part to reassemble it to something sensible? That
would save some code and it could also be picked up if the user really
wrote x * y + z.



Since the point of fma is to avoid intermediate rounding errors, expanding
it to a form that might allow such errors would be silly (and we don't
have the tree flags yet to say "this expression *must* be contracted").
Converting x * y + z to fma is OK depending on the state of the
FP_CONTRACT pragma (of which the default state is implementation-defined,
so if we implement the pragma it needn't default to on only with
-ffast-math).




The current situation on the i386 architecture is a little messy:

1) The hypot() function is always taken from mathinline.h, because it is not guarded by #ifdef __FAST_MATH__.

/* The argument range of the inline version of hypotl is slightly reduced. */
__inline_mathcodeNP2 (hypot, __x, __y, return __sqrtl (__x * __x + __y * __y))


In libm (which can never be reached without -D__NO_MATH_INLINES ...), it is defined as:

       ... input parameters checking ...
       fmul %st(0)    // y * y : x
       fxch    // x : y * y
       fmul %st(0)    // x * x : y * y
       faddp         // x * x + y * y
       fsqrt
       ret

2) fma() with -ffast-math simply returns x*y + z:

#ifdef __FAST_MATH__
__inline_mathcodeNP3 (fma, __x, __y, __z, return (__x * __y) + __z)
#endif

And in libm, it does the same:

ENTRY(__fma)
       fldl 4(%esp)   // x
       fmull  12(%esp) // x * y
       fldl 20(%esp) // z : x * y
       faddp         // (x * y) + z
       ret
END(__fma)

With my patch, fma() is defined as a GCC builtin function (for the i386 architecture only), and with -ffast-math it produces exactly the same code as fma() from mathinline.h (and as the libm function). fma() should also be defined as a builtin function if we want to define __NO_MATH_INLINES for the i386 architecture.

The problem with the current hypot() implementation is that with -ffast-math -march=pentium4 -mfpmath=sse it is compiled as:

-- cut here --
       subl $12, %esp
       movsd  16(%esp), %xmm1
       movsd  24(%esp), %xmm0
       mulsd  %xmm0, %xmm0
       mulsd  %xmm1, %xmm1
       addsd  %xmm0, %xmm1
       movsd  %xmm1, (%esp)
       fldl (%esp)
#APP
       fsqrt
#NO_APP
       addl $12, %esp
       ret
-- cut here --

Note that if hypot() were implemented as a builtin i386 function, the sqrtsd SSE instruction would be generated in the -ffast-math case, and the generated code would avoid the xmm->stack->fpstack moves.

Following this analysis, I still suggest that fma() and hypot() be implemented as builtin i387 functions.

Uros.

