This is the mail archive of the
mailing list for the GCC project.
Re: [RFC PATCH] implement fma() as builtin x87 or SSE intrinsic
Joseph S. Myers wrote:
The current situation in the i386 arch is a little messy:
It seems fma simply does x * y + z. Wouldn't it be more sensible to
expand this in some machine-independent part and rely on the
machine-dependent part to reassemble it to something sensible? That
would save some code and it could also be picked up if the user really
wrote x * y + z.
Since the point of fma is to avoid intermediate rounding errors, expanding
it to a form that might allow such errors would be silly (and we don't
have the tree flags yet to say "this expression *must* be contracted").
Converting x * y + z to fma is OK depending on the state of the
FP_CONTRACT pragma (of which the default state is implementation-defined,
so if we implement the pragma it needn't default to on only with
1) The hypot() function is always taken from mathinline.h, because it is
not guarded by #ifdef __FAST_MATH__.
* The argument range of the inline version of hypotl is slightly
__inline_mathcodeNP2 (hypot, __x, __y, return __sqrtl (__x * __x + __y *
In libm (which can never be reached without -D__NO_MATH_INLINES ...),
it is defined as:
... input parameters checking ...
fmul %st(0) // y * y : x
fxch // x : y * y
fmul %st(0) // x * x : y * y
faddp // x * x + y * y
2) fma() with -ffast-math simply returns x*y + z:
__inline_mathcodeNP3 (fma, __x, __y, __z, return (__x * __y) + __z)
And in libm, it does the same:
fldl 4(%esp) // x
fmull 12(%esp) // x * y
fldl 20(%esp) // z : x * y
faddp // (x * y) + z
With my patch, fma() is defined as a GCC builtin function (for the i386
arch only), and with -ffast-math it produces exactly the same code as
fma() from mathinline.h (and as the libm function). The fma() function
should also be defined as a builtin function if we want to define
__NO_MATH_INLINES for
The problem with the current hypot() implementation is that, with
-ffast-math -march=pentium4 -mfpmath=sse, it is compiled as:
-- cut here --
subl $12, %esp
movsd 16(%esp), %xmm1
movsd 24(%esp), %xmm0
mulsd %xmm0, %xmm0
mulsd %xmm1, %xmm1
addsd %xmm0, %xmm1
movsd %xmm1, (%esp)
addl $12, %esp
-- cut here --
Note that if hypot() is implemented as a builtin i386 function, the
sqrtsd SSE instruction would be generated in the -ffast-math case, and
the generated code would avoid the xmm->stack->fpstack moves.
Following this analysis, I still suggest that fma() and hypot() be
implemented as builtin i387 functions.