[RFC PATCH] implement fma() as builtin x87 or SSE intrinsic
Uros Bizjak
uros@kss-loka.si
Thu May 13 07:35:00 GMT 2004
Joseph S. Myers wrote:
>>It seems fma simply does x * y + z. Wouldn't it be more sensible to
>>expand this in some machine-independent part and rely on the
>>machine-dependent part to reassemble it to something sensible? That
>>would save some code and it could also be picked up if the user really
>>wrote x * y + z.
>>
>>
>
>Since the point of fma is to avoid intermediate rounding errors, expanding
>it to a form that might allow such errors would be silly (and we don't
>have the tree flags yet to say "this expression *must* be contracted").
>Converting x * y + z to fma is OK depending on the state of the
>FP_CONTRACT pragma (of which the default state is implementation-defined,
>so if we implement the pragma it needn't default to on only with
>-ffast-math).
>
>
>
The current situation on the i386 arch is a little messy:
1) The hypot() function is always taken from mathinline.h, because it is
not #ifdef'd with __FAST_MATH__:
/* The argument range of the inline version of hypotl is slightly
   reduced.  */
__inline_mathcodeNP2 (hypot, __x, __y, return __sqrtl (__x * __x + __y * __y))
In libm (which can never be reached without -D__NO_MATH_INLINES ...),
it is defined as:
... input parameters checking ...
fmul %st(0) // y * y : x
fxch // x : y * y
fmul %st(0) // x * x : y * y
faddp // x * x + y * y
fsqrt
ret
2) fma() with -ffast-math simply returns x*y + z:
#ifdef __FAST_MATH__
__inline_mathcodeNP3 (fma, __x, __y, __z, return (__x * __y) + __z)
#endif
And in libm, it does the same:
ENTRY(__fma)
fldl 4(%esp) // x
fmull 12(%esp) // x * y
fldl 20(%esp) // z : x * y
faddp // (x * y) + z
ret
END(__fma)
With my patch, fma() is defined as a gcc builtin function (for the i386
arch only), and with -ffast-math it produces exactly the same code as
fma() from mathinline.h (and as the libm function). fma() should also be
defined as a builtin function if we want to define __NO_MATH_INLINES for
the i386 architecture.
The problem with the current hypot() implementation is that for
-ffast-math -march=pentium4 -mfpmath=sse it is compiled as:
-- cut here --
subl $12, %esp
movsd 16(%esp), %xmm1
movsd 24(%esp), %xmm0
mulsd %xmm0, %xmm0
mulsd %xmm1, %xmm1
addsd %xmm0, %xmm1
movsd %xmm1, (%esp)
fldl (%esp)
#APP
fsqrt
#NO_APP
addl $12, %esp
ret
-- cut here --
Note that if hypot() were implemented as a builtin i386 function, the
sqrtsd SSE instruction would be generated in the -ffast-math case, and the
generated code would avoid the xmm->stack->fpstack moves.
Following this analysis, I still suggest that fma() and hypot() be
implemented as builtin i387 functions.
Uros.