[RFC PATCH] implement fma() as builtin x87 or SSE intrinsic

Uros Bizjak uros@kss-loka.si
Thu May 13 07:35:00 GMT 2004

Joseph S. Myers wrote:

>>It seems fma simply does x * y + z. Wouldn't it be more sensible to
>>expand this in some machine-independent part and rely on the
>>machine-dependent part to reassemble it to something sensible? That
>>would save some code and it could also be picked up if the user really
>>wrote x * y + z.
>Since the point of fma is to avoid intermediate rounding errors, expanding
>it to a form that might allow such errors would be silly (and we don't
>have the tree flags yet to say "this expression *must* be contracted").  
>Converting x * y + z to fma is OK depending on the state of the
>FP_CONTRACT pragma (of which the default state is implementation-defined,
>so if we implement the pragma it needn't default to on only with

The current situation in the i386 arch is a little messy:

1)  The hypot() function is always taken from mathinline.h, because it is
not guarded by #ifdef __FAST_MATH__:

/* The argument range of the inline version of hypotl is slightly
reduced.  */
__inline_mathcodeNP2 (hypot, __x, __y, return __sqrtl (__x * __x + __y * 

In libm (whose version can never be reached without -D__NO_MATH_INLINES ...),
it is defined as:

        ... input parameters checking ...
        fmul %st(0)    // y * y : x
        fxch    // x : y * y
        fmul %st(0)    // x * x : y * y
        faddp         // x * x + y * y

2)  fma() with -ffast-math simply returns x*y + z:

#ifdef __FAST_MATH__
__inline_mathcodeNP3 (fma, __x, __y, __z, return (__x * __y) + __z)

And in libm, it does the same:

        fldl 4(%esp)   // x
        fmull  12(%esp) // x * y
        fldl 20(%esp) // z : x * y
        faddp         // (x * y) + z

With my patch, fma() is defined as a gcc builtin function (for the i386 arch
only), and with -ffast-math it produces exactly the same code as fma() from
mathinline.h (and as the libm function). fma() should also be defined as a
builtin function if we want to define __NO_MATH_INLINES for the i386
architecture.

The problem with the current hypot() implementation is that for -ffast-math
-march=pentium4 -mfpmath=sse, it is compiled as:

-- cut here --
        subl $12, %esp
        movsd  16(%esp), %xmm1
        movsd  24(%esp), %xmm0
        mulsd  %xmm0, %xmm0
        mulsd  %xmm1, %xmm1
        addsd  %xmm0, %xmm1
        movsd  %xmm1, (%esp)
        fldl (%esp)
        addl $12, %esp
-- cut here --

Note that if hypot() were implemented as a builtin i386 function, the sqrtsd
SSE instruction would be generated in the -ffast-math case, and the generated
code would be free of the xmm->stack->fpstack moves.

Following this analysis, I still suggest that fma() and hypot() be
implemented as builtin i387 functions.

